
Semantic Search: Beyond Keyword Matching

Keyword search frustrates users when they describe what they need, not what it's called. Here's how Israeli startups add semantic search — and when not to.

Every search box starts the same way. A user types something, your app queries a database for matching rows, and results appear. That works fine — until users start searching the way they actually think.

“Show me docs about handling late payments” returns nothing because your database stores “overdue invoice management.” A user searches “onboarding steps” and misses your “getting started guide” because the strings don’t match. Your support team fields tickets that should have been resolved in self-service search.

This is where keyword search runs out of road. The question isn’t whether to add semantic search — it’s when the pain is bad enough to justify it, and what approach fits your product’s scale.

Why Keyword Search Fails (and When It Doesn’t)

Keyword search matches text literally. It finds documents containing the words in the query, ranks them by term frequency or recency, and returns results. For some use cases (searching by product SKU, filtering by exact tags, looking up a known document name) this is exactly right. Exact matching is a feature, not a bug.

The failure shows up when users don’t know your taxonomy.

The vocabulary mismatch problem

Your HR platform stores policies as “Performance Improvement Plan.” Your users search “PIP” or “managing underperformers.” Classic keyword search returns nothing useful. A user on your e-commerce platform searches “casual summer clothes” but your product catalog says “relaxed fit seasonal apparel.” Different words, identical intent.

This gap grows as your content library grows. More documents means more vocabulary variation, more edge cases, and more frustrated users who blame the product rather than the search.

Keyword search still wins when users search for known identifiers: order numbers, usernames, exact product names, file names. In those cases it is faster, cheaper, and more predictable. Semantic search returns “close matches,” which is the wrong behavior when someone needs the exact document with a specific reference number.

The rule: if your users know what they’re looking for by name, keyword search or fuzzy text matching is sufficient. Semantic search earns its complexity when users describe what they need, not what it’s called.

What Semantic Search Actually Does

Semantic search converts text to vectors — lists of hundreds or thousands of numbers that encode meaning. Similar meaning produces similar vectors. A query for “budget travel” produces a vector close to “cheap flights” and “low-cost transportation,” even with no shared words.

At query time, the user’s search phrase is embedded into the same vector space. You find documents whose vectors are closest, measured by cosine similarity or dot product. The result: documents that match meaning, not just words.
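
To make the "closest vectors" idea concrete, here is a minimal sketch of cosine similarity and a brute-force nearest-neighbor lookup. The three-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions and come from an embedding model, not from hand-typed numbers.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, doc_vecs, k=3):
    """Return the k document ids whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Toy vectors, made up for this example only.
docs = {
    "cheap flights": [0.9, 0.1, 0.0],
    "low-cost transportation": [0.8, 0.2, 0.1],
    "luxury resorts": [0.1, 0.9, 0.2],
}
query = [0.85, 0.15, 0.05]  # stands in for the embedding of "budget travel"
top = nearest(query, docs, k=2)
print(top)
```

With these made-up numbers, the two travel-on-a-budget documents rank ahead of "luxury resorts" even though none of them shares a word with the query. A production system would replace the brute-force loop with an approximate nearest-neighbor index.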

Choosing an embedding model

There are two main options. The first is a hosted API: OpenAI’s text-embedding-3-small is inexpensive, accurate enough for most SaaS use cases, and requires no infrastructure beyond an API call. It is where we start on AI feature work at quickdev.

The second is a self-hosted model. For applications where sending data to a third-party API is a problem (medical records, legal documents, regulated industries), open-source models like sentence-transformers/all-MiniLM-L6-v2 run locally or on a private server. Quality is typically somewhat lower, but your data never leaves your own infrastructure.

Both approaches generate fixed-size vectors for any piece of text. The embedding model determines how well the vector space captures actual meaning. For domain-specific content (medical terminology, legal language, technical jargon), domain-tuned models produce better results than general-purpose ones.

Vector storage without the complexity

Your embeddings need a home. The default for most products is to add them alongside existing records.

PostgreSQL + pgvector is where you should start. The pgvector extension adds a vector column type and approximate nearest-neighbor indexing (HNSW or IVFFlat) to a standard Postgres table. If you’re already on Postgres — and most products are — this costs nothing extra and keeps your architecture simple. Supabase, Neon, and Amazon RDS all support pgvector.
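
As a rough sketch of what that setup could look like, the snippet below holds the SQL for enabling pgvector, adding a vector column, and building an HNSW index. The table name documents, the column names, and the 1536 dimension (the default output size of text-embedding-3-small) are assumptions for illustration; adjust them to your schema and model. The helper formats a Python list in pgvector's text input form so it can be passed as a query parameter.

```python
EMBEDDING_DIM = 1536  # assumption: matches the embedding model you chose

# Hypothetical schema; table and column names are illustrative.
SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
    id        bigserial PRIMARY KEY,
    body      text NOT NULL,
    embedding vector({EMBEDDING_DIM})
);

-- HNSW index using cosine distance, to match the <=> operator at query time.
CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING hnsw (embedding vector_cosine_ops);
"""

def to_vector_literal(values):
    """Format a sequence of numbers as pgvector text input, e.g. '[0.1,0.2,3.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

print(to_vector_literal([0.1, 0.2, 3]))
```

You would run SCHEMA_SQL once through your migration tool or database driver, then pass to_vector_literal(embedding) as the parameter when inserting or querying.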

Dedicated vector databases (Pinecone, Weaviate, Qdrant) become worth evaluating at hundreds of millions of documents or when you need single-digit millisecond latency at very high query volumes. At startup scale, they add operational overhead without meaningful benefit.

Hybrid Search: What Most Products Actually Need

Pure semantic search has its own failure mode. It finds meaning well but can miss exact matches when they matter. A user searching “error code 4019” doesn’t want semantically similar content about “API errors” — they want the specific page for code 4019.

Hybrid search combines keyword scoring and semantic scoring. Run both searches independently, then merge the two result lists using Reciprocal Rank Fusion (RRF), which works on rank positions rather than raw scores, or a weighted sum of normalized scores. Documents that do well in both lists surface first, because they are both exact matches and semantically relevant.
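
Reciprocal Rank Fusion is short enough to sketch in a few lines. Each document earns 1 / (k + rank) from every list it appears in, and the totals decide the final order. The constant k = 60 is a commonly used default, not a requirement, and the document ids below are invented for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id earns 1 / (k + rank) per list it appears in, with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results for the query "error code 4019".
keyword_hits = ["error-4019", "changelog", "api-errors"]
semantic_hits = ["api-errors", "error-4019", "rate-limits"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)
```

Here the exact page for code 4019 comes out on top because it ranks high in both lists, while documents found by only one search fall to the bottom. Because RRF only looks at positions, you do not need to normalize keyword scores and vector distances onto a common scale first.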

Most SaaS products that add search should go hybrid, not pure semantic. Pure semantic is appropriate when you’re building search over a corpus with zero structured identifiers — a knowledge base, a document library, a support center.

Reranking: the optional third pass

After retrieval comes reranking. Reranking models such as Cohere Rerank or open-source cross-encoders take your top 20–50 results and score each one by feeding both the query and the document to a more accurate (but slower) model. This typically adds 50–200ms of latency and can substantially improve result order.
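
The shape of a reranking pass can be sketched independently of any particular model. In the snippet below, score_fn is a placeholder for whatever reranker you call; token_overlap is a deliberately crude stand-in so the example runs without a model, and is not how a cross-encoder actually scores text.

```python
def rerank(query, candidates, score_fn, top_n=5):
    """Re-score retrieved candidates with score_fn and keep the best top_n."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

def token_overlap(query, doc):
    """Toy scorer: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)

# Hypothetical first-pass results, in retrieval order.
candidates = ["Billing FAQ", "How to reset your password", "Password policy"]
best = rerank("reset password", candidates, token_overlap, top_n=2)
print(best)
```

In a real system you would swap token_overlap for a call to your reranking model and keep the rest unchanged. The cost scales with the number of candidates, which is why reranking is applied to the top 20–50 results and not to the whole corpus.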

Worth adding when: result ranking matters a lot, your top-5 results are what users actually use, and you can absorb the latency. Skip it for background processing, bulk document tasks, or any scenario where the user isn’t blocked on a search result.

How to Build This

For a typical product adding semantic search for the first time:

  1. Add pgvector to your Postgres instance
  2. Embed content at write time — when a document is created or updated, generate its embedding and store it in a vector column
  3. Embed queries at search time — same embedding model, one API call per search
  4. Run vector similarity search in Postgres: ORDER BY embedding <=> $1 LIMIT 20, where $1 is the query embedding and <=> is pgvector’s cosine distance operator
  5. Add full-text search with tsvector/tsquery
  6. Combine results with Reciprocal Rank Fusion
  7. Evaluate with real queries — before shipping, check that the results actually match user intent
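
Steps 4 and 5 above might translate into two queries along these lines. This is a sketch under assumptions: a documents table with body and embedding columns, the English text search configuration, and a driver that uses %s placeholders (psycopg does). The two id lists that come back are what you would feed into Reciprocal Rank Fusion in step 6.

```python
# Step 4: semantic candidates, ordered by cosine distance to the query embedding.
VECTOR_SQL = """
SELECT id
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT 20;
"""

# Step 5: keyword candidates using Postgres full-text search.
KEYWORD_SQL = """
SELECT id
FROM documents
WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)
ORDER BY ts_rank(to_tsvector('english', body), plainto_tsquery('english', %s)) DESC
LIMIT 20;
"""

def build_params(query_text, query_embedding):
    """Build the parameter tuples for both queries from one user search."""
    vector_literal = "[" + ",".join(str(float(v)) for v in query_embedding) + "]"
    return {
        "vector": (vector_literal,),
        "keyword": (query_text, query_text),
    }

params = build_params("error code 4019", [0.1, 0.2])
print(params)
```

Computing to_tsvector on the fly is fine for a first version. For larger tables, a stored tsvector column with a GIN index avoids recomputing it on every search.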

An experienced team can ship a working implementation in one to two weeks. Adding reranking and a proper evaluation pass takes another week on top.

For MVP projects where search is a core feature, this sequence is essentially fixed. Where it gets more complex is when the domain is specialized — medical notes, legal contracts, technical manuals — because the embedding model selection and chunking strategy matter more. General-purpose models handle consumer content well. Domain content needs more care.

The evaluation step people skip

Ship when your search beats your existing keyword search on a test set of real queries — not when the demo looks good.

Before launch, collect 50–100 real queries from your support tickets, user interviews, or analytics. Run them against both your old search and your new semantic search. Compare the top-5 results. If semantic search wins on 70%+ of queries and doesn’t embarrassingly fail on the rest, you’re ready.
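
One way to run that comparison is a small script like the sketch below. It assumes you have labeled, for each test query, which document ids count as relevant, and that each search is exposed as a function returning a ranked list of ids. The queries, ids, and canned results here are invented for illustration.

```python
def hits_at_k(results, relevant, k=5):
    """Number of relevant ids among the first k results."""
    return len(set(results[:k]) & set(relevant))

def compare(test_set, old_search, new_search, k=5):
    """Count queries where new_search beats, ties, or loses to old_search."""
    report = {"wins": 0, "ties": 0, "losses": 0}
    for query, relevant in test_set.items():
        old_hits = hits_at_k(old_search(query), relevant, k)
        new_hits = hits_at_k(new_search(query), relevant, k)
        if new_hits > old_hits:
            report["wins"] += 1
        elif new_hits < old_hits:
            report["losses"] += 1
        else:
            report["ties"] += 1
    report["win_rate"] = report["wins"] / len(test_set) if test_set else 0.0
    return report

# Hypothetical labeled queries and canned results standing in for real searches.
test_set = {
    "late payments": {"doc-overdue"},
    "onboarding steps": {"doc-getting-started"},
    "error code 4019": {"doc-4019"},
}
old_results = {"late payments": [], "onboarding steps": [],
               "error code 4019": ["doc-4019"]}
new_results = {"late payments": ["doc-overdue"],
               "onboarding steps": ["doc-getting-started", "doc-faq"],
               "error code 4019": ["doc-4019"]}

report = compare(test_set, old_results.get, new_results.get)
print(report)
```

Look at the losses individually, not just the win rate. A handful of queries where the new search drops an exact match is usually the signal that you need the hybrid approach.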

This validation matters most when search is the core value proposition of your product. A project where search is decorative can ship with less validation. One where users judge your product by what they find cannot.


Yaniv Amrami is founder of quickdev. He has helped Israeli SaaS teams design and build semantic search, data pipelines, and AI-powered features across healthcare, HR, fintech, and e-commerce products.
