How do time-based and entity-based queries differ from semantic similarity retrieval?
This explores why some questions — "what did we talk about Tuesday?" (time-based) or "how is this company connected to that one?" (entity-based) — can't be answered by finding text that *sounds* similar, and what the corpus says about the machinery each kind of query actually needs.
Two kinds of questions break the usual retrieval trick of fetching the passages that read most alike: those anchored to *when* something happened and those anchored to *which entities relate to which*. Semantic similarity search works by turning text into vectors and grabbing whatever sits closest in that space. The recurring finding across the corpus is that closeness in meaning is the wrong axis for both temporal and relational questions. The gap is a structural mismatch, not a margin that tuning can close.
The cleanest case is time. A question like "what did we discuss Tuesday?" has almost no semantic content to match against; the answer is defined by a timestamp, not a topic. The note "Why do time-based queries fail in conversational retrieval systems?" frames this as a challenge that ordinary semantic search never faces: time-based queries need metadata indexing, and ambiguous references like "tell me more about that" need a disambiguation step that resolves *what* is meant before anything can be retrieved. Embeddings have no native handle on "last Tuesday."
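A minimal sketch of what "metadata indexing" means here, assuming a hypothetical in-memory message log where each entry carries a timestamp. The names `messages`, `most_recent_weekday`, and `messages_on` are illustrative and do not come from the cited notes. The point is that the weekday reference is first resolved to a concrete date, and retrieval is then an exact filter on metadata with no similarity score involved.

```python
from datetime import date, datetime, timedelta

# Hypothetical conversation log: the timestamp is metadata, not part of the text.
messages = [
    {"ts": datetime(2024, 5, 6, 9, 30), "text": "Budget review for Q3"},
    {"ts": datetime(2024, 5, 7, 14, 0), "text": "Hiring plan for the data team"},
    {"ts": datetime(2024, 5, 7, 16, 45), "text": "Vendor contract renewal"},
    {"ts": datetime(2024, 5, 8, 11, 15), "text": "Q3 budget follow-up"},
]


def most_recent_weekday(today: date, weekday: int) -> date:
    """Latest date strictly before `today` that falls on `weekday` (Monday=0)."""
    delta = (today.weekday() - weekday) % 7 or 7
    return today - timedelta(days=delta)


def messages_on(day: date, log: list[dict]) -> list[str]:
    """Exact filter on the timestamp metadata; the message text is never compared."""
    return [m["text"] for m in log if m["ts"].date() == day]


# Resolve "Tuesday" relative to an assumed current date (Thursday, 9 May 2024).
tuesday = most_recent_weekday(date(2024, 5, 9), weekday=1)
tuesday_topics = messages_on(tuesday, messages)
```

Note that the two Q3 budget messages, which an embedding model would likely place close together, fall on different days and are correctly excluded. In a real system the filter would typically be a range query against an indexed timestamp column rather than a list scan.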
Entity-based queries fail for a different reason: the answer lives in *connections between things*, not in any single passage. The note "When do graph databases outperform vector embeddings for retrieval?" shows graph traversal beating vector similarity on aggregate and multi-hop relational questions, trading higher build cost for precision and completeness. Walking explicit edges (this supplier ships to that plant) is deterministic, while similarity search only returns a probabilistic cloud of associated text. The note "Can long-context LLMs replace retrieval-augmented generation systems?" sharpens the line: stuffing everything into a long context can match RAG on semantic retrieval, yet it still cannot execute relational queries that require joins across structured records. Context length doesn't buy you structure.
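To make the contrast concrete, here is a small sketch of deterministic traversal over explicit edges, using plain Python in place of a graph database. The entities, relation names, and helper functions (`find_path`, `upstream_shippers`) are all invented for illustration. The first helper answers a multi-hop question ("how is X connected to Y?"), the second an aggregate one ("who ships into the plants that supply this customer?").

```python
from collections import deque

# Hypothetical edge list of (source, relation, target) triples.
edges = [
    ("Acme Metals", "ships_to", "Plant A"),
    ("Plant A", "supplies", "Northwind Motors"),
    ("Borealis Chem", "ships_to", "Plant B"),
    ("Plant B", "supplies", "Northwind Motors"),
    ("Plant B", "supplies", "Contoso Tools"),
]

# Adjacency list for traversal: node -> [(relation, neighbour), ...]
graph: dict[str, list[tuple[str, str]]] = {}
for src, rel, dst in edges:
    graph.setdefault(src, []).append((rel, dst))


def find_path(start: str, goal: str):
    """Breadth-first search; returns the hops linking start to goal, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


def upstream_shippers(customer: str) -> list[str]:
    """Aggregate query: everyone shipping to any plant that supplies `customer`."""
    plants = {s for s, r, d in edges if r == "supplies" and d == customer}
    return sorted(s for s, r, d in edges if r == "ships_to" and d in plants)


path = find_path("Acme Metals", "Northwind Motors")
shippers = upstream_shippers("Northwind Motors")
```

Both results are complete and repeatable: every matching edge is visited, and none is dropped because it fell below a similarity cutoff. The cost is that someone has to build and maintain the edge list, which is the construction overhead the graph note describes.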
The deeper reason is that embeddings measure the wrong thing in the first place. The note "Do vector embeddings actually measure task relevance?" argues that they encode co-occurrence (what tends to appear near what), so concepts that play different roles come back as near-twins. The note "Why do queries and their causes seem semantically different?" makes the same point from another angle: when a student asks about "projection" after hearing a specific statement, the semantically closest passage (on projection matrices) is the wrong one, because the *cause* of the question lies somewhere else entirely. Time and entity queries are just the most visible places where "sounds similar" and "is actually the answer" come apart.
What's worth taking away is that the field is converging on a portfolio view, not a winner. The note "Where do retrieval systems fail and why?" catalogs three distinct structural failure levels rather than one tunable knob, and the note "Can query-time graph construction replace pre-built knowledge graphs?" (on LogicRAG) hints at the synthesis: build relational structure *from the query at inference time* so you get graph-style logic without a pre-built graph. The interesting move isn't choosing semantic vs. structured retrieval; it's routing each question to the representation that matches how its answer is actually organized.
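As a rough illustration of the routing idea, the sketch below sends a query to one of three backends based on surface cues. Everything here is an assumption made for the example: the keyword patterns, the backend labels, and the `route` function are hypothetical, and a production router would more plausibly use a trained classifier or an LLM than regular expressions.

```python
import re

# Hypothetical cue lists; deliberately small and not exhaustive.
TEMPORAL = re.compile(
    r"\b(yesterday|today|last week|last month|"
    r"monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b",
    re.IGNORECASE,
)
RELATIONAL = re.compile(
    r"\b(connected to|related to|relationship between|linked to)\b",
    re.IGNORECASE,
)


def route(query: str) -> str:
    """Pick the retrieval backend whose structure matches the question."""
    if TEMPORAL.search(query):
        return "metadata_index"   # answer is defined by a timestamp
    if RELATIONAL.search(query):
        return "graph_traversal"  # answer lives in edges between entities
    return "vector_search"        # default: topical, meaning-based lookup


routes = [
    route("What did we talk about Tuesday?"),
    route("How is this company connected to that one?"),
    route("Explain projection matrices"),
]
```

The design choice worth noticing is that similarity search becomes the fallback rather than the only path: it handles topical questions, while questions whose answers are organized by time or by relationship go to a store that indexes that structure directly.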
Sources (7 notes)
Conversational memory faces two distinct retrieval challenges absent from static databases: time-based queries ("what did we discuss Tuesday?") requiring metadata indexing, and ambiguous references ("tell me more about that") requiring contextual disambiguation before retrieval.
Graph-oriented databases solve vector similarity's failure on aggregate queries by replacing probabilistic similarity search with deterministic graph traversal via Cypher. The tradeoff: higher construction cost but precision and completeness for enterprise use cases where query patterns are relational.
The LOFT benchmark shows LCLMs match RAG on semantic retrieval without explicit training, but cannot execute relational queries requiring joins across structured tables. Context length alone cannot bridge this gap.
Embeddings encode co-occurrence patterns, making semantically close but role-distinct concepts highly similar. This works in simple demos but fails in production where underspecified queries have many wrong-but-associated candidates.
Backtracing—finding what caused a query—diverges from semantic similarity especially in conversation and lecture domains. Students ask about projection after hearing a specific statement, but the semantically closest passage discusses projection matrices instead, showing that surface similarity misses the actual cause.
RAG systems fail at three structural levels: adaptive triggering (fixed intervals waste context), semantic-task mismatch (embeddings measure association, not relevance), and mathematical limits (embedding dimension constrains representable document sets). These require fundamentally different retrieval approaches, not tuning.
LogicRAG constructs directed acyclic graphs from queries at inference time rather than pre-building corpus-wide graphs, eliminating construction overhead, avoiding staleness, and enabling query-specific retrieval logic without sacrificing multi-hop reasoning capability.