SYNTHESIS NOTE

Does cosine similarity actually measure embedding similarity?

Cosine similarity is ubiquitous for comparing learned embeddings, but does it reliably capture semantic closeness? This work investigates whether regularization during training makes cosine scores arbitrary and unstable.

Synthesis note · 2026-06-03 · sourced from Flaws

Cosine similarity is the default tool for quantifying semantic similarity between learned embeddings, on the intuition that direction matters more than norm. This paper shows that the intuition is unsafe. Working with regularized linear (matrix-factorization) models, whose closed-form solutions make analysis tractable, it shows that cosine similarities can be arbitrary and therefore meaningless: for some models they are not even unique, and for others they are implicitly controlled by the regularization applied during training. Because deep models combine multiple regularizers with implicit and unintended effects, cosine similarities of their embeddings can be opaque and possibly arbitrary.
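The non-uniqueness claim can be illustrated with a minimal NumPy sketch. It is not the paper's derivation, only a toy under stated assumptions: the matrices `A` and `B` are random, hypothetical item and user embeddings, and the scaling vector `d` is chosen arbitrarily. Rescaling each latent dimension in one factor and undoing it in the other leaves every predicted score `A @ B.T` unchanged, yet the cosine similarities between rows of `A` move.

```python
import numpy as np

def cosine_matrix(E):
    """Pairwise cosine similarity between the rows of E."""
    unit = E / np.linalg.norm(E, axis=1, keepdims=True)
    return unit @ unit.T

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # hypothetical item embeddings, one row per item
B = rng.normal(size=(4, 3))  # hypothetical user embeddings, one row per user

# Arbitrary positive rescaling of the three latent dimensions (an assumption
# made for illustration; any positive values would do).
d = np.array([0.1, 1.0, 10.0])
A_scaled = A * d  # equivalent to A @ diag(d)
B_scaled = B / d  # equivalent to B @ diag(1 / d)

# The model's predictions are identical under both parameterizations...
same_predictions = np.allclose(A @ B.T, A_scaled @ B_scaled.T)

# ...but the item-item cosine similarities are not.
same_cosines = np.allclose(cosine_matrix(A), cosine_matrix(A_scaled))
max_cosine_shift = np.abs(cosine_matrix(A) - cosine_matrix(A_scaled)).max()

print("predictions unchanged:", same_predictions)
print("cosines unchanged:", same_cosines)
print("largest change in any cosine score:", max_cosine_shift)
```

If a training objective cannot tell `A` from `A_scaled`, nothing in the fit pins down which cosine matrix is the "right" one; whatever regularizer happens to break the tie decides it.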

The keeper is a methodological caution with teeth: equally good embeddings for the same model and data can produce different "similarities" depending on regularization that the practitioner never chose with similarity in mind, so a cosine score is not a stable, model-independent measure of semantic closeness. The paper outlines alternatives and urges against applying cosine similarity blindly.

This sharpens the vault's embedding-geometry caveats. It is the regularization-dependence complement to "Why can't cosine space retrievers distinguish word order?" (geometry-dependence) and underwrites the production-RAG warning in "Do vector embeddings actually measure task relevance?": cosine over learned embeddings is doubly unreliable, with the wrong target (association) and an unstable measure (regularization-controlled).


Original note title

cosine similarity of learned embeddings can be arbitrary and meaningless because it is implicitly controlled by regularization