INQUIRING LINE

Why is a combinatorial framework better than family resemblance classification?

This reads the question as: why might classifying things by combining a set of discrete, explicit features (a combinatorial framework) beat grouping them by fuzzy overlapping similarity (family resemblance, where members share no single defining trait) — and the corpus speaks to this obliquely, through a recurring finding that explicit structure beats similarity-matching.


This explores the contrast between classifying by combining discrete, named features versus grouping by family resemblance — overlapping similarity where no single trait defines membership. The collection doesn't tackle that exact framing head-on, but several notes circle the same underlying tension, and they lean consistently one way: systems that compose explicit parts tend to generalize where systems that lean on raw similarity quietly fail.

The sharpest version is the dot-product-vs-MLP result from collaborative filtering. A multilayer perceptron is a universal approximator: in theory it can learn any similarity function, the way a family-resemblance scheme can in principle absorb any cluster of overlapping cases. Yet in practice a properly tuned dot product, which imposes a fixed geometric structure, substantially outperforms it (Why does dot product beat MLP-based similarity in practice?; Can MLPs learn to match dot product similarity in practice?). The lesson is that flexibility without structure buys nothing here: structured inductive bias wins, and the expressive method needs large models and large datasets just to claw back to the structured baseline. Open-ended resemblance-matching is the costly option, not the cheap one.
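To make the contrast concrete, here is a minimal sketch of the two kinds of scorer. It is not the setup from the cited notes: the function names, the single hidden layer, and the toy vectors are all assumptions chosen for illustration. The point it shows is that the dot product adds no weights of its own, while the MLP scorer has to fit every weight before it can behave like an inner product at all.

```python
def dot_similarity(user, item):
    """Score a pair with a fixed geometric rule: the sum of elementwise products."""
    return sum(u * v for u, v in zip(user, item))


def mlp_similarity(user, item, w_hidden, w_out):
    """Score a pair with a one-hidden-layer ReLU network over the concatenated vectors.

    Nothing here forces the score to behave like an inner product; any such
    behaviour has to be learned into w_hidden and w_out from data.
    """
    x = list(user) + list(item)
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))


def mlp_parameter_count(embedding_dim, hidden_units):
    """Weights the MLP scorer must fit on top of the embeddings (biases omitted)."""
    return hidden_units * 2 * embedding_dim + hidden_units


# Hypothetical toy embeddings, for illustration only.
user = [1.0, 2.0]
item = [3.0, 4.0]

dot_score = dot_similarity(user, item)        # 1*3 + 2*4
extra_dot_params = 0                          # the dot product adds no weights
extra_mlp_params = mlp_parameter_count(2, 8)  # 8 hidden units * 4 inputs + 8 output weights
```

Even at two dimensions and eight hidden units, the MLP scorer carries 40 extra weights that the dot product does not need, which is one way to picture why the flexible option is the expensive one.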

The same shape recurs when the task is genuinely compositional. Transformers asked to reason compositionally don't learn systematic rules; they collapse the problem into matching memorized computation subgraphs, which is essentially family resemblance to training examples, and they fail drastically on novel combinations, with errors compounding step by step (Do transformers actually learn systematic compositional reasoning?). The flip side appears in syntax: when a probe reads relations off a structured geometry, a polar coordinate system capturing both type and direction, accuracy nearly doubles over distance-only (pure-similarity) probing (How do language models encode syntactic relations geometrically?). Structure you can combine and read off beats a soup of resemblances.

You can see the cost of resemblance-only thinking elsewhere too. Graph databases that traverse explicit relations deterministically outperform vector-similarity search on relational, multi-hop queries, trading higher setup cost for precision and completeness (When do graph databases outperform vector embeddings for retrieval?). And a model can carry every linearly decodable feature a task needs while its internal organization is fractured: accuracy looks fine until perturbation or distribution shift exposes that the categories were never cleanly composed (Can models be smart without organized internal structure?).
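The graph-versus-vector point can be sketched in a few lines. This is a toy under stated assumptions, not the system described in the cited note: it uses a plain Python dictionary rather than a graph database or Cypher, and the graph, the vectors, and the helper names `traverse`, `cosine`, and `nearest` are all hypothetical. It contrasts following typed edges, which returns every answer to a two-hop question, with top-k cosine ranking, which returns whatever happens to lie closest to the query.

```python
import math


def traverse(graph, start, relation_path):
    """Follow a fixed sequence of edge types from start; return every node reached."""
    frontier = {start}
    for relation in relation_path:
        frontier = {
            target
            for node in frontier
            for edge_type, target in graph.get(node, [])
            if edge_type == relation
        }
    return frontier


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(vectors, query, k):
    """Rank stored items by cosine similarity to the query and keep the top k."""
    ranked = sorted(vectors, key=lambda name: cosine(vectors[name], query), reverse=True)
    return ranked[:k]


# Hypothetical toy data: who manages whom, and who works on which project.
graph = {
    "alice": [("manages", "bob"), ("manages", "carol")],
    "bob": [("works_on", "apollo")],
    "carol": [("works_on", "zeus")],
}
# Hypothetical project embeddings; "hermes" merely resembles "apollo".
vectors = {"apollo": [1.0, 0.0], "zeus": [0.0, 1.0], "hermes": [0.9, 0.1]}

projects_via_graph = traverse(graph, "alice", ["manages", "works_on"])
projects_via_similarity = nearest(vectors, [1.0, 0.0], 2)
```

In this toy, the traversal returns both projects run by Alice's reports, while the similarity search returns `apollo` plus the look-alike `hermes` and misses `zeus`. The traversal is complete by construction; the ranking is only as good as the resemblance it was given.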

What you didn't come for but might take away: the win isn't that combinatorial frameworks are more powerful. Family resemblance is, formally, more flexible. The point is that flexibility without structure has to be paid for in data, brittleness, and silent failure on the cases that never showed up in training. Even where structure does emerge, it's coarse-to-fine and orderly, with taxonomy splitting along clean branches rather than fuzzy clusters (Do embedding eigenvectors organize taxonomy from coarse to fine?), and that is the kind of organization that lets a combinatorial scheme generalize to combinations it has never seen.


Sources (7 notes)

Why does dot product beat MLP-based similarity in practice?

Rendle et al. show that properly tuned dot products substantially beat MLP-based similarity despite MLP universality. Learning a dot product with an MLP requires large models and datasets; dot products also enable efficient retrieval at production scale through maximum inner product search (MIPS) algorithms.

Can MLPs learn to match dot product similarity in practice?

Rendle et al. show that carefully tuned dot products substantially outperform learned MLP similarities in collaborative filtering. MLPs require excessive capacity and data to match simple geometric similarity, and items scored by an MLP cannot be retrieved efficiently at scale, which suggests that inductive bias matters more than expressiveness here.

Do transformers actually learn systematic compositional reasoning?

Research shows transformers succeed on in-distribution tasks by memorizing computation subgraphs from training data, not by learning systematic rules. They fail drastically on novel compositions, with errors compounding across reasoning steps.

How do language models encode syntactic relations geometrically?

The Polar Probe shows LLMs represent syntactic type and direction through both distance and angular position between embeddings, nearly doubling accuracy over distance-only methods. This demonstrates neural networks spontaneously learn structured, symbolic-compatible geometry.

When do graph databases outperform vector embeddings for retrieval?

Graph-oriented databases solve vector similarity's failure on aggregate queries by replacing probabilistic similarity search with deterministic graph traversal via Cypher. The tradeoff: higher construction cost but precision and completeness for enterprise use cases where query patterns are relational.

Can models be smart without organized internal structure?

Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.

Do embedding eigenvectors organize taxonomy from coarse to fine?

Leading eigenvectors of embedding Gram matrices separate broad taxonomic branches first, then progressively finer sub-branches—a coarse-to-fine spectral order that tracks the WordNet hypernym tree level by level, confirming predictions from co-occurrence statistics.
