Can representation sparsity order few-shot demonstrations effectively?
Does measuring how sparse a model's hidden states are for each example provide a reliable signal for ordering few-shot demonstrations in prompts? This matters because curriculum ordering significantly affects in-context learning performance.
Once representational sparsity tracks task difficulty for a given model, sparsity itself becomes a usable signal for curriculum design. The paper "Farther the Shift, Sparser the Representation" operationalizes this with Sparsity-Guided Curriculum In-Context Learning (SG-ICL), which uses the sparsity of last-layer activations to schedule few-shot demonstrations in the prompt.
The mechanism: measure how sparse the model's last hidden states are when processing each candidate few-shot example. Then order the demonstrations so they run from sparse (high difficulty for this model) to dense (low difficulty), or the reverse, depending on what the curriculum is meant to achieve. The reported result is a considerable performance gain over random or naive ordering.
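The ordering step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the note does not say which sparsity measure SG-ICL uses, so the Hoyer measure stands in here, and the names `hoyer_sparsity`, `order_demonstrations`, and `toy_states` are hypothetical. The vectors are toy values standing in for last-layer hidden states that a real pipeline would read from the model.

```python
import math
from typing import Dict, List, Sequence


def hoyer_sparsity(vec: Sequence[float]) -> float:
    """Hoyer sparsity: 0.0 for a uniform-magnitude vector, 1.0 for a one-hot vector."""
    n = len(vec)
    if n < 2:
        raise ValueError("need at least two dimensions")
    l1 = sum(abs(x) for x in vec)
    l2 = math.sqrt(sum(x * x for x in vec))
    if l2 == 0.0:
        return 0.0  # assumption: treat an all-zero vector as maximally dense
    root_n = math.sqrt(n)
    return (root_n - l1 / l2) / (root_n - 1.0)


def order_demonstrations(
    last_hidden: Dict[str, Sequence[float]], sparse_first: bool = True
) -> List[str]:
    """Return demonstration ids sorted by the sparsity of their last-layer vector."""
    scores = {demo_id: hoyer_sparsity(vec) for demo_id, vec in last_hidden.items()}
    # reverse=True puts the sparsest (assumed hardest) demonstration first.
    return sorted(scores, key=scores.get, reverse=sparse_first)


# Toy stand-ins for last-layer hidden states (hypothetical values, not model output).
toy_states = {
    "demo_a": [1.0, 1.0, 1.0, 1.0],  # dense
    "demo_b": [0.0, 0.0, 0.0, 1.0],  # sparse
    "demo_c": [0.0, 0.0, 1.0, 1.0],  # in between
}
ordering = order_demonstrations(toy_states)
print(ordering)
```

Passing `sparse_first=False` reverses the schedule to run from dense to sparse, which covers the "or the reverse" case. Which direction helps is an empirical question for the model and task at hand.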
This is a model-internal curriculum signal. Most curriculum learning approaches require external difficulty labels — annotator effort, heuristics about problem features, or proxy measures like solution length. Sparsity sidesteps this entirely. The model itself reveals which examples are hard for it through how its representations respond. The curriculum can be tailored to the specific model being used rather than to some external notion of universal difficulty.
The technique generalizes across the in-context learning landscape. Anywhere few-shot prompting is used (classification, reasoning, agentic deployments), sparsity-derived ordering is available. It costs almost nothing extra: hidden states are computed in any forward pass over a candidate example, so reading their sparsity is a byproduct of that pass. The only requirement is access to the activations, which any white-box deployment provides.
For builders of LLM pipelines, this argues for instrumentation that exposes activation-sparsity statistics. The signal supports curriculum ordering, hard-example mining, confidence calibration, and likely other applications not yet identified. Sparsity is becoming a richer interpretability primitive than the static-property framing has suggested.
The deeper template is that adaptive internal phenomena — sparsity here, attention concentration elsewhere, gradient magnitudes during training — can be operationalized as signals for system behavior once they are recognized as informative rather than incidental.
Inquiring lines that use this note as a source 24
- Can few-shot examples narrow generative diversity in creative tasks?
- Can curated demonstrations compensate for smaller or simpler training environments?
- Can demo placement be tuned as a task-specific hyperparameter?
- How do ordering effects compound across different prompt component scales?
- Why does curriculum learning with tight budgets beat fixed-budget approaches?
- Why does entropy-based frame sampling work better than uniform stride selection?
- Can backward transfer measurements reliably predict optimal multi-task training order?
- How would weight sparsity change what representation analysis methods can detect?
- Why does training order matter across different domain types?
- Can activation sparsity patterns guide the selection of in-context learning demonstrations?
- Can memory primitives become first-class design objects like computation sparsity?
- How does consolidation schedule order affect final memory quality?
- Why does the order of training examples matter for what models learn?
- How does sparsity tolerance vary across different task types?
- Does sparsity enforce compositional structure or merely amplify existing modularity?
- Can sparsity patterns reliably indicate how well a model knows its input?
- How does representation sparsity change when inputs fall outside the training distribution?
- Why does curriculum order matter when information theory says data order is irrelevant?
- How should benchmark design account for task-dependent sparsity tolerance differences?
- Does sparsity-guided ordering work equally well for reasoning and classification tasks?
- Can spectral eigenvector ordering serve as a model-agnostic interpretability probe?
- Do few-shot examples improve in-context learning or add noise?
- Why does exemplar performance vary across order complexity diversity and style?
- Can training order and structure shape what networks retain and learn?
Related concepts in this collection 3
- Do language models sparsify their activations under difficult tasks?
  When LLMs encounter unfamiliar or difficult inputs, do their internal representations become sparser rather than denser? Understanding this adaptive response could reveal how models stabilize reasoning under uncertainty.
  same paper, the underlying phenomenon this method exploits
- Is representational sparsity learned or intrinsic to neural networks?
  Explores whether sparsity in neural network activations is engineered through training or emerges as a default response to unfamiliar inputs. Understanding this distinction could reshape how we design and interpret model behavior.
  same paper, the developmental story
- Why do trajectories matter more than individual examples for in-context learning?
  Can language models learn new sequential decision-making tasks from context alone, and if so, what data properties make this possible? This explores why isolated state-action pairs fail where full trajectories succeed.
  adjacent: another structural requirement for effective ICL
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs
- Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning
- Context Tuning for Retrieval Augmented Generation
- In-Context Principle Learning from Mistakes
- Schema-learning and rebinding as mechanisms of in-context learning and emergence
- Mechanistically Interpreting the Role of Sample Difficulty in RLVR for LLMs
- Generalization to New Sequential Decision Making Tasks with In-Context Learning
- Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Original note title
sparsity-guided curriculum in-context learning uses representation sparsity as a scheduling signal for few-shot demonstrations