Can input-only training encode user preferences without task-specific labels?

This explores whether a model can pick up what a user wants by learning from raw inputs alone (observed behavior, unlabeled streams) rather than from explicitly labeled preference or task data, and what that approach gains and gives up.

The corpus has a clear poster child for yes: UI-JEPA ("Can unlabeled UI video teach models what users intend?") applies JEPA-style predictive masking to plain screen recordings. The model learns to predict masked chunks of UI activity, and that self-supervised objective produces representations rich enough for a decoder to read off user intent from only a handful of labeled examples. The supervision comes from the structure of the input itself (what the hidden stretch of screen activity contains), not from task annotations. The trade it names is the interesting part: you swap the bottleneck of scarce labeled video for abundant unlabeled streams.
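To make "supervision from the structure of the input" concrete, here is a minimal sketch of how a masked-prediction example gets its targets from the sequence itself. It is not UI-JEPA's implementation: the frame vectors, the function names, and the linear-interpolation predictor are all assumptions made for illustration. In JEPA-style training the encoder and predictor are learned networks and the loss is computed in representation space, so the interpolation here is only a stand-in.

```python
from typing import List, Tuple

Vector = List[float]


def mask_span(frames: List[Vector], start: int, length: int) -> Tuple[List[Vector], List[Vector]]:
    """Split a sequence into visible context and hidden targets.

    The targets are just the hidden frames, so no annotation is involved.
    """
    if start < 1 or start + length >= len(frames):
        raise ValueError("masked span needs a visible frame on each side")
    context = frames[:start] + frames[start + length:]
    targets = frames[start:start + length]
    return context, targets


def interpolate_predictor(left: Vector, right: Vector, length: int) -> List[Vector]:
    """Stand-in predictor (assumption): interpolate between the frames bordering the mask."""
    preds = []
    for i in range(1, length + 1):
        t = i / (length + 1)
        preds.append([(1 - t) * a + t * b for a, b in zip(left, right)])
    return preds


def mse(preds: List[Vector], targets: List[Vector]) -> float:
    """Mean squared error between predicted and true vectors for the hidden span."""
    total, count = 0.0, 0
    for p, y in zip(preds, targets):
        for a, b in zip(p, y):
            total += (a - b) ** 2
            count += 1
    return total / count


# Hypothetical per-frame feature vectors that drift linearly over time.
frames = [[float(t), 2.0 * t] for t in range(6)]
context, targets = mask_span(frames, start=2, length=2)
preds = interpolate_predictor(frames[1], frames[4], length=2)
loss = mse(preds, targets)
print(f"masked {len(targets)} frames, prediction loss = {loss:.6f}")
```

The only "label" in this loop is the hidden part of the recording, which is why unlabeled streams are enough to drive the objective.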

A second route reaches the same place by watching instead of predicting. M3-Agent ("Can agents learn preferences by watching rather than asking?") infers and acts on preferences from continuous multimodal observation: no one asks the user anything, and no preference dataset is collected. The preference signal is reconstructed from accumulated observation, organized into an entity-centric memory graph. So "input-only" splits into two flavors the corpus distinguishes: learning from the predictive structure of inputs (UI-JEPA) versus learning from the accumulated record of them (M3-Agent).
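As a rough illustration of what an entity-centric memory might look like, the sketch below keeps raw observations (episodic) and inferred preferences (semantic) in separate stores per entity. Everything here is hypothetical: the class names, the `(time, attribute, value)` event shape, and the count-based consolidation rule are assumptions for illustration, not M3-Agent's design, and edges between entities are omitted for brevity.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class EntityNode:
    name: str
    # Raw observations as (time, attribute, value) tuples.
    episodic: List[Tuple[int, str, str]] = field(default_factory=list)
    # Inferred preferences, attribute -> value.
    semantic: Dict[str, str] = field(default_factory=dict)


class EntityMemory:
    def __init__(self) -> None:
        self.entities: Dict[str, EntityNode] = {}

    def observe(self, entity: str, time: int, attribute: str, value: str) -> None:
        """Record a raw observation; nothing is asked of the user."""
        node = self.entities.setdefault(entity, EntityNode(entity))
        node.episodic.append((time, attribute, value))

    def consolidate(self, entity: str, min_count: int = 2) -> None:
        """Assumed rule: promote a value to a preference once seen min_count times."""
        node = self.entities[entity]
        counts: Dict[str, Counter] = defaultdict(Counter)
        for _, attribute, value in node.episodic:
            counts[attribute][value] += 1
        for attribute, values in counts.items():
            value, n = values.most_common(1)[0]
            if n >= min_count:
                node.semantic[attribute] = value

    def preference(self, entity: str, attribute: str) -> Optional[str]:
        """Look up an inferred preference, or None if nothing was consolidated."""
        node = self.entities.get(entity)
        return node.semantic.get(attribute) if node else None


memory = EntityMemory()
memory.observe("alice", 1, "drink", "tea")
memory.observe("alice", 2, "drink", "coffee")
memory.observe("alice", 3, "drink", "tea")
memory.observe("alice", 4, "seat", "window")
memory.consolidate("alice")
print(memory.preference("alice", "drink"))
```

A single observation (the seat) stays episodic, while the repeated one becomes a stored preference the agent can act on without asking.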

What's worth knowing is how sharply this contrasts with the label-hungry methods sitting right next to it. PReF ("Can user preferences be learned from just ten questions?") still needs explicit preference comparisons; it just makes them cheap, inferring a personalized reward from ten adaptive questions. PLUS ("Can text summaries beat embeddings for personalized reward models?") trains on preference data to produce text summaries. These work, but they assume someone provides preference labels somewhere in the loop. The input-only methods are betting they can skip that entirely, and the early evidence says you can get surprisingly far, though probably not all the way to fine-grained reward alignment without some labeled signal to anchor it.
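To show why a handful of adaptive questions can go a long way, here is a minimal sketch of active question selection over a reward modeled as a weighted sum of base reward scores. It is not PReF's algorithm: the particle set of candidate weight vectors, the "closest to a 50/50 split" selection rule, the hard filtering update, and all names and numbers are assumptions chosen to keep the example short.

```python
import random
from typing import List, Tuple

Vector = Tuple[float, ...]
Pair = Tuple[Vector, Vector]


def reward(weights: Vector, features: Vector) -> float:
    """Personalized reward as a weighted sum of base reward scores."""
    return sum(w * f for w, f in zip(weights, features))


def disagreement(particles: List[Vector], a: Vector, b: Vector) -> float:
    """Fraction of candidate weight vectors that prefer response a over b."""
    votes = sum(1 for w in particles if reward(w, a) > reward(w, b))
    return votes / len(particles)


def pick_question(particles: List[Vector], pairs: List[Pair]) -> Pair:
    """Ask about the pair the candidates split on most evenly."""
    return min(pairs, key=lambda p: abs(disagreement(particles, p[0], p[1]) - 0.5))


def update(particles: List[Vector], a: Vector, b: Vector, prefers_a: bool) -> List[Vector]:
    """Keep only candidates consistent with the answer (assumed noise-free)."""
    kept = [w for w in particles if (reward(w, a) > reward(w, b)) == prefers_a]
    return kept or particles  # fall back if the answer contradicts every candidate


# Hypothetical setup: two base rewards, 200 candidate weightings, 30 response pairs.
rng = random.Random(0)
particles = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
pairs = [((rng.random(), rng.random()), (rng.random(), rng.random())) for _ in range(30)]
true_weights = (0.8, -0.3)  # stands in for the real user's hidden taste

asked = []
for _ in range(10):
    a, b = pick_question(particles, pairs)
    answer = reward(true_weights, a) > reward(true_weights, b)
    particles = update(particles, a, b, answer)
    pairs.remove((a, b))
    asked.append((a, b))

print(len(particles), "candidate weightings remain after", len(asked), "questions")
```

Each question is still a preference label, which is the contrast with the input-only methods: the labeling cost shrinks, but it does not disappear.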

There's also a quieter finding about what *form* the encoded preference should take. PRIME ("Does abstract preference knowledge outperform specific interaction recall?") shows that abstracted preference knowledge (summaries, parametric encodings) consistently beats just retrieving raw past interactions. This matters for input-only training: simply hoarding inputs isn't enough. The win comes from compressing them into semantic abstractions, which is exactly what UI-JEPA's learned representations and M3-Agent's semantic graph are doing. Raw episodic input is the material; abstraction is the value.

The corpus raises one caution, almost as a warning label. The same property that makes input-only learning powerful (statistical regularities in inputs carry signal beyond their literal content) is the property behind subliminal trait transmission ("Can language models transmit hidden behavioral traits through unrelated data?"), where behavioral traits leak through data with no semantic relationship to the trait at all. If preferences can be encoded from input statistics without labels, so can things you didn't intend to encode. Input-only training doesn't get to choose which patterns it absorbs, which is the flip side of not needing labels to absorb them.

Sources (6 notes)

Can unlabeled UI video teach models what users intend?

UI-JEPA applies JEPA-style predictive masking to screen recordings, learning task-aware temporal representations that an LLM decoder can use to infer intent with minimal paired data. This trades the bottleneck of labeled video for abundant unlabeled streams.

Can agents learn preferences by watching rather than asking?

M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.

Can user preferences be learned from just ten questions?

PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Responses can then be personalized to a user via inference-time reward alignment, without modifying model weights.

Can text summaries beat embeddings for personalized reward models?

PLUS trains summarizers and reward models jointly, learning that text-based preference summaries capture dimensions zero-shot summaries miss. These summaries transfer to GPT-4 for zero-shot personalization and remain interpretable to users.

Does abstract preference knowledge outperform specific interaction recall?

The PRIME framework shows that semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning outperforms preference-tuning methods.

Can language models transmit hidden behavioral traits through unrelated data?

Research demonstrates that behavioral traits propagate between models via filtered data bearing no semantic relationship to the trait. The effect is model-specific, fails across different architectures, and persists despite rigorous filtering—indicating the mechanism embeds statistical signatures rather than semantic content.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst. The question: Can input-only training encode user preferences without task-specific labels? A curated library (spanning 2020–2026) found evidence YES, but with sharp constraints.

What a curated library found — and when (dated claims, not current truth):
• UI-JEPA (2024) learns user intent from masked UI video alone, no paired labels — representations rich enough for minimal-example decoding of intent.
• M3-Agent (2025) reconstructs preferences from continuous multimodal observation via entity-centric memory graphs, requiring zero explicit preference elicitation.
• Abstracted preference knowledge (summaries, parametric encodings) beats raw episodic retrieval by measurable margins (PRIME, 2025).
• Input-only methods avoid label bottleneck but likely cannot reach fine-grained reward alignment without *some* labeled signal to anchor learned representations (implicit across UI-JEPA, M3-Agent findings).
• Behavioral traits and unintended preferences leak through input statistics regardless of semantic content (subliminal transmission, 2025) — input-only training absorbs what it does not intend.

Anchor papers (verify; mind their dates):
• arXiv:2409.04081 — UI-JEPA (2024): predictive masking on UI activity.
• arXiv:2507.04607 — PRIME (2025): semantic memory vs. episodic retrieval.
• arXiv:2507.14805 — Subliminal Learning (2025): trait transmission via hidden signals.
• arXiv:2604.03238 — Measuring Human Preferences in RLHF (2026): social science framing.

Your task:
(1) RE-TEST EACH CONSTRAINT. For UI-JEPA, M3-Agent, and the abstraction finding: have newer models (scaling, multimodal fusion, in-context learning, or deployed observational systems) since relaxed the requirement for *any* labeled signal, or have they instead revealed where that requirement still binds? Distinguish the durable question (preference encoding from structure) from the perishable limitation (need for label anchoring). Cite what resolved or tightened each constraint.
(2) Surface the strongest contradicting or superseding work from the last ~6 months — especially anything claiming input-only training suffices *without* abstraction, or papers showing the leak/misalignment risk from subliminal transmission is worse than the corpus indicates.
(3) Propose 2 research questions that assume the regime may have moved: e.g., "Under what model scale does representation learning from input structure alone saturate preference fidelity?" or "Can synthetic preference labels seeded from input-only representations outcompete human-labeled preference datasets?"

Cite arXiv IDs; flag anything you cannot ground in a real paper.
