Can minimal privacy boundaries generalize beyond phone-use contexts?
This note explores whether the simple two-category privacy boundary built for phone agents (the iMy "minimal privacy contract") could work as a general rule for AI privacy, or whether privacy leaks in other settings escape such a clean binary. The starting point is a genuinely appealing idea: iMy splits user data into LOW (use freely by default) and HIGH (use only with explicit approval), and the win is that this binary is simple enough for an agent to actually follow yet precise enough to audit deterministically (see: Can a two-category privacy boundary actually be auditable?). It works on phones because privacy compliance there turns out to be a *distinct capability*: a model can be good at finishing tasks and still fail at privacy, so you need a boundary you can check independently rather than assuming success implies safety (see: Do phone agents succeed at all three critical tasks equally?). The interesting question is whether the cleanliness survives once you leave the phone.
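To make "audit deterministically" concrete, here is a minimal Python sketch; it is not iMy's actual implementation. It assumes a hypothetical log format in which every use of a user field is recorded together with whether the user approved it. The names `Level`, `Access`, `audit`, and the example fields are all illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "low"    # usable by default
    HIGH = "high"  # usable only after explicit user approval


@dataclass(frozen=True)
class Access:
    """One logged use of a user data field (hypothetical log format)."""
    field: str
    approved: bool  # True if the user explicitly approved this use


def audit(log, levels):
    """Return the accesses that violate the two-category contract.

    A violation is any use of a HIGH field without approval. Fields that
    are missing from `levels` are treated as HIGH, which is a conservative
    assumption of this sketch rather than a documented iMy rule.
    """
    return [
        a for a in log
        if levels.get(a.field, Level.HIGH) is Level.HIGH and not a.approved
    ]


# Hypothetical tags and agent trace, for illustration only.
LEVELS = {"display_name": Level.LOW, "home_address": Level.HIGH}
trace = [
    Access("display_name", approved=False),  # LOW: fine without approval
    Access("home_address", approved=False),  # HIGH without approval: violation
    Access("home_address", approved=True),   # HIGH with approval: fine
]
violations = audit(trace, LEVELS)
```

Because the check is a pure function of the log and the tags, two auditors given the same trace reach the same verdict, independently of whether the task itself succeeded.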
The corpus suggests the boundary travels well when leakage is *behavioral* (about what an agent chooses to access or surface) and poorly when leakage is *inferential or emergent*, because then there's no discrete moment of "use" to gate. The hardest counterexample is reasoning: roughly three-quarters (74.8%) of privacy leaks in model reasoning traces happen because the model materializes sensitive data as part of thinking. Longer chains leak more, and anonymizing the traces post hoc degrades utility, which suggests the private data is functioning as cognitive scaffolding, not as a record being retrieved (see: Do reasoning traces actually expose private user data?). A LOW/HIGH tag can't catch that, because the model never "decides" to use the HIGH item; it reconstructs it mid-thought. Inference breaks the binary from the other direction too: web-browsing models predict gender, age, and political orientation from a username and a sparse profile alone (see: Can LLMs predict demographics from social media usernames alone?). None of those raw signals would be tagged HIGH, yet the *derived* fact is exactly what a privacy contract would want to protect.
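The gap between gating access and protecting derived facts can be illustrated with a toy Python sketch. It assumes a hypothetical access-level check (`high_fields_read`) and a deliberately crude stand-in for inference (`toy_infer`, which guesses age from a year embedded in a username). Neither reflects how iMy or the models in the cited studies actually work.

```python
import re

# Hypothetical set of attributes a contract would tag HIGH.
HIGH_ATTRIBUTES = {"age", "gender", "political_orientation"}


def high_fields_read(fields_read):
    """Access-level check: which HIGH-tagged fields did the agent read?"""
    return sorted(set(fields_read) & HIGH_ATTRIBUTES)


def toy_infer(username, current_year=2025):
    """Toy stand-in for inference: guess age from a 4-digit year in a username.

    The fixed `current_year` default is an assumption that keeps the
    example reproducible.
    """
    match = re.search(r"(19|20)\d{2}", username)
    if match is None:
        return {}
    return {"age": current_year - int(match.group(0))}


fields_read = ["username"]          # only a LOW-tagged field is touched
derived = toy_infer("maria_1987")   # a HIGH attribute appears anyway

access_violations = high_fields_read(fields_read)        # nothing to flag
derived_high = sorted(set(derived) & HIGH_ATTRIBUTES)    # "age" was derived
```

The access audit comes back empty while a HIGH attribute has still been produced, so a check keyed only on which fields were read has no decision point at which to intervene.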
There's also a relational dimension the phone framing doesn't see. Personalization research suggests privacy risk isn't a fixed property of a data item but something that escalates over time: each interaction raises the baseline of trust, anthropomorphism, *and* exposure together, so a category that was safely LOW early can become sensitive as the relationship deepens (see: Does chatbot personalization build trust or expose privacy risks?). And the asymmetry that makes private information hard in the first place is precisely what models skip: LLMs look socially competent only when one model secretly controls everyone, and fail systematically once agents genuinely hold information others don't have (see: Why do LLMs fail when simulating agents with private information?). A static two-bucket contract assumes the boundary is known up front, but real privacy lives in who-knows-what-when.
The deeper tension is that the same judgment-free quality that makes these systems leaky also makes them attractive disclosure partners. People inclined to deceive self-select toward machines because there's no one to lie to (see: Do dishonest people prefer talking to machines?), and people disclose intimate secrets more freely precisely because the absence of social judgment removes the usual brakes (see: Do chatbots help people disclose more intimate secrets?). So users feed these systems *more* HIGH-category data than they would a human, which means a minimal contract isn't just an audit convenience: it's load-bearing exactly where the inflow is largest and least guarded.
The honest read: iMy generalizes as a *governance interface* — a binary that makes "did the agent comply?" answerable is valuable anywhere agents act on user data. What doesn't generalize is the assumption that all privacy reduces to gating discrete acts of access. Phone agents leak by *doing*; reasoning models, inference engines, and long-term companions leak by *thinking, deriving, and accumulating*. A minimal boundary is a strong floor and a poor ceiling — useful everywhere as the auditable layer, insufficient anywhere the leak never passes through a checkable decision point.
Sources (8 notes)
Can a two-category privacy boundary actually be auditable? The iMy contract splits data into LOW (default-use) and HIGH (explicit-approval-required) categories, producing concrete, observable compliance checks. This binary is simple enough for agents to follow reliably while remaining precise enough for deterministic evaluation.
Do phone agents succeed at all three critical tasks equally? MyPhoneBench demonstrates that task success, privacy-compliant completion, and saved-preference reuse are statistically distinct capabilities with no model dominating all three. Success-only rankings do not predict privacy or preference performance.
Do reasoning traces actually expose private user data? 74.8% of privacy leaks in language model reasoning traces result from models materializing sensitive user data during thought processes. Longer reasoning chains amplify leakage, and anonymizing traces post-hoc degrades model utility, suggesting private data functions as cognitive scaffolding.
Can LLMs predict demographics from social media usernames alone? Evaluated on 1,384 survey participants and 48 synthetic accounts, web-browsing LLMs successfully predicted gender, age, and political orientation from X usernames and profiles alone. The models showed systematic gender and political biases specifically against low-activity accounts, relying on stereotype-driven defaults when content was sparse.
Does chatbot personalization build trust or expose privacy risks? Longitudinal research shows personalization enhances trust and anthropomorphism but also amplifies privacy concerns and escalating user expectations. One-shot studies miss these temporal dynamics: each interaction raises the baseline, making failures more disappointing.
Why do LLMs fail when simulating agents with private information? Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.
Do dishonest people prefer talking to machines? Experimental evidence shows people likely to cheat significantly prefer reporting to online forms rather than humans, because machines function as judgment-free zones where deception carries less psychological burden.
Do chatbots help people disclose more intimate secrets? The absence of social judgment in chatbot interactions removes barriers to self-disclosure that normally constrain conversation with humans. The therapeutic benefit derives from the user's own cognitive processing during disclosure, not from the chatbot's understanding.