What role does prediction error play in human event segmentation?
This explores Event Segmentation Theory's core claim — that the mind cuts continuous experience into discrete events at moments where its ongoing prediction breaks down — and asks what the corpus has on that prediction-then-error mechanism.
The underlying idea is that the brain keeps a running forecast of what comes next, and when that forecast suddenly fails, you perceive an event boundary. The corpus doesn't contain a paper that tests this human-cognition claim head-on, so the honest answer is that direct evidence is thin here, but the collection approaches the idea from a striking angle worth knowing about. The most direct doorway is the finding that GPT-3's event boundaries correlate more strongly with the *average* of human annotators than individual human annotators do (see the note "Do language models segment events like human consensus does?"). The intriguing part is *why*: one of the two explanations the note offers is that next-token prediction may itself parallel human event cognition (the other is that the model has pre-computed a statistical consensus from diverse training text). If a model trained purely to predict the next word ends up carving narratives at the same seams humans do, that is indirect support for the idea that prediction, and the surprise when prediction fails, is what makes a boundary feel like a boundary.
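The forecast-then-surprise mechanism described above can be made concrete with a toy sketch. The Python below is an illustration under loud assumptions, not a model taken from the corpus or from the Event Segmentation Theory literature: the function name `segment_by_prediction_error`, the exponential-moving-average forecaster, and the `alpha` and `threshold` values are all hypothetical choices made for the example.

```python
from typing import List


def segment_by_prediction_error(
    signal: List[float], alpha: float = 0.5, threshold: float = 2.0
) -> List[int]:
    """Return the indices where a running forecast fails badly enough
    to be treated as an event boundary (toy illustration only)."""
    if not signal:
        return []
    boundaries: List[int] = []
    forecast = signal[0]  # initial "event model": expect more of the same
    for t in range(1, len(signal)):
        error = abs(signal[t] - forecast)  # prediction error at step t
        if error > threshold:
            boundaries.append(t)           # surprise spike: mark a boundary
            forecast = signal[t]           # assumption: reset the forecast here
        else:
            # Otherwise nudge the forecast toward the new observation.
            forecast = alpha * signal[t] + (1 - alpha) * forecast
    return boundaries


# Hypothetical one-dimensional "experience stream" with two abrupt shifts.
stream = [1.0, 1.1, 0.9, 1.0, 5.0, 5.2, 4.9, 5.1, 0.0, 0.1]
print(segment_by_prediction_error(stream))  # prints [4, 8]
```

On this made-up stream the forecast only fails at the two abrupt shifts, so those are the only boundaries reported. Nothing here says that human boundaries are computed this way; it only shows what "a boundary is where prediction breaks down" means operationally.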
The machine-learning side of the corpus makes the prediction-to-structure link more concrete. UI-JEPA learns task-aware *temporal* representations from unlabeled screen recordings by predictively masking parts of the video and forcing the model to fill them in (see "Can unlabeled UI video teach models what users intend?"). That is, in effect, a prediction-error engine applied to continuous activity streams: on this reading, the temporal structure the model recovers, such as where one sub-task ends and another begins, emerges from how hard the masked moments are to predict. It is the engineering echo of the cognitive claim, even though no human brains are involved.
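A rough way to see why masked prediction could expose temporal structure is sketched below. This is not UI-JEPA: it swaps the learned predictor for plain neighbor averaging on a one-dimensional list of numbers, and the function name `masked_prediction_errors` and the sample data are hypothetical. It only illustrates that the error from filling in a hidden moment tends to peak where the stream changes.

```python
from typing import List


def masked_prediction_errors(frames: List[float]) -> List[float]:
    """Hide each interior frame in turn, guess it from its two neighbors,
    and return the absolute reconstruction error for each hidden frame."""
    errors: List[float] = []
    for t in range(1, len(frames) - 1):
        guess = (frames[t - 1] + frames[t + 1]) / 2  # stand-in for a learned predictor
        errors.append(abs(frames[t] - guess))
    return errors


# Hypothetical stream: one steady activity followed by another.
frames = [0.0, 0.0, 0.0, 4.0, 4.0, 4.0]
print(masked_prediction_errors(frames))  # prints [0.0, 2.0, 2.0, 0.0]
```

The errors are zero inside each steady stretch and nonzero only around the switch, which is the sense in which hard-to-predict moments can mark a seam between activities.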
There's a sharper, almost contrarian thread too. One note argues that AI output is "event-residue": text carrying the surface markers of communication but lacking the underlying event structure that produces a genuine utterance, with humans supplying the missing orientation through interpretive labor (see "Does AI generate genuine utterances or just text patterns?"). Set against the segmentation finding, this raises a real tension the corpus leaves open: a model can reproduce *where humans draw event boundaries* without itself possessing the event structure that, in humans, gives those boundaries their meaning. Prediction error may be sufficient to *locate* a seam but not to *constitute* an event.
So here is the unexpected payoff: the corpus suggests prediction error might be a shared substrate. The same statistical mechanism that lets a language model reproduce human causal-reasoning errors (see "Do large language models make the same causal reasoning mistakes as humans?") may be what lets it match human event boundaries. At the same time, the corpus hosts a counterargument that matching the boundaries isn't the same as having the events. If you want the cleanest entry point, start with the GPT-3 segmentation note and read it against the event-residue note; the disagreement between them is more illuminating than either alone.
Sources (4 notes)
GPT-3's event boundaries correlate more strongly with averaged human annotations than individual human annotators do. This suggests language models may pre-compute statistical consensus through training on diverse text, or that next-token prediction parallels human event cognition.
UI-JEPA applies JEPA-style predictive masking to screen recordings, learning task-aware temporal representations that an LLM decoder can use to infer intent with minimal paired data. This trades the bottleneck of labeled video for abundant unlabeled streams.
AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.
LLMs show weak explaining away and Markov violations in collider networks, matching human error patterns exactly. This suggests shared mechanisms rooted in training data statistics rather than categorical reasoning inferiority.