Agents collapse "observed", "concluded", and "generated" into one confidence level. Is anyone modeling epistemic status directly instead of just improving retrieval? [D]

Been chewing on this for a while and I honestly can't tell if it's genuinely open or if I'm just not googling the right terms, so I'll throw it out here.

There's a paper by Roynard that put words to something I'd been vaguely bothered by. He calls it a category error: pretty much every memory system applies the same persistence model to everything it stores, usually some kind of decay, no matter what the thing actually is. That's fine for a throwaway message, but it gets weird when the same treatment is applied to something you've actually validated, which you'd want to stick around and get corrected if it turns out wrong. So you end up storing very different kinds of information under the same rules, even though how much you should trust them and how long they should live are completely different, and I don't think many people treat that difference as a real property of the memory.

That lines up with the thing that keeps biting me once an agent has been running a while: the system doesn't really know how it came to know something in the first place. Something it actually observed, something it inferred a few steps back, and something it just made up all get written down more or less the same way and come back out later at the same confidence. Because none of it is marked as one kind or another, the agent has no way to notice when it's wrong, so something stale or hallucinated can resurface later looking basically the same as something solid. Timestamps help a little, but they don't tell you what actually backed a belief, and they don't do much when whatever supported it turns out to be wrong and you'd want the agent to walk that back.

What bugs me is that this feels like it should already map onto stuff people worked out decades ago.
Belief revision (AGM and everything after it) is basically about updating a set of beliefs without it going inconsistent, but I've never really seen it bolted onto an agent's memory at any serious scale. Truth maintenance systems (JTMS/ATMS) are almost exactly this, since the whole point is tracking justifications and retracting conclusions when their premises disappear, and yet they seem basically absent from the agent stacks I've looked at. Calibration work is more about confidence on outputs than about the support behind things you've already stored as they age. And the smarter retrieval stuff, like HippoRAG or the temporal knowledge graph systems (Zep, Graphiti), makes retrieval better but still treats memory as things to find rather than claims that have a support status attached.

The closest thing I've found to what I'm describing is Hindsight, which actually splits memory into separate networks and has real conflict-resolution policies. But even there, contradictions get surfaced when you query rather than reconciled up front, and corrections don't really propagate to the beliefs that depended on them.

A recent paper, "The Missing Knowledge Layer in AI", gets at the human side of it pretty well. Their framing is that language collapses uncertainty: when a model states a guess, an inference, and an actual recollection all at the same surface confidence, the person on the other end can't tell which one they're reacting to. That felt like the same problem I'm describing, just viewed from the user's side instead of from inside the memory.

What I actually want, at the end of all this, is a small set of primitives: some reasonably clean way to assert something, say what supports it, note that two things contradict, supersede one with another, and retract it when it falls apart, with the epistemic status being an actual property of the thing instead of something I bolt on after the fact.
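To make those primitives concrete, here's a toy Python sketch of what they could look like. Everything in it is hypothetical (`BeliefStore`, `Claim`, `Status`, and the method names are made up for illustration and not taken from any of the systems mentioned above). It also makes one big simplifying assumption: each claim has a single justification, so it's retracted as soon as any one of its supports is retracted. A real JTMS lets a belief survive as long as at least one justification still holds.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OBSERVED = "observed"    # seen directly (tool output, file contents)
    INFERRED = "inferred"    # derived from other stored claims
    ASSERTED = "asserted"    # generated with nothing backing it
    RETRACTED = "retracted"  # no longer believed


@dataclass
class Claim:
    text: str
    status: Status
    supports: frozenset = frozenset()    # ids of claims this one depends on
    superseded_by: Optional[int] = None  # id of the replacement, if any


class BeliefStore:
    def __init__(self):
        self.claims = {}        # claim id -> Claim
        self.conflicts = set()  # unordered pairs of contradicting claim ids
        self._next_id = 0

    def assert_claim(self, text, status, supports=()):
        """Store a claim together with its epistemic status and its supports."""
        for s in supports:
            if s not in self.claims:
                raise KeyError(f"unknown supporting claim id: {s}")
        cid = self._next_id
        self._next_id += 1
        self.claims[cid] = Claim(text, status, frozenset(supports))
        return cid

    def contradict(self, a, b):
        """Record that two claims conflict, without deciding which one wins."""
        self.conflicts.add(frozenset((a, b)))

    def retract(self, cid):
        """Retract a claim and cascade to every claim that depended on it."""
        retracted, stack = [], [cid]
        while stack:
            cur = stack.pop()
            claim = self.claims[cur]
            if claim.status is Status.RETRACTED:
                continue
            claim.status = Status.RETRACTED
            retracted.append(cur)
            # Linear scan for dependents: fine for a toy, too slow at scale.
            stack.extend(i for i, c in self.claims.items() if cur in c.supports)
        return retracted

    def supersede(self, old, text, status, supports=()):
        """Replace an old claim with a new one, then retract the old one."""
        new = self.assert_claim(text, status, supports)
        self.claims[old].superseded_by = new
        self.retract(old)
        return new

    def live(self):
        """Claims that are still believed."""
        return {i: c for i, c in self.claims.items()
                if c.status is not Status.RETRACTED}


store = BeliefStore()
obs = store.assert_claim("config.yaml sets timeout=30", Status.OBSERVED)
inf = store.assert_claim("requests time out after 30s", Status.INFERRED,
                         supports=[obs])
guess = store.assert_claim("the upstream API is rate-limited", Status.ASSERTED)
store.contradict(inf, guess)

# The observation turns out to be stale, so the inference built on it goes too.
new_obs = store.supersede(obs, "config.yaml sets timeout=60", Status.OBSERVED)
```

The point of the sketch is just that the status and the supports live on the claim itself, so walking back a bad observation also walks back whatever was inferred from it. It says nothing about the hard parts: scale, noisy LLM-generated claims, or deciding when two claims actually contradict.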
Belief revision and TMS feel like the right ancestors, but neither was really built for noisy, high-volume, LLM-generated memory, so a lot of my time just goes into trying to figure out what the right primitives even are.

So for anyone who's actually shipped long-horizon agents, I'm curious about three things. Does anyone type their memory by epistemic status (observed, inferred, asserted, retracted) instead of just storing flat text plus embeddings? Has anyone gotten classical belief revision or a TMS to hold up as the actual live memory layer rather than falling over on scale and noise? And is handling contradictions at write time even worth the latency, or do most people just eat the inconsistency and sort it out at read time?

Honestly, if this is already a solved thing and I'm just reinventing some old formalism with extra steps, I'd genuinely appreciate someone pointing me at it so I can go read. But if it's actually underexplored then I'm a little surprised, because it really looks like one of the things that quietly wrecks agents once they've been running longer than a week.

submitted by /u/Bright-Fun-1638
