The Verifier Tax: Horizon-Dependent Safety–Success Tradeoffs in Tool-Using LLM Agents [R](reddit.com)
We recently presented a paper at ACM CAIS 2026 on safety evaluation for tool-using LLM agents. The core issue is that task completion alone can be misleading: an agent may complete a task while violating a safety or policy constraint. We separate outcomes into safe success, unsafe success, and failure, and study how verification changes the balance among them. We evaluate this on τ-bench (Tau-bench) tool-use scenarios and propose a two-tier verification architecture: deterministic policy/tool checks first, followed by an LLM-based verifier for more contextual safety cases. The main finding is that verification can reduce unsafe success, but it can also reduce task completion as the task horizon increases. We call this the Verifier Tax: a horizon-dependent safety–success tradeoff in tool-using agents.

Paper: https://dl.acm.org/doi/full/10.1145/3786335.3813160

Curious how others think agent evaluations should report unsafe success. Should unsafe completion be counted as success, failure, or a separate category?

submitted by /u/AccomplishedLeg1508
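To make the three-way outcome split and the two-tier check order concrete, here is a minimal Python sketch. It is not the paper's implementation: the `ToolCall` type, the refund rule in `deterministic_check`, and the `contextual_verifier` callback (a stand-in for an LLM-based verifier) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Iterable, Optional


class Outcome(Enum):
    SAFE_SUCCESS = "safe_success"
    UNSAFE_SUCCESS = "unsafe_success"
    FAILURE = "failure"


@dataclass(frozen=True)
class ToolCall:
    # Hypothetical representation of one agent tool invocation.
    name: str
    args: dict


def deterministic_check(call: ToolCall) -> Optional[str]:
    """Tier 1 (assumed rule): return a reason string if the call is blocked."""
    if call.name == "issue_refund" and call.args.get("amount", 0) > 100:
        return "refund exceeds policy limit"
    return None


def two_tier_verify(
    call: ToolCall, contextual_verifier: Callable[[ToolCall], bool]
) -> bool:
    """Run the cheap deterministic tier first; consult tier 2 only if it passes."""
    if deterministic_check(call) is not None:
        return False
    # Tier 2: stand-in for an LLM-based verifier handling contextual cases.
    return contextual_verifier(call)


def classify(task_completed: bool, violated_policy: bool) -> Outcome:
    """Map one episode to one of the three outcome categories."""
    if not task_completed:
        return Outcome.FAILURE
    return Outcome.UNSAFE_SUCCESS if violated_policy else Outcome.SAFE_SUCCESS


def report(outcomes: Iterable[Outcome]) -> Dict[str, float]:
    """Report each category as a fraction of all episodes."""
    items = list(outcomes)
    total = len(items)
    return {
        o.value: (sum(1 for x in items if x is o) / total if total else 0.0)
        for o in Outcome
    }
```

Reporting the three rates side by side, rather than folding unsafe success into either success or failure, keeps both task completion and safety violations visible in the same table.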
GIER: A Danish computer from 1961 with a role in modern astronomy (doi.org)
GIER, a Danish computer from 1961, played a vital role in the development of a new method for astrometric measurement. This method, photon-counting astrometry, ultimately led to two satellites with a significant role in the modern revolution of astronomy. A GIER was installed at the Hamburg Observatory in 1964, where it was used to implement the entirely new method for measuring stellar positions with a meridian circle, then the fundamental instrument of astrometry. An expedition to Perth in Western Australia with the instrument and the computer was a success. The method was also implemented in space in the first-ever astrometric satellite, Hipparcos, launched by ESA in 1989. The Hipparcos results, published in 1997, revolutionized astrometry, with an impact on all branches of astronomy, from the solar system and stellar structure to cosmic distances and the dynamics of the Milky Way. In turn, the results paved the way for a successor, the one million times more powerful Gaia astrometry satellite, launched by ESA in 2013. Preparations for a Gaia successor in twenty years are making progress.