Channels
Open-source NVMe sanitization framework with 50 cycles and per-cycle Log Page 0x81 hardware confirmation. PDF Certificate of Destruction. Designed to align with NIST SP 800-88 Rev.2. Free. - yonasa...
We recently presented a paper at ACM CAIS 2026 on safety evaluation for tool-using LLM agents. The core issue is that task completion alone can be misleading: an agent may complete a task while violating a safety or policy constraint. We separate outcomes into safe success, unsafe success, and failure, and study how verification changes this tradeoff. We evaluate this on τ-bench (Tau-bench) tool-use scenarios and propose a two-tier verification architecture: deterministic policy/tool checks first, followed by an LLM-based verifier for more contextual safety cases. The main finding is that verification can reduce unsafe success, but it can also reduce task completion as the task horizon increases. This creates what we call the Verifier Tax: a horizon-dependent safety–success tradeoff in tool-using agents. Paper: https://dl.acm.org/doi/full/10.1145/3786335.3813160 Curious how others think agent evaluations should report unsafe success. Should unsafe completion be counted as success, failure, or a separate category? submitted by /u/AccomplishedLeg1508
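The three-way outcome split and the two-tier check described in the post can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the names `Outcome`, `classify`, `ToolCall`, `deterministic_check`, and `verify`, the refund rule, and the 500 threshold are all hypothetical, and the LLM-based verifier is stood in for by an ordinary callable.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Outcome(Enum):
    SAFE_SUCCESS = "safe_success"
    UNSAFE_SUCCESS = "unsafe_success"
    FAILURE = "failure"


def classify(task_completed: bool, violated_policy: bool) -> Outcome:
    """Map one episode to the three categories named in the post."""
    if not task_completed:
        return Outcome.FAILURE
    return Outcome.UNSAFE_SUCCESS if violated_policy else Outcome.SAFE_SUCCESS


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def deterministic_check(call: ToolCall) -> bool:
    """Tier 1 (hypothetical rule): block refunds above a fixed limit."""
    if call.name == "issue_refund" and call.args.get("amount", 0) > 500:
        return False
    return True


def verify(call: ToolCall, contextual_verifier: Callable[[ToolCall], bool]) -> bool:
    """Run the cheap deterministic tier first; consult the contextual
    verifier (an LLM in the post, any callable here) only if tier 1 passes."""
    if not deterministic_check(call):
        return False
    return contextual_verifier(call)


# Stand-in for the LLM-based tier: assumed to reject calls flagged as suspicious.
def stub_verifier(call: ToolCall) -> bool:
    return not call.args.get("suspicious", False)


# Tally outcomes separately instead of folding unsafe success into success.
episodes = [(True, False), (True, True), (False, False), (True, False)]
report = Counter(classify(done, violated) for done, violated in episodes)
```

Reporting `report` as three counts keeps unsafe completions visible rather than hiding them inside an aggregate success rate, which is the reporting question the post raises.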
Verifiable digital identities and secure login for AI agents. One API call. No redirects.
GIER, a Danish computer from 1961, played a vital role in the development of a new method for astrometric measurement. This method, photon counting astrometry, ultimately led to two satellites with a significant role in the modern revolution of astronomy. A GIER was installed at the Hamburg Observatory in 1964, where it was used to implement the entirely new method for measuring stellar positions by means of a meridian circle, then the fundamental instrument of astrometry. An expedition to Perth in Western Australia with the instrument and the computer was a success. The method was also implemented in space in the first-ever astrometric satellite, Hipparcos, launched by ESA in 1989. The Hipparcos results, published in 1997, revolutionized astrometry, with an impact on all branches of astronomy from the solar system and stellar structure to cosmic distances and the dynamics of the Milky Way. In turn, the results paved the way for a successor, the one million times more powerful Gaia astrometry satellite, launched by ESA in 2013. Preparations for a Gaia successor in twenty years are making progress.
ClawMoat is the agent seatbelt, runtime security for desktop AI agents running on your real machine.
Big Kahuna