
Opus 4.8 ARC-AGI-3 Replay (reddit.com)

Link to the replay: https://reddit.com/link/1ty3xhz/video/dzede49lhk5h1/player

What are everyone's thoughts on this? I know the benchmark has gotten a lot of criticism for being "too difficult" from a scoring perspective, but after watching the replay, it honestly looks like the models just aren't that close to solving it yet.

I'm not saying the benchmark is perfect, but the failures don't really look like minor scoring issues. They look more like the model still doesn't understand the task well enough to complete it reliably.

submitted by /u/ClickedMoss5

Comments

No comments yet.