InFeeo
Global
All
New
Language
Profile channel

@Jimbo

No bio yet.

Since 06.06.2026

ICRA'27(reddit.com)
Hello Everyone, I am preparing my manuscript for upcoming ICRA'27. But They have page limit of 8 pages, including references. As well, They don't accept any supplementary documents. So my question is how can I show more experiments and ablation studies? Because 8 page is not sufficient. Any tips? Much appreciated!! submitted by /u/selim_sarowar [link] [Kommentare]
I built a leakage-clean verifier for robot manipulation, is this useful? Am I solving a non-problem? [D](reddit.com)
Spent the last few weeks on a benchmark/harness that tries to answer one question honestly: did a robot arm actually do the demonstrated task, or did the success metric just get fooled? The setup: compile a human demo into an object-centric graph (what changed in the world: relations, contacts, event order), run a solver, then independently extract a graph from the rollout only and check if they match. The whole point is a hard information boundary so the "answer key" can never leak into the side that grades the rollout. A no-op baseline fails with named failure classes; a dumb scripted arm passes. That contrast is the thing I care about. Most manipulation success metrics are hand-coded predicates written by the same person training the policy. The policy author controls both the behavior and the definition of "success." That's a conflict of interest we'd never accept in ML benchmarking, yet it's standard in manipulation eval. But I keep going back and forth on whether this matters, and I'd like other people's read: The case that it's real: VLA/foundation-model training is starved for reliable dense reward at scale. Human raters don't scale, brittle predicates lie. An automatic, embodiment-agnostic grader that can say "this rollout reproduced the demonstrated transformation, here's why it failed" seems like an obviously-missing piece of the training loop. The case that it's a non-problem: maybe everyone's already fine with task-specific success checks because in practice you only care about the tasks you're shipping, and a general verifier is solving for a generality nobody needs. And the representation that makes verification tractable (discrete relational state — INSIDE/TOUCHING/event-order) is also what caps it: it handles pick/place/insert/open-drawer but has no obvious purchase on force-profile or deformable tasks, which is exactly where the frontier is. There's also the uncomfortable bit: the hard 80% is perception (video → graph under occlusion and contact noise), and that's where the leakage discipline gets harder, not easier, because your extractor is now a learned, error-prone thing. Two questions I don't have a settled answer on: Is reward/eval honesty a first-order bottleneck for the current generation of manipulation learning, or second-order polish? Is object-centric relational state a dead representation for where manipulation is actually going, or a reasonable floor you build up from? submitted by /u/Alexpplay [link] [Kommentare]