Related papers: RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation

RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation

URL: http://arxiv.org/abs/2507.00435v1
Date: Tue, 01 Jul 2025 05:33:16 GMT
Title: RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
Authors: Yi Ru Wang, Carter Ung, Grant Tannert, Jiafei Duan, Josephine Li, Amy Le, Rishabh Oswal, Markus Grotz, Wilbert Pumacay, Yuquan Deng, Ranjay Krishna, Dieter Fox, Siddhartha Srinivasa,
Abstract summary: We present RoboEval, a simulation benchmark and structured evaluation framework designed to reveal the limitations of current bimanual manipulation policies.<n>RoboEval introduces a suite of tiered, semantically grounded tasks that systematically challenge spatial, physical, and coordination capabilities.<n>We find that behavioral metrics correlate with success in over half of task-metric pairs, and remain informative even when binary success saturates.
Score: 32.080769025457926
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present RoboEval, a simulation benchmark and structured evaluation framework designed to reveal the limitations of current bimanual manipulation policies. While prior benchmarks report only binary task success, we show that such metrics often conceal critical weaknesses in policy behavior -- such as poor coordination, slipping during grasping, or asymmetric arm usage. RoboEval introduces a suite of tiered, semantically grounded tasks decomposed into skill-specific stages, with variations that systematically challenge spatial, physical, and coordination capabilities. Tasks are paired with fine-grained diagnostic metrics and 3000+ human demonstrations to support imitation learning. Our experiments reveal that policies with similar success rates diverge in how tasks are executed -- some struggle with alignment, others with temporally consistent bimanual control. We find that behavioral metrics correlate with success in over half of task-metric pairs, and remain informative even when binary success saturates. By pinpointing when and how policies fail, RoboEval enables a deeper, more actionable understanding of robotic manipulation -- and highlights the need for evaluation tools that go beyond success alone.

Related papers

Learning to Evaluate Autonomous Behaviour in Human-Robot Interaction [26.765285050287247]
We propose a general evaluation framework that measures the quality of Imitation Learning (IL) methods by focusing on trajectory performance.<n>We devise the Neural Meta Evaluator (NeME), a deep learning model trained to classify actions from robot joint trajectories.<n>We validate our framework on ergoCub, a humanoid robot, using teleoperation data and comparing IL methods tailored to the available platform.
arXiv Detail & Related papers (2025-07-08T21:12:57Z)
Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks [48.54757719504994]
This paper focuses on improving task success rates while reducing the amount of training data needed. Our approach introduces a novel method that segments long-horizon demonstrations into discrete steps defined by waypoints and subgoals. We validate our approach through both simulation and real-world experiments, demonstrating effective transfer from simulation to physical robotic platforms.
arXiv Detail & Related papers (2024-10-01T19:49:56Z)
Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful machine learning algorithm for a robot to acquire manipulation skills. We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval. GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation [48.08416841005715]
We introduce a novel keypose-conditioned consistency policy tailored for bimanual manipulation. It is a hierarchical imitation learning framework that consists of a high-level keypose predictor and a low-level trajectory generator. Simulated and real-world experimental results demonstrate that the proposed approach surpasses baseline methods in terms of success rate and operational efficiency.
arXiv Detail & Related papers (2024-06-14T14:49:12Z)
Large Language Models for Orchestrating Bimanual Robots [19.60907949776435]
We present LAnguage-model-based Bimanual ORchestration (LABOR) to analyze task configurations and devise coordination control policies. We evaluate our method through simulated experiments involving two classes of long-horizon tasks using the NICOL humanoid robot.
arXiv Detail & Related papers (2024-04-02T15:08:35Z)
Stabilize to Act: Learning to Coordinate for Bimanual Manipulation [24.453468143697723]
We propose a novel role assignment framework for bimanual robotic systems. A stabilizing arm holds an object in place to simplify the environment while an acting arm executes the task. We instantiate this framework with BimanUal Dexterity from Stabilization (BUDS)
arXiv Detail & Related papers (2023-09-03T05:56:21Z)
Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration. We show that DCIL-II can solve with unprecedented sample efficiency some challenging simulated tasks such as humanoid locomotion and stand-up.
arXiv Detail & Related papers (2022-11-09T10:28:40Z)
Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery. We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations. Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
arXiv Detail & Related papers (2021-09-28T16:18:54Z)
Scalable Multi-Task Imitation Learning with Autonomous Improvement [159.9406205002599]
We build an imitation learning system that can continuously improve through autonomous data collection. We leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted. In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement.
arXiv Detail & Related papers (2020-02-25T18:56:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.