dARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale
- URL: http://arxiv.org/abs/2503.05646v1
- Date: Fri, 07 Mar 2025 18:07:54 GMT
- Title: dARt Vinci: Egocentric Data Collection for Surgical Robot Learning at Scale
- Authors: Yihao Liu, Yu-Chun Ku, Jiaming Zhang, Hao Ding, Peter Kazanzides, Mehran Armand
- Abstract summary: We introduce dARt Vinci, a scalable data collection platform for robot learning in surgical settings. The system uses Augmented Reality (AR) hand tracking and a high-fidelity physics engine to capture subtle maneuvers in primitive surgical tasks.
- Score: 9.732963094538837
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Data scarcity has long been an issue in the robot learning community. Particularly in safety-critical domains like surgical applications, obtaining high-quality data can be especially difficult. This poses challenges to researchers seeking to exploit recent advancements in reinforcement learning and imitation learning, which have greatly improved generalizability and enabled robots to conduct tasks autonomously. We introduce dARt Vinci, a scalable data collection platform for robot learning in surgical settings. The system uses Augmented Reality (AR) hand tracking and a high-fidelity physics engine to capture subtle maneuvers in primitive surgical tasks. By eliminating the need for a physical robot setup and providing flexibility in time, space, and hardware resources (such as multiview sensors and actuators), specialized simulation is a viable alternative. At the same time, AR makes robot data collection more egocentric, supported by its body-tracking and content-overlay capabilities. A user study on primitive tasks widely used for training teleoperation with da Vinci surgical robots confirms the proposed system's efficiency and usability. Data throughput improves by 41% on average across all tasks compared to real-robot settings, total experiment time is reduced by an average of 10%, and temporal demand in the task-load survey improves; these gains are statistically significant. Additionally, the collected data is over 400 times smaller in size, requiring far less storage while achieving double the recording frequency.
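The pipeline the abstract describes, egocentric AR hand tracking driving a simulated da Vinci-style tool with every transition logged, can be pictured with a minimal sketch. All interfaces below (`ar_tracker`, `sim`, and their methods) are hypothetical stand-ins, not the paper's actual API:

```python
# Minimal sketch (not the paper's API) of the recording loop described in
# the abstract: AR hand poses drive a simulated da Vinci-style tool, and
# each step is logged. `ar_tracker` and `sim` are hypothetical interfaces.
import time
from dataclasses import dataclass, field

@dataclass
class Demo:
    """One demonstration: per-step hand poses and simulated robot states."""
    hand_poses: list = field(default_factory=list)
    robot_states: list = field(default_factory=list)

def collect_demo(ar_tracker, sim, hz=60.0, max_steps=2000):
    """Record one episode by retargeting tracked hand poses to the sim arm."""
    demo, dt = Demo(), 1.0 / hz
    for _ in range(max_steps):
        pose = ar_tracker.get_hand_pose()   # 6-DoF egocentric hand pose
        sim.command_end_effector(pose)      # retarget pose to the tool tip
        sim.step(dt)                        # advance high-fidelity physics
        demo.hand_poses.append(pose)
        demo.robot_states.append(sim.get_state())
        if sim.task_done():                 # e.g., peg transferred, knot tied
            break
        time.sleep(dt)                      # hold the target capture rate
    return demo
```

One plausible reading of the reported 400x storage reduction is that simulation needs only low-dimensional poses and states to be logged rather than raw sensor streams, though the abstract does not spell out the mechanism.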
Related papers
- Data-Efficient Learning from Human Interventions for Mobile Robots [46.65860995185883]
Mobile robots are essential in applications such as autonomous delivery and hospitality services.
Applying learning-based methods to mobile robot tasks has gained popularity due to their robustness and generalizability.
Traditional methods such as Imitation Learning (IL) and Reinforcement Learning (RL) offer adaptability but require large datasets and carefully crafted reward functions, and face sim-to-real gaps.
We propose an online human-in-the-loop learning method, PVP4Real, that combines IL and RL to address these issues.
arXiv Detail & Related papers (2025-03-06T21:02:02Z)
- Towards General Purpose Robots at Scale: Lifelong Learning and Learning to Use Memory [0.0]
This thesis focuses on addressing two key challenges for robots operating over long time horizons: memory and lifelong learning.
First, we introduce t-DGR, a trajectory-based deep generative replay method that achieves state-of-the-art performance on Continual World benchmarks.
Second, we develop a framework that leverages human demonstrations to teach agents effective memory utilization.
arXiv Detail & Related papers (2024-12-28T21:13:48Z)
- Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets [24.77850617214567]
We propose a foundation representation learning framework that captures both visual features and dynamics information, such as the actions and proprioception of manipulation tasks.
Specifically, we pre-train a visual encoder on the DROID robotic dataset and leverage motion-relevant data such as robot proprioceptive states and actions.
We introduce a novel contrastive loss that aligns visual observations with the robot's proprioceptive state-action dynamics, combined with a behavior cloning (BC)-like actor loss that predicts actions during pre-training and a time contrastive loss (a sketch of this combination follows this entry).
arXiv Detail & Related papers (2024-10-29T17:58:13Z)
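A PyTorch sketch of the described loss combination: an InfoNCE-style term aligning visual embeddings with proprioceptive state-action embeddings, plus a BC-like actor term. Shapes and names are assumptions, and the time contrastive term is omitted for brevity; this is not the paper's implementation:

```python
# PyTorch sketch (shapes and names assumed) of the described combination:
# InfoNCE-style alignment of visual embeddings with proprioceptive
# state-action embeddings, plus a BC-like actor loss. The time contrastive
# term mentioned in the summary is omitted for brevity.
import torch
import torch.nn.functional as F

def alignment_bc_loss(img_emb, sa_emb, pred_action, expert_action, temp=0.1):
    """img_emb, sa_emb: (B, D) embeddings; *_action: (B, A) action vectors."""
    img_emb = F.normalize(img_emb, dim=-1)
    sa_emb = F.normalize(sa_emb, dim=-1)
    logits = img_emb @ sa_emb.t() / temp            # (B, B) similarity matrix
    labels = torch.arange(img_emb.size(0), device=img_emb.device)
    contrastive = F.cross_entropy(logits, labels)   # i-th image matches i-th state-action
    bc = F.mse_loss(pred_action, expert_action)     # BC-like actor term
    return contrastive + bc
```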
- Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition [48.65867987106428]
We introduce a novel system for joint learning between human operators and robots.
It enables human operators to share control of a robot end-effector with a learned assistive agent.
It reduces the need for human adaptation while ensuring the collected data is of sufficient quality for downstream tasks.
arXiv Detail & Related papers (2024-06-29T03:37:29Z)
- Surgical Gym: A high-performance GPU-based platform for reinforcement learning with surgical robots [1.6415802723978308]
We introduce Surgical Gym, an open-source, high-performance platform for surgical robot learning.
We demonstrate 100-5000x faster training times compared with previous surgical learning platforms.
arXiv Detail & Related papers (2023-10-07T03:21:58Z)
- AR2-D2: Training a Robot Without a Robot [53.10633639596096]
We introduce AR2-D2, a system for collecting demonstrations that does not require people with specialized training.
AR2-D2 is a framework in the form of an iOS app that people can use to record a video of themselves manipulating any object.
We show that data collected via our system enables the training of behavior cloning agents in manipulating real objects.
arXiv Detail & Related papers (2023-06-23T23:54:26Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo it, simultaneously inferring the reward function from the demonstrations (a sketch of this do/undo loop follows this entry).
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
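A hedged sketch of the do/undo practice loop this summary describes, assuming gym-style environment and policy interfaces and a classifier-based learned reward; none of these names come from MEDAL++ itself:

```python
# Hedged sketch of the do/undo practice loop: a forward policy performs the
# task, a backward policy resets it, and a classifier trained on
# demonstration states serves as a learned reward. All interfaces
# (`env`, `policy`, `reward_clf`) are assumed, not MEDAL++'s actual code.
def autonomous_practice(env, fwd_policy, bwd_policy, reward_clf, horizon=200):
    obs = env.reset()
    for policy in (fwd_policy, bwd_policy):      # alternate "do" and "undo"
        for _ in range(horizon):
            action = policy.act(obs)
            obs, _, done, _ = env.step(action)   # gym-style 4-tuple assumed
            # learned reward: probability that obs resembles demo states
            reward = reward_clf.predict_proba(obs[None])[0, 1]
            policy.update(obs, action, reward)
            if done:
                break
    return obs
```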
- Scaling Robot Learning with Semantically Imagined Experience [21.361979238427722]
Recent advances in robot learning have shown promise in enabling robots to perform manipulation tasks.
One of the key contributing factors to this progress is the scale of robot data used to train the models.
We propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing (see the sketch after this entry).
arXiv Detail & Related papers (2023-02-22T18:47:51Z)
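In the spirit of this entry, semantically "imagined" experience can be produced by re-rendering collected robot frames with an off-the-shelf image-to-image diffusion model, shown here with the Hugging Face diffusers library. The model id, prompts, and strength value are illustrative choices, not the paper's recipe:

```python
# Sketch in the spirit of this entry: re-render collected robot frames with
# an off-the-shelf image-to-image diffusion model to vary scenes
# semantically. Model id, prompts, and strength are illustrative choices.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

def imagine_variants(frame_path, prompts, strength=0.5):
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    src = Image.open(frame_path).convert("RGB").resize((512, 512))
    # one semantically "imagined" variant per prompt, anchored to the frame
    return [pipe(prompt=p, image=src, strength=strength).images[0]
            for p in prompts]
```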
- Lifelong Robotic Reinforcement Learning by Retaining Experiences [61.79346922421323]
Many multi-task reinforcement learning efforts assume the robot can collect data from all tasks at all times.
In this work, we study a sequential multi-task RL problem motivated by the practical constraints of physical robotic systems.
We derive an approach that effectively leverages the data and policies learned for previous tasks to cumulatively grow the robot's skill-set.
arXiv Detail & Related papers (2021-09-19T18:00:51Z)
- Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots [129.46920552019247]
We propose the use of a Convolutional Neural Network (CNN) to segment the robot hand from an image taken from an egocentric view.
We fine-tuned the Mask R-CNN network for the specific task of segmenting the hand of the humanoid robot Vizzy (a sketch of this fine-tuning setup follows this entry).
arXiv Detail & Related papers (2021-02-09T10:34:32Z)
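Fine-tuning an off-the-shelf Mask R-CNN for two classes (background and robot hand), as this entry describes for Vizzy, follows the standard torchvision head-replacement pattern. Whether the authors used torchvision is an assumption; the sketch shows only the model surgery, with dataset and training loop omitted:

```python
# Standard torchvision head-replacement pattern for fine-tuning Mask R-CNN
# on two classes (background + robot hand); whether the authors used
# torchvision is an assumption. Dataset and training loop are omitted.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_hand_segmenter(num_classes=2):
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # swap the box-classification head for our class count
    in_feats = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
    # swap the mask head likewise
    in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)
    return model
```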
- Task-relevant Representation Learning for Networked Robotic Perception [74.0215744125845]
This paper presents an algorithm to learn task-relevant representations of sensory data that are co-designed with a pre-trained robotic perception model's ultimate objective.
Our algorithm aggressively compresses robotic sensory data by up to 11x more than competing methods (an illustrative sketch follows this entry).
arXiv Detail & Related papers (2020-11-06T07:39:08Z)
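The task-relevant compression idea, a representation co-designed with a pre-trained perception model's objective, can be illustrated with a bottleneck codec trained against a frozen task model. The dimensions and names below are assumptions, chosen only to echo the quoted ~11x figure:

```python
# Illustrative bottleneck codec trained against a frozen task model, to
# mirror the "co-designed with the perception objective" idea. The
# 2048-to-186 dimensions are chosen only to echo the quoted ~11x compression.
import torch.nn as nn

class BottleneckCodec(nn.Module):
    def __init__(self, in_dim=2048, code_dim=186):   # ~11x smaller code
        super().__init__()
        self.enc = nn.Linear(in_dim, code_dim)
        self.dec = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        return self.dec(self.enc(x))

def task_relevant_loss(codec, task_model, x, y, loss_fn=nn.CrossEntropyLoss()):
    """Only the codec learns; the pre-trained task model stays frozen."""
    for p in task_model.parameters():
        p.requires_grad_(False)
    return loss_fn(task_model(codec(x)), y)
```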
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.