Related papers: Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators

Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators

URL: http://arxiv.org/abs/2007.08082v2
Date: Wed, 14 Oct 2020 08:59:39 GMT
Title: Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators
Authors: Yasuhiro Fujita, Kota Uenishi, Avinash Ummadisingu, Prabhat Nagarajan, Shimpei Masuda, and Mario Ynocente Castro
Abstract summary: We present the first RL-based system for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn complex grasping strategies for cluttered scenes with occluded objects, and (c) perform active vision through its movable wrist camera to better locate objects. We train and evaluate our system in a simulated environment, identify key components for improving performance, analyze its behaviors, and transfer to a real-world setup.
Score: 4.317864702902075
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Developing personal robots that can perform a diverse range of manipulation tasks in unstructured environments necessitates solving several challenges for robotic grasping systems. We take a step towards this broader goal by presenting the first RL-based system, to our knowledge, for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn complex grasping strategies for cluttered scenes with occluded objects, and (c) perform active vision through its movable wrist camera to better locate objects. The system is informed of the desired target object in the form of a single, arbitrary-pose RGB image of that object, enabling the system to generalize to unseen objects without retraining. To achieve such a system, we combine several advances in deep reinforcement learning and present a large-scale distributed training system using synchronous SGD that seamlessly scales to multi-node, multi-GPU infrastructure to make rapid prototyping easier. We train and evaluate our system in a simulated environment, identify key components for improving performance, analyze its behaviors, and transfer to a real-world setup.

Related papers

RoboGround: Robotic Manipulation with Grounded Vision-Language Priors [59.31993241876335]
In this work, we explore grounding masks as an effective intermediate representation. We introduce RoboGround, a grounding-aware robotic manipulation system. To further explore and enhance generalization, we propose an automated pipeline for generating large-scale, simulated data.
arXiv Detail & Related papers (2025-04-30T11:26:40Z)
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning [67.72413262980272]
Pre-trained vision models (PVMs) are fundamental to modern robotics, yet their optimal configuration remains unclear. We develop SlotMIM, a method that induces object-centric representations by introducing a semantic bottleneck. Our approach achieves significant improvements over prior work in image recognition, scene understanding, and robot learning evaluations.
arXiv Detail & Related papers (2025-03-10T06:18:31Z)
Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting [17.03927416536173]
This paper introduces a method to enhance Interactive Imitation Learning (IIL) by extracting touch interaction points and tracking object movement from video demonstrations. The approach extends current IIL systems by providing robots with detailed knowledge of both where and how to interact with objects, particularly complex articulated ones like doors and drawers.
arXiv Detail & Related papers (2024-11-05T23:28:57Z)
Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies. Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors. We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
Cognitive Planning for Object Goal Navigation using Generative AI Models [0.979851640406258]
We present a novel framework for solving the object goal navigation problem that generates efficient exploration strategies. Our approach enables a robot to navigate unfamiliar environments by leveraging Large Language Models (LLMs) and Large Vision-Language Models (LVLMs)
arXiv Detail & Related papers (2024-03-30T10:54:59Z)
Modular Neural Network Policies for Learning In-Flight Object Catching with a Robot Hand-Arm System [55.94648383147838]
We present a modular framework designed to enable a robot hand-arm system to learn how to catch flying objects. Our framework consists of five core modules: (i) an object state estimator that learns object trajectory prediction, (ii) a catching pose quality network that learns to score and rank object poses for catching, (iii) a reaching control policy trained to move the robot hand to pre-catch poses, and (iv) a grasping control policy trained to perform soft catching motions. We conduct extensive evaluations of our framework in simulation for each module and the integrated system, to demonstrate high success rates of in-flight
arXiv Detail & Related papers (2023-12-21T16:20:12Z)
Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs [53.66070434419739]
Generalizable articulated object manipulation is essential for home-assistant robots. We propose a kinematic-aware prompting framework that prompts Large Language Models with kinematic knowledge of objects to generate low-level motion waypoints. Our framework outperforms traditional methods on 8 categories seen and shows a powerful zero-shot capability for 8 unseen articulated object categories.
arXiv Detail & Related papers (2023-11-06T03:26:41Z)
Dexterous Manipulation from Images: Autonomous Real-World RL via Substep Guidance [71.36749876465618]
We describe a system for vision-based dexterous manipulation that provides a "programming-free" approach for users to define new tasks. Our system includes a framework for users to define a final task and intermediate sub-tasks with image examples. experimental results with a four-finger robotic hand learning multi-stage object manipulation tasks directly in the real world.
arXiv Detail & Related papers (2022-12-19T22:50:40Z)
Efficient and Robust Training of Dense Object Nets for Multi-Object Robot Manipulation [8.321536457963655]
We propose a framework for robust and efficient training of Dense Object Nets (DON) We focus on training with multi-object data instead of singulated objects, combined with a well-chosen augmentation scheme. We demonstrate the robustness and accuracy of our proposed framework on a real-world robotic grasping task.
arXiv Detail & Related papers (2022-06-24T08:24:42Z)
V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects [51.79035249464852]
We present a framework for learning multi-arm manipulation of articulated objects. Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm.
arXiv Detail & Related papers (2021-11-07T02:31:09Z)
Long-Horizon Manipulation of Unknown Objects via Task and Motion Planning with Estimated Affordances [26.082034134908785]
We show that a task-and-motion planner can be used to plan intelligent behaviors even in the absence of a priori knowledge regarding the set of manipulable objects. We demonstrate that this strategy can enable a single system to perform a wide variety of real-world multi-step manipulation tasks.
arXiv Detail & Related papers (2021-08-09T16:13:47Z)
MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [103.7609761511652]
We show how a large-scale collective robotic learning system can acquire a repertoire of behaviors simultaneously. New tasks can be continuously instantiated from previously learned tasks. We train and evaluate our system on a set of 12 real-world tasks with data collected from 7 robots.
arXiv Detail & Related papers (2021-04-16T16:38:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.