UniTeam: Open Vocabulary Mobile Manipulation Challenge
- URL: http://arxiv.org/abs/2312.08611v1
- Date: Thu, 14 Dec 2023 02:24:29 GMT
- Title: UniTeam: Open Vocabulary Mobile Manipulation Challenge
- Authors: Andrew Melnik, Michael Büttner, Leon Harz, Lyon Brown, Gora Chand Nandi, Arjun PS, Gaurav Kumar Yadav, Rahul Kala, Robert Haschke
- Abstract summary: This report introduces our UniTeam agent - an improved baseline for the "HomeRobot: Open Vocabulary Mobile Manipulation" challenge.
The challenge poses problems of navigation in unfamiliar environments, manipulation of novel objects, and recognition of open-vocabulary object classes.
This challenge aims to facilitate cross-cutting research in embodied AI using recent advances in machine learning, computer vision, natural language, and robotics.
- Score: 4.523096223190858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This report introduces our UniTeam agent - an improved baseline for the
"HomeRobot: Open Vocabulary Mobile Manipulation" challenge. The challenge poses
problems of navigation in unfamiliar environments, manipulation of novel
objects, and recognition of open-vocabulary object classes. This challenge aims
to facilitate cross-cutting research in embodied AI using recent advances in
machine learning, computer vision, natural language, and robotics. In this
work, we conducted an exhaustive evaluation of the provided baseline agent;
identified deficiencies in perception, navigation, and manipulation skills; and
improved the baseline agent's performance. Notably, enhancements were made in
perception - minimizing misclassifications; navigation - preventing infinite
loop commitments; picking - addressing failures due to changing object
visibility; and placing - ensuring accurate positioning for successful object
placement.
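The report itself is high-level and includes no code. As an illustration of one of the navigation fixes it describes - preventing infinite loop commitments - here is a minimal, hypothetical sketch of a loop guard; the class, thresholds, and discretization are assumptions for illustration, not the UniTeam implementation:

```python
import math
from collections import Counter, deque

class LoopGuard:
    """Hypothetical loop detector: if the agent revisits the same
    discretized pose too often within a sliding window, assume it has
    committed to a navigation loop and request a recovery behavior."""

    def __init__(self, window=200, max_revisits=8, cell=0.25, heading_bins=8):
        self.history = deque(maxlen=window)  # recent discretized poses
        self.max_revisits = max_revisits
        self.cell = cell
        self.heading_bins = heading_bins

    def _key(self, x, y, yaw):
        # Bucket position into grid cells and heading into coarse bins so
        # near-identical poses count as revisits of "the same place".
        return (round(x / self.cell), round(y / self.cell),
                round(yaw / (2 * math.pi) * self.heading_bins) % self.heading_bins)

    def update(self, x, y, yaw):
        """Record the current pose; return True if a loop is suspected."""
        key = self._key(x, y, yaw)
        self.history.append(key)
        return Counter(self.history)[key] > self.max_revisits
```

On a positive result the agent could, for example, pick a new navigation frontier or rotate in place before resuming its policy.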
Related papers
- Latent Object Characteristics Recognition with Visual to Haptic-Audio Cross-modal Transfer Learning [9.178588671620963]
This work aims to recognise latent, unobservable object characteristics.
Vision is commonly used for object recognition by robots, but it is ineffective for detecting hidden objects.
We propose a cross-modal transfer learning approach from vision to haptic-audio.
arXiv Detail & Related papers (2024-03-15T21:18:14Z)
- MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting [97.52388851329667]
We introduce Marking Open-world Keypoint Affordances (MOKA) to solve robotic manipulation tasks specified by free-form language instructions.
Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world (see the sketch after this entry).
We evaluate and analyze MOKA's performance on various table-top manipulation tasks including tool use, deformable body manipulation, and object rearrangement.
arXiv Detail & Related papers (2024-03-05T18:08:45Z)
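MOKA's core idea is a compact point-based affordance produced by prompting a VLM with marks drawn on the observed image. The paper's exact schema is not reproduced here; the following is only a hedged sketch of what such a representation and its lift to 3D might look like, with all names illustrative:

```python
from dataclasses import dataclass, field

Point = tuple[int, int]  # pixel coordinates in the observed image

@dataclass
class PointAffordance:
    """Hypothetical point-based affordance in the spirit of MOKA:
    a handful of image-space keypoints that a VLM selects from
    candidate marks drawn on the observation, later lifted to 3D
    for motion planning."""
    grasp_point: Point            # where to grasp the object or tool
    target_point: Point           # where the interaction should land
    waypoints: list[Point] = field(default_factory=list)  # free-space motion

def lift_to_3d(p: Point, depth, intrinsics):
    """Back-project a pixel to a 3D point using an HxW depth image and
    pinhole camera intrinsics (fx, fy, cx, cy)."""
    fx, fy, cx, cy = intrinsics
    z = depth[p[1], p[0]]
    return ((p[0] - cx) * z / fx, (p[1] - cy) * z / fy, z)
```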
- Lifelong Change Detection: Continuous Domain Adaptation for Small Object Change Detection in Every Robot Navigation [5.8010446129208155]
Ground-view change detection is ill-posed because visual uncertainty combines with complex, nonlinear perspective projection.
To regularize this ill-posedness, commonly applied supervised learning methods rely on manually annotated, high-quality, object-class-specific priors.
The present approach adopts the powerful and versatile idea that object changes detected during everyday robot navigation can be reused as additional priors to improve future change detection (a sketch of this reuse follows below).
arXiv Detail & Related papers (2023-06-28T10:34:59Z)
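The reuse idea - accumulating changes detected in past runs as priors for future detection - can be pictured with a small accumulator over map cells. This is a hypothetical illustration of the mechanism, not the paper's method:

```python
from collections import defaultdict

class ChangePriorMap:
    """Hypothetical spatial prior: changes detected in past navigation
    runs are counted per map cell and used to re-weight future
    change-detection scores."""

    def __init__(self, cell_size=0.5):
        self.cell_size = cell_size
        self.counts = defaultdict(int)   # change detections per cell
        self.visits = defaultdict(int)   # observations per cell

    def _cell(self, x, y):
        return (round(x / self.cell_size), round(y / self.cell_size))

    def observe(self, x, y, changed):
        c = self._cell(x, y)
        self.visits[c] += 1
        if changed:
            self.counts[c] += 1

    def prior(self, x, y):
        """Laplace-smoothed probability that this cell contains a change."""
        c = self._cell(x, y)
        return (self.counts[c] + 1) / (self.visits[c] + 2)

    def reweight(self, score, x, y):
        # Fuse the detector's raw score with the accumulated spatial prior.
        return score * self.prior(x, y)
```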
- HomeRobot: Open-Vocabulary Mobile Manipulation [107.05702777141178]
Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment and placing it in a commanded location (a sketch of the episode structure follows below).
HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch.
arXiv Detail & Related papers (2023-06-20T14:30:32Z)
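An OVMM episode decomposes naturally into find-object, pick, find-receptacle, and place phases. Below is a hypothetical control-loop sketch of that structure; `agent`, `env`, and `instruction` are illustrative stand-ins, not HomeRobot's actual API:

```python
from enum import Enum, auto

class Phase(Enum):
    FIND_OBJECT = auto()
    PICK = auto()
    FIND_RECEPTACLE = auto()
    PLACE = auto()
    DONE = auto()

def ovmm_episode(agent, env, instruction, max_steps=1000):
    """Hypothetical OVMM loop: find the named object, pick it, find the
    commanded receptacle, and place the object there."""
    phase = Phase.FIND_OBJECT
    obs = env.reset(instruction)
    for _ in range(max_steps):  # hard step budget, as benchmarks impose
        if phase is Phase.FIND_OBJECT:
            obs, found = agent.navigate_to(obs, instruction.object_name)
            phase = Phase.PICK if found else phase
        elif phase is Phase.PICK:
            obs, ok = agent.pick(obs, instruction.object_name)
            # A failed grasp sends the agent back to re-localize the object.
            phase = Phase.FIND_RECEPTACLE if ok else Phase.FIND_OBJECT
        elif phase is Phase.FIND_RECEPTACLE:
            obs, found = agent.navigate_to(obs, instruction.receptacle_name)
            phase = Phase.PLACE if found else phase
        elif phase is Phase.PLACE:
            obs, ok = agent.place(obs, instruction.receptacle_name)
            phase = Phase.DONE if ok else Phase.FIND_RECEPTACLE
        else:  # Phase.DONE
            break
    return phase is Phase.DONE
```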
- Hierarchical Few-Shot Imitation with Skill Transition Models [66.81252581083199]
Few-shot Imitation with Skill Transition Models (FIST) is an algorithm that extracts skills from offline data and uses them to generalize to unseen tasks.
We show that FIST generalizes to new tasks and substantially outperforms prior baselines in navigation experiments.
arXiv Detail & Related papers (2021-07-19T15:56:01Z)
- Deep Learning for Embodied Vision Navigation: A Survey [108.13766213265069]
"Embodied visual navigation" problem requires an agent to navigate in a 3D environment mainly rely on its first-person observation.
This paper attempts to establish an outline of the current works in the field of embodied visual navigation by providing a comprehensive literature survey.
arXiv Detail & Related papers (2021-07-07T12:09:04Z)
- ManipulaTHOR: A Framework for Visual Object Manipulation [27.17908758246059]
We propose a framework for object manipulation built upon the physics-enabled, visually rich AI2-THOR framework.
This task extends the popular point navigation task to object manipulation and offers new challenges including 3D obstacle avoidance.
arXiv Detail & Related papers (2021-04-22T17:49:04Z)
- Diagnosing Vision-and-Language Navigation: What Really Matters [61.72935815656582]
Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments.
Recent studies have observed a slow-down in performance improvements on both indoor and outdoor VLN tasks.
In this work, we conduct a series of diagnostic experiments to unveil agents' focus during navigation.
arXiv Detail & Related papers (2021-03-30T17:59:07Z)
- Vision-Based Mobile Robotics Obstacle Avoidance With Deep Reinforcement Learning [49.04274612323564]
Obstacle avoidance is a fundamental and challenging problem for autonomous navigation of mobile robots.
In this paper, we consider the problem of obstacle avoidance in simple 3D environments where the robot has to solely rely on a single monocular camera.
We tackle obstacle avoidance with a data-driven, end-to-end deep learning approach (a sketch of such a policy follows below).
arXiv Detail & Related papers (2021-03-08T13:05:46Z)
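As a hedged sketch of what an end-to-end monocular policy can look like, here is a small PyTorch network mapping a single RGB frame to Q-values over discrete motion commands; the architecture and action set are assumptions for illustration, not the paper's:

```python
import torch
import torch.nn as nn

class MonocularAvoidancePolicy(nn.Module):
    """Illustrative end-to-end policy: a small CNN maps one RGB frame
    to Q-values over discrete motion commands (forward, left, right)."""

    def __init__(self, num_actions=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(num_actions)  # infers the flattened size

    def forward(self, rgb):
        # rgb: (batch, 3, H, W) tensor normalized to [0, 1]
        return self.head(self.encoder(rgb))

# Greedy action selection from a single camera frame:
policy = MonocularAvoidancePolicy()
frame = torch.rand(1, 3, 84, 84)         # dummy 84x84 RGB observation
action = policy(frame).argmax(dim=1)     # 0=forward, 1=left, 2=right
```

Such a network would typically be trained with a value-based method like DQN against a collision-penalizing reward.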