Object Manipulation via Visual Target Localization
- URL: http://arxiv.org/abs/2203.08141v1
- Date: Tue, 15 Mar 2022 17:59:01 GMT
- Title: Object Manipulation via Visual Target Localization
- Authors: Kiana Ehsani, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi
- Abstract summary: Training agents to manipulate objects, poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
- Score: 64.05939029132394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object manipulation is a critical skill required for Embodied AI agents
interacting with the world around them. Training agents to manipulate objects,
poses many challenges. These include occlusion of the target object by the
agent's arm, noisy object detection and localization, and the target frequently
going out of view as the agent moves around in the scene. We propose
Manipulation via Visual Object Location Estimation (m-VOLE), an approach that
explores the environment in search for target objects, computes their 3D
coordinates once they are located, and then continues to estimate their 3D
locations even when the objects are not visible, thus robustly aiding the task
of manipulating these objects throughout the episode. Our evaluations show a
massive 3x improvement in success rate over a model that has access to the same
sensory suite but is trained without the object location estimator, and our
analysis shows that our agent is robust to noise in depth perception and agent
localization. Importantly, our proposed approach relaxes several assumptions
about idealized localization and perception that are commonly employed by
recent works in embodied AI -- an important step towards training agents for
object manipulation in the real world.
Related papers
- PickScan: Object discovery and reconstruction from handheld interactions [99.99566882133179]
We develop an interaction-guided and class-agnostic method to reconstruct 3D representations of scenes.
Our main contribution is a novel approach to detecting user-object interactions and extracting the masks of manipulated objects.
Compared to Co-Fusion, the only comparable interaction-based and class-agnostic baseline, this corresponds to a reduction in chamfer distance of 73%.
arXiv Detail & Related papers (2024-11-17T23:09:08Z) - Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking [59.87033229815062]
Articulated object manipulation requires precise object interaction, where the object's axis must be carefully considered.
Previous research employed interactive perception for manipulating articulated objects, but typically, open-loop approaches often suffer from overlooking the interaction dynamics.
We present a closed-loop pipeline integrating interactive perception with online axis estimation from segmented 3D point clouds.
arXiv Detail & Related papers (2024-09-24T17:59:56Z) - LocaliseBot: Multi-view 3D object localisation with differentiable
rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z) - Localizing Active Objects from Egocentric Vision with Symbolic World
Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability on localizing the active objects by: learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z) - You Only Look at One: Category-Level Object Representations for Pose
Estimation From a Single Example [26.866356430469757]
We present a method for achieving category-level pose estimation by inspection of just a single object from a desired category.
We demonstrate that our method runs in real-time, enabling a robot manipulator equipped with an RGBD sensor to perform online 6D pose estimation for novel objects.
arXiv Detail & Related papers (2023-05-22T01:32:24Z) - SafePicking: Learning Safe Object Extraction via Object-Level Mapping [19.502587411252946]
We present a system, SafePicking, that integrates object-level mapping and learning-based motion planning.
Planning is done by learning a deep Q-network that receives observations of predicted poses and a depth-based heightmap to output a motion trajectory.
Our results show that the observation fusion of poses and depth-sensing gives both better performance and robustness to the model.
arXiv Detail & Related papers (2022-02-11T18:55:10Z) - Analysis of voxel-based 3D object detection methods efficiency for
real-time embedded systems [93.73198973454944]
Two popular voxel-based 3D object detection methods are studied in this paper.
Our experiments show that these methods mostly fail to detect distant small objects due to the sparsity of the input point clouds at large distances.
Our findings suggest that a considerable part of the computations of existing methods is focused on locations of the scene that do not contribute with successful detection.
arXiv Detail & Related papers (2021-05-21T12:40:59Z) - Supervised Training of Dense Object Nets using Optimal Descriptors for
Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.