A Simple Approach for Visual Rearrangement: 3D Mapping and Semantic
Search
- URL: http://arxiv.org/abs/2206.13396v1
- Date: Tue, 21 Jun 2022 02:33:57 GMT
- Title: A Simple Approach for Visual Rearrangement: 3D Mapping and Semantic
Search
- Authors: Brandon Trabucco, Gunnar Sigurdsson, Robinson Piramuthu, Gaurav S.
Sukhatme, Ruslan Salakhutdinov
- Abstract summary: Visual room rearrangement evaluates an agent's ability to rearrange objects based solely on visual input.
We propose a simple yet effective method for this problem: (1) search for and map which objects need to be rearranged, and (2) rearrange each object until the task is complete.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Physically rearranging objects is an important capability for embodied
agents. Visual room rearrangement evaluates an agent's ability to rearrange
objects in a room to a desired goal based solely on visual input. We propose a
simple yet effective method for this problem: (1) search for and map which
objects need to be rearranged, and (2) rearrange each object until the task is
complete. Our approach consists of an off-the-shelf semantic segmentation
model, voxel-based semantic map, and semantic search policy to efficiently find
objects that need to be rearranged. On the AI2-THOR Rearrangement Challenge,
our method improves on current state-of-the-art end-to-end reinforcement
learning-based methods that learn visual rearrangement policies from 0.53%
correct rearrangement to 16.56%, using only 2.7% as many samples from the
environment.
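The two-phase pipeline described in the abstract (map which objects need rearranging, then rearrange them) can be sketched at a high level. The snippet below is an illustrative assumption, not the authors' implementation: it fuses per-frame semantic labels into a voxel grid and diffs a goal-phase map against a current-phase map to find voxels whose contents changed. All names (`SemanticVoxelMap`, `VOXEL_SIZE`, `objects_to_rearrange`) are hypothetical.

```python
from collections import defaultdict

VOXEL_SIZE = 0.25  # meters; hypothetical map resolution

def voxelize(point):
    """Quantize a 3D point (x, y, z) to integer voxel coordinates."""
    return tuple(int(c // VOXEL_SIZE) for c in point)

class SemanticVoxelMap:
    """Maps voxel coordinates to counts of observed semantic labels."""
    def __init__(self):
        self.grid = defaultdict(lambda: defaultdict(int))

    def update(self, points, labels):
        """Fuse one frame: 3D points with per-point semantic labels,
        e.g. back-projected from a segmentation mask and depth image."""
        for p, lab in zip(points, labels):
            self.grid[voxelize(p)][lab] += 1

    def label_of(self, voxel):
        """Majority semantic label at a voxel, or None if unobserved."""
        counts = self.grid.get(voxel)
        return max(counts, key=counts.get) if counts else None

def objects_to_rearrange(goal_map, current_map):
    """Voxels whose majority label differs between the goal-phase map
    and the current-phase map; these mark objects to rearrange."""
    diffs = set()
    for v in set(goal_map.grid) | set(current_map.grid):
        if goal_map.label_of(v) != current_map.label_of(v):
            diffs.add(v)
    return diffs
```

In this sketch, a semantic search policy would then drive the agent toward the flagged voxels; the actual method's map representation and policy details are in the paper.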
Related papers
- PickScan: Object discovery and reconstruction from handheld interactions
We develop an interaction-guided and class-agnostic method to reconstruct 3D representations of scenes.
Our main contribution is a novel approach to detecting user-object interactions and extracting the masks of manipulated objects.
Compared to Co-Fusion, the only comparable interaction-based and class-agnostic baseline, our method reduces chamfer distance by 73%.
arXiv Detail & Related papers (2024-11-17T23:09:08Z)
- ICGNet: A Unified Approach for Instance-Centric Grasping
We introduce an end-to-end architecture for object-centric grasping.
We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets.
arXiv Detail & Related papers (2024-01-18T12:41:41Z)
- Probable Object Location (POLo) Score Estimation for Efficient Object Goal Navigation
We introduce a novel framework centered around the Probable Object Location (POLo) score.
We further enhance the framework's practicality by introducing POLoNet, a neural network trained to approximate the computationally intensive POLo score.
Our experiments, involving the first phase of the OVMM 2023 challenge, demonstrate that an agent equipped with POLoNet significantly outperforms a range of baseline methods.
arXiv Detail & Related papers (2023-11-14T08:45:32Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on the Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- Learning-based Relational Object Matching Across Views
We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
arXiv Detail & Related papers (2023-05-03T19:36:51Z)
- ReorientDiff: Diffusion Model based Reorientation for Object Manipulation
The ability to manipulate objects into desired configurations is a fundamental requirement for robots to complete various practical applications.
We propose a reorientation planning method, ReorientDiff, that utilizes a diffusion model-based approach.
The proposed method is evaluated using a set of YCB objects and a suction gripper, demonstrating a success rate of 95.2% in simulation.
arXiv Detail & Related papers (2023-02-28T00:08:38Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- Tasks Integrated Networks: Joint Detection and Retrieval for Image Search
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
- Multi-Resolution POMDP Planning for Multi-Object Search in 3D
We present a POMDP formulation for multi-object search in a 3D region with a frustum-shaped field-of-view.
We design a novel octree-based belief representation to capture uncertainty of the target objects at different resolution levels.
We demonstrate our approach on a mobile robot that finds objects placed at different heights in two 10m × 2m regions by moving its base and actuating its torso.
arXiv Detail & Related papers (2020-05-06T14:54:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.