TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors
- URL: http://arxiv.org/abs/2207.10761v1
- Date: Thu, 21 Jul 2022 21:19:18 GMT
- Title: TIDEE: Tidying Up Novel Rooms using Visuo-Semantic Commonsense Priors
- Authors: Gabriel Sarch, Zhaoyuan Fang, Adam W. Harley, Paul Schydlo, Michael J.
Tarr, Saurabh Gupta, and Katerina Fragkiadaki
- Abstract summary: TIDEE tidies up a disordered scene based on learned commonsense object placement and room arrangement priors.
TIDEE explores a home environment, detects objects that are out of their natural place, infers plausible object contexts for them, localizes such contexts in the current scene, and repositions the objects.
We test TIDEE on tidying up disorganized scenes in the AI2THOR simulation environment.
- Score: 29.255373211228548
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce TIDEE, an embodied agent that tidies up a disordered scene based
on learned commonsense object placement and room arrangement priors. TIDEE
explores a home environment, detects objects that are out of their natural
place, infers plausible object contexts for them, localizes such contexts in
the current scene, and repositions the objects. Commonsense priors are encoded
in three modules: i) visuo-semantic detectors that detect out-of-place objects,
ii) an associative neural graph memory of objects and spatial relations that
proposes plausible semantic receptacles and surfaces for object repositions,
and iii) a visual search network that guides the agent's exploration for
efficiently localizing the receptacle-of-interest in the current scene to
reposition the object. We test TIDEE on tidying up disorganized scenes in the
AI2THOR simulation environment. TIDEE carries out the task directly from pixel
and raw depth input without ever having observed the same room beforehand,
relying only on priors learned from a separate set of training houses. Human
evaluations on the resulting room reorganizations show TIDEE outperforms
ablative versions of the model that do not use one or more of the commonsense
priors. On a related room rearrangement benchmark that allows the agent to view
the goal state prior to rearrangement, a simplified version of our model
outperforms a top-performing method by a large margin. Code and
data are available at the project website: https://tidee-agent.github.io/.
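
The three-module pipeline described in the abstract can be summarized as a detect/infer/localize loop. Below is a minimal, hypothetical Python sketch of that loop; all class and method names (OutOfPlaceDetector, NeuralGraphMemory, VisualSearchNetwork, tidy_step) and the stub priors are illustrative assumptions, not the authors' actual interfaces. See the project website for the real implementation.

```python
# Hypothetical sketch of TIDEE's tidying loop; names and interfaces are
# illustrative assumptions, not the authors' API.
from dataclasses import dataclass


@dataclass
class Detection:
    label: str       # object category, e.g. "mug"
    position: tuple  # 3D location in the agent's map frame


class OutOfPlaceDetector:
    """Visuo-semantic detector: flags objects that appear out of their
    natural place given appearance and surrounding context (module i)."""
    def is_out_of_place(self, det: Detection) -> bool:
        return det.label == "mug"  # stub: pretend a mug was found misplaced


class NeuralGraphMemory:
    """Associative graph memory over objects and spatial relations learned
    from training houses; proposes a plausible receptacle (module ii)."""
    PRIORS = {"mug": "countertop", "pillow": "sofa"}  # stub priors

    def propose_receptacle(self, label: str) -> str:
        return self.PRIORS.get(label, "table")


class VisualSearchNetwork:
    """Guides exploration toward where the proposed receptacle is likely
    to be found in the partially observed scene (module iii)."""
    def likely_location(self, receptacle: str) -> tuple:
        return (2.0, 3.5)  # stub: predicted 2D search location in the map


def tidy_step(det, detector, memory, searcher):
    """One pass of the pipeline for a single detected object."""
    if not detector.is_out_of_place(det):             # stage 1: detect
        return None
    receptacle = memory.propose_receptacle(det.label)  # stage 2: infer context
    goal = searcher.likely_location(receptacle)        # stage 3: localize
    return receptacle, goal  # the agent would then navigate, pick, and place


if __name__ == "__main__":
    plan = tidy_step(Detection("mug", (0.1, 0.0, 1.2)),
                     OutOfPlaceDetector(), NeuralGraphMemory(),
                     VisualSearchNetwork())
    print(plan)  # -> ('countertop', (2.0, 3.5))
```

In the paper each stub stands in for a learned network operating on RGB and raw depth input; the sketch only shows how the three commonsense priors compose into one tidying step.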
Related papers
- A Modern Take on Visual Relationship Reasoning for Grasp Planning [10.543168383800532]
We present a modern take on visual relational reasoning for grasp planning.
We introduce D3GD, a novel testbed that includes bin picking scenes with up to 35 objects from 97 distinct categories.
We also propose D3G, a new end-to-end transformer-based dependency graph generation model.
arXiv Detail & Related papers (2024-09-03T16:30:48Z)
- PatchContrast: Self-Supervised Pre-training for 3D Object Detection [14.603858163158625]
We introduce PatchContrast, a novel self-supervised point cloud pre-training framework for 3D object detection.
We show that our method outperforms existing state-of-the-art models on three commonly-used 3D detection datasets.
arXiv Detail & Related papers (2023-08-14T07:45:54Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Robust Change Detection Based on Neural Descriptor Fields [53.111397800478294]
We develop an object-level online change detection approach that is robust to partially overlapping observations and noisy localization results.
By associating objects via shape code similarity and comparing local object-neighbor spatial layout, our proposed approach demonstrates robustness to low observation overlap and localization noises.
arXiv Detail & Related papers (2022-08-01T17:45:36Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Fusing Local Similarities for Retrieval-based 3D Orientation Estimation of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z)
- SORNet: Spatial Object-Centric Representations for Sequential Manipulation [39.88239245446054]
Sequential manipulation tasks require a robot to perceive the state of an environment and plan a sequence of actions leading to a desired goal state.
We propose SORNet, which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest.
arXiv Detail & Related papers (2021-09-08T19:36:29Z)
- BORM: Bayesian Object Relation Model for Indoor Scene Recognition [3.3274747298291216]
We propose to utilize meaningful object representations for indoor scene representation.
First, we utilize an improved object model (IOM) as a baseline that enriches the object knowledge by introducing a scene parsing algorithm pretrained on the ADE20K dataset with rich object categories related to the indoor scene.
To analyze object co-occurrences and pairwise object relations, we formulate the IOM from a Bayesian perspective as the Bayesian object relation model (BORM).
arXiv Detail & Related papers (2021-08-01T08:31:18Z)
- Robust Object Detection via Instance-Level Temporal Cycle Confusion [89.1027433760578]
We study the effectiveness of auxiliary self-supervised tasks to improve the out-of-distribution generalization of object detectors.
Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf).
For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision.
arXiv Detail & Related papers (2021-04-16T21:35:08Z)
- Object Priors for Classifying and Localizing Unseen Actions [45.91275361696107]
We propose three spatial object priors, which encode local person and object detectors along with their spatial relations.
On top of these, we introduce three semantic object priors, which extend semantic matching through word embeddings.
A video embedding combines the spatial and semantic object priors.
arXiv Detail & Related papers (2021-04-10T08:56:58Z)
- Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)