Related papers: LLM-enhanced Scene Graph Learning for Household Rearrangement

URL: http://arxiv.org/abs/2408.12093v2
Date: Thu, 12 Sep 2024 07:18:00 GMT
Title: LLM-enhanced Scene Graph Learning for Household Rearrangement
Authors: Wenhao Li, Zhiyuan Yu, Qijin She, Zhinan Yu, Yuqing Lan, Chenyang Zhu, Ruizhen Hu, Kai Xu
Abstract summary: The household rearrangement task involves spotting misplaced objects in a scene and accommodating them in proper places. We propose to mine object functionality with user preference alignment directly from the scene itself. Our method achieves state-of-the-art performance on misplacement detection and the subsequent rearrangement planning.
Score: 28.375701371003107
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The household rearrangement task involves spotting misplaced objects in a scene and accommodating them in proper places. It depends both on common-sense knowledge on the objective side and on human user preference on the subjective side. To accomplish this task, we propose to mine object functionality with user preference alignment directly from the scene itself, without relying on human intervention. To do so, we work with a scene graph representation and propose LLM-enhanced scene graph learning, which transforms the input scene graph into an affordance-enhanced graph (AEG) with information-enhanced nodes and newly discovered edges (relations). In the AEG, the nodes corresponding to receptacle objects are augmented with context-induced affordance, which encodes what kinds of carriable objects can be placed on them. New edges capture newly discovered non-local relations. With the AEG, we perform task planning for scene rearrangement by detecting misplaced carriables and determining a proper placement for each of them. We test our method by implementing a tidying robot in a simulator and evaluate it on a new benchmark we build. Extensive evaluations demonstrate that our method achieves state-of-the-art performance on misplacement detection and the subsequent rearrangement planning.
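
The planning step described in the abstract (detect misplaced carriables, then pick a placement for each) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Receptacle` and `Carriable` classes, the function names, and the representation of affordance as a plain set of categories are all assumptions made for illustration. In the paper the affordance is induced from scene context with an LLM; here it is simply given as input.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Receptacle:
    name: str
    # Hypothetical stand-in for the context-induced affordance: the set of
    # carriable categories this receptacle is considered suitable for.
    affordance: set[str] = field(default_factory=set)


@dataclass
class Carriable:
    name: str
    category: str
    on: str  # name of the receptacle the object currently sits on


def detect_misplaced(carriables, receptacles):
    """Return carriables whose category is not afforded by their current receptacle."""
    by_name = {r.name: r for r in receptacles}
    return [c for c in carriables if c.category not in by_name[c.on].affordance]


def propose_placement(carriable, receptacles):
    """Return the first receptacle name that affords the carriable, or None."""
    for r in receptacles:
        if carriable.category in r.affordance:
            return r.name
    return None


def plan_rearrangement(carriables, receptacles):
    """Build a list of (object, source, target) moves for misplaced objects."""
    plan = []
    for c in detect_misplaced(carriables, receptacles):
        target = propose_placement(c, receptacles)
        if target is not None:
            plan.append((c.name, c.on, target))
    return plan
```

For instance, with a bookshelf that affords books, a sink that affords dishes, and a sofa that affords nothing, a book left in the sink would be planned for a move to the bookshelf. A real system would rank candidate receptacles rather than take the first match.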

Related papers

ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting [54.92763171355442]
ObjectGS is an object-aware framework that unifies 3D scene reconstruction with semantic understanding. We show through experiments that ObjectGS outperforms state-of-the-art methods on open-vocabulary and panoptic segmentation tasks.
arXiv Detail & Related papers (2025-07-21T10:06:23Z)
Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding. An adaptive-octree structure is developed that stores semantics and depicts the occupancy of an object adjustably according to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z)
ICGNet: A Unified Approach for Instance-Centric Grasping [42.92991092305974]
We introduce an end-to-end architecture for object-centric grasping. We show the effectiveness of the proposed method by extensively evaluating it against state-of-the-art methods on synthetic datasets.
arXiv Detail & Related papers (2024-01-18T12:41:41Z)
Open-Vocabulary Object Detection via Scene Graph Discovery [53.27673119360868]
Open-vocabulary (OV) object detection has attracted increasing research attention. We propose a novel Scene-Graph-Based Discovery Network (SGDN) that exploits scene graph cues for OV detection.
arXiv Detail & Related papers (2023-07-07T00:46:19Z)
Task-Driven Graph Attention for Hierarchical Relational Object Navigation [25.571175038938527]
Embodied AI agents in large scenes often need to navigate to find objects. We study a naturally emerging variant of the object navigation task, hierarchical relational object navigation (HRON). We propose a solution that uses scene graphs as part of its input and integrates graph neural networks as its backbone.
arXiv Detail & Related papers (2023-06-23T19:50:48Z)
Modeling Dynamic Environments with Scene Graph Memory [46.587536843634055]
We present a new type of link prediction problem: link prediction on partially observable dynamic graphs. Our graph is a representation of a scene in which rooms and objects are nodes, and their relationships are encoded in the edges. We propose a novel state representation, Scene Graph Memory (SGM), which captures the agent's accumulated set of observations. We evaluate our method in the Dynamic House Simulator, a new benchmark that creates diverse dynamic graphs following the semantic patterns typically seen in homes.
arXiv Detail & Related papers (2023-05-27T17:39:38Z)
Location-Free Scene Graph Generation [45.366540803729386]
Scene Graph Generation (SGG) is a visual understanding task that aims to describe a scene as a graph of entities and their relationships with each other. Existing works rely on location labels in the form of bounding boxes or segmentation masks, increasing annotation costs and limiting dataset expansion. We break this dependency and introduce location-free scene graph generation (LF-SGG). This new task aims at predicting instances of entities, as well as their relationships, without the explicit calculation of their spatial localization.
arXiv Detail & Related papers (2023-03-20T08:57:45Z)
Exploiting Scene Graphs for Human-Object Interaction Detection [81.49184987430333]
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects. We propose a novel method to exploit this information, through the scene graph, for the Human-Object Interaction (SG2HOI) detection task. Our method, SG2HOI, incorporates the SG information in two ways: (1) we embed a scene graph into a global context clue, serving as the scene-specific environmental context; and (2) we build a relation-aware message-passing module to gather relationships from objects' neighborhood and transfer them into interactions.
arXiv Detail & Related papers (2021-08-19T09:40:50Z)
Vision-Language Navigation with Random Environmental Mixup [112.94609558723518]
Vision-language Navigation (VLN) tasks require an agent to navigate step-by-step while perceiving the visual observations and comprehending a natural language instruction. Previous works have proposed various data augmentation methods to reduce data bias. We propose the Random Environmental Mixup (REM) method, which generates cross-connected house scenes as augmented data by mixing up environments.
arXiv Detail & Related papers (2021-06-15T04:34:26Z)
PEARL: Parallelized Expert-Assisted Reinforcement Learning for Scene Rearrangement Planning [28.9887381071402]
We propose a fine-grained action definition for Scene Rearrangement Planning (SRP) and introduce a large-scale scene rearrangement dataset. We also propose a novel learning paradigm to efficiently train an agent through self-playing, without any prior knowledge.
arXiv Detail & Related papers (2021-05-10T03:27:16Z)
SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots. Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of step-by-step instructions. This approach deviates from real-world problems in which a human only describes what the object and its surroundings look like and asks the robot to start navigation from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.