Hi-Dyna Graph: Hierarchical Dynamic Scene Graph for Robotic Autonomy in Human-Centric Environments
- URL: http://arxiv.org/abs/2506.00083v1
- Date: Fri, 30 May 2025 03:35:29 GMT
- Title: Hi-Dyna Graph: Hierarchical Dynamic Scene Graph for Robotic Autonomy in Human-Centric Environments
- Authors: Jiawei Hou, Xiangyang Xue, Taiping Zeng
- Abstract summary: Hi-Dyna Graph is a hierarchical dynamic scene graph architecture that integrates persistent global layouts with localized dynamic semantics for embodied robotic autonomy. An agent powered by large language models (LLMs) is employed to interpret the unified graph, infer latent task triggers, and generate executable instructions grounded in robotic affordances.
- Score: 41.80879866951797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous operation of service robots in human-centric scenes remains challenging due to the need to understand changing environments and make context-aware decisions. While existing approaches like topological maps offer efficient spatial priors, they fail to model transient object relationships, whereas dense neural representations (e.g., NeRF) incur prohibitive computational costs. Inspired by hierarchical scene representation and video scene graph generation works, we propose Hi-Dyna Graph, a hierarchical dynamic scene graph architecture that integrates persistent global layouts with localized dynamic semantics for embodied robotic autonomy. Our framework constructs a global topological graph from posed RGB-D inputs, encoding room-scale connectivity and large static objects (e.g., furniture), while environmental and egocentric cameras populate dynamic subgraphs with object position relations and human-object interaction patterns. A hybrid architecture is constructed by anchoring these subgraphs to the global topology using semantic and spatial constraints, enabling seamless updates as the environment evolves. An agent powered by large language models (LLMs) is employed to interpret the unified graph, infer latent task triggers, and generate executable instructions grounded in robotic affordances. We conduct extensive experiments to demonstrate Hi-Dyna Graph's superior scene representation effectiveness. Real-world deployments validate the system's practicality with a mobile manipulator: the robot autonomously completes complex tasks, with no further training or complex reward design, in a dynamic scene as a cafeteria assistant. See https://anonymous.4open.science/r/Hi-Dyna-Graph-B326 for video demonstrations and more details.
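To make the hierarchy concrete, here is a minimal Python sketch of the two-level representation the abstract describes: a persistent global topology whose static anchors hold replaceable dynamic subgraphs, serialized to text for an LLM agent. All class, field, and method names are illustrative assumptions; the paper does not publish this API.

```python
# Minimal sketch of a Hi-Dyna-Graph-style hierarchy. Names are illustrative
# assumptions, not the authors' API.
from dataclasses import dataclass, field


@dataclass
class StaticNode:
    """Room or large static object (e.g., furniture) in the global topology."""
    name: str
    position: tuple[float, float, float]


@dataclass
class DynamicSubgraph:
    """Localized dynamic semantics from environmental/egocentric cameras."""
    relations: list[tuple[str, str, str]] = field(default_factory=list)  # (subject, relation, object)
    interactions: list[str] = field(default_factory=list)                # human-object interaction patterns


class HiDynaGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, StaticNode] = {}           # persistent global layout
        self.edges: set[frozenset[str]] = set()          # room-scale connectivity
        self.subgraphs: dict[str, DynamicSubgraph] = {}  # anchor name -> dynamic semantics

    def add_node(self, node: StaticNode) -> None:
        self.nodes[node.name] = node

    def connect(self, a: str, b: str) -> None:
        self.edges.add(frozenset((a, b)))

    def anchor(self, anchor_name: str, sub: DynamicSubgraph) -> None:
        """Attach (or refresh) a dynamic subgraph at a static anchor; a stand-in
        for the paper's semantic + spatial anchoring constraints."""
        if anchor_name not in self.nodes:
            raise KeyError(f"unknown anchor {anchor_name!r}")
        self.subgraphs[anchor_name] = sub  # overwrite stale semantics as the scene evolves

    def to_prompt(self) -> str:
        """Serialize the unified graph as text so an LLM agent can infer task triggers."""
        lines = [" <-> ".join(sorted(e)) for e in sorted(self.edges, key=sorted)]
        for anchor, sub in self.subgraphs.items():
            lines += [f"[{anchor}] {s} {r} {o}" for s, r, o in sub.relations]
            lines += [f"[{anchor}] interaction: {i}" for i in sub.interactions]
        return "\n".join(lines)
```

A real system would add the semantic/spatial matching that selects anchors and the LLM call itself; the sketch only fixes the data layout and the overwrite-on-update pattern.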
Related papers
- DyGEnc: Encoding a Sequence of Textual Scene Graphs to Reason and Answer Questions in Dynamic Scenes [0.0]
We introduce DyGEnc, a novel method for encoding a sequence of textual scene graphs. The method integrates compressed spatial-temporal structural observations with the cognitive capabilities of large language models. DyGEnc outperforms existing visual methods by a large margin of 15-25% on queries about the history of human-to-object interactions.
arXiv Detail & Related papers (2025-05-06T14:41:42Z)
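A hedged illustration of the underlying idea, with invented names: flatten a sequence of textual scene graphs into a time-indexed history an LLM can answer questions over. DyGEnc itself learns a compressed structural encoding rather than passing raw strings.

```python
# Toy stand-in for encoding a dynamic scene graph history for an LLM.
def scene_history_prompt(frames: list[list[tuple[str, str, str]]], question: str) -> str:
    steps = []
    for t, triples in enumerate(frames):
        facts = "; ".join(f"{s} {r} {o}" for s, r, o in triples)
        steps.append(f"t={t}: {facts}")
    return "Scene history:\n" + "\n".join(steps) + f"\n\nQuestion: {question}\nAnswer:"


# Example: two frames of human-object interactions, then a history query.
frames = [
    [("person", "picks up", "cup")],
    [("person", "places", "cup"), ("cup", "on", "table")],
]
print(scene_history_prompt(frames, "Where did the person put the cup?"))
```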
- EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation [55.26713167507132]
We introduce EnerVerse, a generative robotics foundation model that constructs and interprets embodied spaces. EnerVerse employs an autoregressive video diffusion framework to predict future embodied spaces from instructions, enhanced by a sparse context memory for long-term reasoning. We also present EnerVerse-D, a data engine pipeline combining the generative model with 4D Gaussian Splatting, forming a self-reinforcing data loop to reduce the sim-to-real gap.
arXiv Detail & Related papers (2025-01-03T17:00:33Z)
- Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot [0.8515309662618664]
This paper presents a robot control architecture that addresses key challenges in human-robot interaction.
The architecture uses Large Language Models to integrate diverse information sources, including natural language commands.
The architecture enhances adaptability, task efficiency, and human-robot collaboration in dynamic environments.
arXiv Detail & Related papers (2024-11-22T15:58:26Z)
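A minimal sketch of what time-aware scene graph filtering could look like: drop relations whose last observation falls outside a recency horizon, so the LLM reasons only over the current state of the environment. The dict schema and horizon value are assumptions, not the paper's implementation.

```python
import time


def filter_stale(relations: list[dict], horizon_s: float = 30.0) -> list[dict]:
    """Drop relations observed more than `horizon_s` seconds ago.
    Each relation looks like {"subject": ..., "relation": ..., "object": ..., "stamp": ...}."""
    now = time.time()
    return [r for r in relations if now - r["stamp"] <= horizon_s]
```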
- Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models [53.22792173053473]
We introduce an interactive robotic manipulation framework called Polaris.
Polaris integrates perception and interaction by utilizing GPT-4 alongside grounded vision models.
We propose a novel Synthetic-to-Real (Syn2Real) pose estimation pipeline.
arXiv Detail & Related papers (2024-08-15T06:40:38Z)
- GRUtopia: Dream General Robots in a City at Scale [65.08318324604116]
This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots.
GRScenes includes 100k interactive, finely annotated scenes, which can be freely combined into city-scale environments.
GRResidents is a Large Language Model (LLM) driven Non-Player Character (NPC) system that is responsible for social interaction.
arXiv Detail & Related papers (2024-07-15T17:40:46Z)
- RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation [21.387160107315797]
We introduce the novel task of interactive scene exploration, wherein robots autonomously explore environments and produce an action-conditioned scene graph (ACSG).
The ACSG accounts for both low-level information (geometry and semantics) and high-level information (action-conditioned relationships between different entities) in the scene.
We present the Robotic Exploration (RoboEXP) system, which incorporates a Large Multimodal Model (LMM) and an explicit memory design to enhance the system's capabilities.
arXiv Detail & Related papers (2024-02-23T18:27:17Z)
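As a rough illustration of what an ACSG stores (not the RoboEXP API; field names are invented), the sketch below pairs low-level entity nodes with edges recording which exploration action revealed which relation:

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    semantics: str                 # low-level: category label, e.g. "drawer"
    geometry: list[tuple[float, float, float]] = field(default_factory=list)  # e.g. sampled surface points


@dataclass
class ActionEdge:
    source: str                    # entity acted on, e.g. "drawer"
    action: str                    # exploration action, e.g. "open"
    reveals: str                   # what the action exposed, e.g. "cup inside drawer"


@dataclass
class ACSG:
    entities: dict[str, Entity] = field(default_factory=dict)
    action_edges: list[ActionEdge] = field(default_factory=list)

    def record(self, edge: ActionEdge) -> None:
        """Memorize an action-conditioned relationship (the high-level information)."""
        self.action_edges.append(edge)
```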
- Situational Graphs for Robot Navigation in Structured Indoor Environments [9.13466172688693]
We present Situational Graphs (S-Graphs), built online in real time as a single graph representing the environment.
Our method uses odometry readings and planar surfaces extracted from 3D LiDAR scans to construct and optimize a three-layered S-Graph in real time.
Our proposal not only demonstrates state-of-the-art results for robot pose estimation but also contributes a metric-semantic-topological model of the environment.
arXiv Detail & Related papers (2022-02-24T16:59:06Z)
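A minimal sketch of the three layers such a graph couples; all field names are illustrative, and the real system jointly optimizes all layers online rather than just storing them.

```python
from dataclasses import dataclass, field


@dataclass
class SGraph:
    poses: list[tuple[float, float, float]] = field(default_factory=list)  # layer 1: (x, y, yaw) keyframes
    planes: dict[int, tuple[float, float, float, float]] = field(default_factory=dict)  # layer 2: plane (nx, ny, nz, d)
    rooms: dict[str, list[int]] = field(default_factory=dict)  # layer 3: room name -> bounding plane ids

    def add_keyframe(self, pose, observed_planes) -> None:
        """Insert an odometry keyframe plus its LiDAR plane observations."""
        self.poses.append(pose)
        self.planes.update(observed_planes)
```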
- OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics [124.08684545010664]
Scene graph generation from images is a task of great interest to applications such as robotics.
We propose an initial approximation to a framework called Ontology-Guided Scene Graph Generation (OG-SGG).
arXiv Detail & Related papers (2022-02-21T13:23:15Z)
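One way to picture ontology guidance, using an invented toy ontology: filter predicted triples so that only predicates the ontology permits between two object classes survive.

```python
# Toy ontology: which predicates are valid between a (subject, object) class pair.
ALLOWED = {
    ("person", "chair"): {"sitting on", "next to"},
    ("cup", "table"): {"on", "next to"},
}


def ontology_filter(triples: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Keep only (subject, predicate, object) triples the ontology permits."""
    return [(s, p, o) for s, p, o in triples if p in ALLOWED.get((s, o), set())]


print(ontology_filter([("cup", "on", "table"), ("cup", "sitting on", "table")]))
# -> [('cup', 'on', 'table')]
```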
- iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
iGibson's features enable the generalization of navigation agents, and the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human-demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)
- 3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans [27.747241700017728]
We present a unified representation for actionable spatial perception: 3D Dynamic Scene Graphs.
3D Dynamic Scene Graphs can have a profound impact on planning and decision-making, human-robot interaction, long-term autonomy, and scene prediction.
arXiv Detail & Related papers (2020-02-15T00:46:32Z)
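A minimal sketch of such a layered representation, with places, objects, and time-stamped human nodes linked by parent edges; everything beyond the three layer labels named in the title is an illustrative assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DSGNode:
    layer: str                           # "place", "object", or "human"
    name: str
    position: tuple[float, float, float]
    stamp: float = 0.0                   # humans move, so record observation time


@dataclass
class DynamicSceneGraph:
    nodes: dict[str, DSGNode] = field(default_factory=dict)
    parent: dict[str, str] = field(default_factory=dict)  # child name -> parent place

    def add(self, node: DSGNode, parent_place: Optional[str] = None) -> None:
        """Insert a node; objects and humans are anchored to a place node."""
        self.nodes[node.name] = node
        if parent_place is not None:
            self.parent[node.name] = parent_place
```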
This list is automatically generated from the titles and abstracts of the papers in this site.