OpenObject-NAV: Open-Vocabulary Object-Oriented Navigation Based on Dynamic Carrier-Relationship Scene Graph
- URL: http://arxiv.org/abs/2409.18743v1
- Date: Fri, 27 Sep 2024 13:33:52 GMT
- Title: OpenObject-NAV: Open-Vocabulary Object-Oriented Navigation Based on Dynamic Carrier-Relationship Scene Graph
- Authors: Yujie Tang, Meiling Wang, Yinan Deng, Zibo Zheng, Jiagui Zhong, Yufeng Yue
- Abstract summary: This paper captures the relationships between frequently used objects and their static carriers.
We propose an instance navigation strategy that models the navigation process as a Markov Decision Process.
The results demonstrate that by updating the CRSG, the robot can efficiently navigate to moved targets.
- Score: 10.475404599532157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In everyday life, frequently used objects like cups often have unfixed positions and multiple instances within the same category, and their carriers frequently change as well. As a result, it becomes challenging for a robot to efficiently navigate to a specific instance. To tackle this challenge, the robot must continuously capture and update scene changes and plans. However, current object navigation approaches primarily focus on the semantic level and lack the ability to dynamically update the scene representation. This paper captures the relationships between frequently used objects and their static carriers. It constructs an open-vocabulary Carrier-Relationship Scene Graph (CRSG) and updates the carrying status during robot navigation to reflect the dynamic changes of the scene. Based on the CRSG, we further propose an instance navigation strategy that models the navigation process as a Markov Decision Process. At each step, decisions are informed by a Large Language Model's commonsense knowledge and visual-language feature similarity. We designed a series of long-sequence navigation tasks for frequently used everyday items in the Habitat simulator. The results demonstrate that by updating the CRSG, the robot can efficiently navigate to moved targets. Additionally, we deployed our algorithm on a real robot and validated its practical effectiveness.
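As a rough illustration of the approach the abstract describes, the sketch below models a toy CRSG as object-to-carrier edges over a fixed set of carrier nodes, rewires an edge when a scene change is observed, and scores candidate carriers by blending a stubbed LLM commonsense prior with a stubbed visual-language similarity. All names, the node format, and the scoring blend are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class CRSG:
    """Toy Carrier-Relationship Scene Graph: objects attached to carriers."""
    carriers: set[str]                                    # static carrier nodes
    carrier_of: dict[str, str] = field(default_factory=dict)

    def update_carrier(self, obj: str, observed: str) -> None:
        # Reflect a dynamic scene change observed during navigation:
        # rewire the object's carrying edge to the newly observed carrier.
        self.carrier_of[obj] = observed

def llm_prior(target: str, carrier: str) -> float:
    # Stub: a real system would query an LLM for how plausible it is
    # that `target` rests on `carrier` (e.g. a cup on a dining table).
    plausible = {("cup", "dining_table"): 0.8, ("cup", "bookshelf"): 0.1}
    return plausible.get((target, carrier), 0.3)

def vl_similarity(target: str, carrier: str) -> float:
    # Stub for visual-language feature similarity, e.g. a CLIP-style score
    # between the target text and the carrier's latest image crop.
    return 0.5

def next_carrier(graph: CRSG, target: str, w: float = 0.5) -> str:
    # One decision step of the MDP: visit the carrier that maximizes a
    # blend of LLM commonsense and visual-language similarity.
    return max(graph.carriers,
               key=lambda c: w * llm_prior(target, c)
                             + (1 - w) * vl_similarity(target, c))

graph = CRSG({"bookshelf", "dining_table", "kitchen_counter"},
             {"cup_1": "bookshelf"})
graph.update_carrier("cup_1", "dining_table")  # the cup was moved mid-episode
print(next_carrier(graph, "cup"))              # -> dining_table
```

In the full system, the prior and the similarity would come from an actual LLM query and from visual-language features over the robot's observations rather than the hard-coded stubs above.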
Related papers
- Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot [0.8515309662618664]
This paper presents a robot control architecture that addresses key challenges in human-robot interaction.
The architecture uses Large Language Models to integrate diverse information sources, including natural language commands.
The architecture enhances adaptability, task efficiency, and human-robot collaboration in dynamic environments.
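The summary does not spell out the filtering rule, but the title suggests recency-based pruning of scene-graph observations before they reach the LLM. The snippet below sketches that reading under an assumed node format and an assumed staleness cutoff.

```python
import time

def filter_scene_graph(nodes: list[dict], now: float,
                       max_age_s: float = 30.0) -> list[dict]:
    """Keep only observations recent enough to still describe the scene."""
    return [n for n in nodes if now - n["last_seen"] <= max_age_s]

nodes = [
    {"name": "person_1", "last_seen": time.time() - 5.0},    # fresh
    {"name": "chair_2", "last_seen": time.time() - 120.0},   # stale
]
print([n["name"] for n in filter_scene_graph(nodes, time.time())])  # ['person_1']
```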
arXiv Detail & Related papers (2024-11-22T15:58:26Z)
- HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments [8.974071308749007]
We study the problem of robot navigation in dense and interactive crowds with environmental constraints such as corridors and furniture.
Previous methods fail to consider all types of interactions among agents and obstacles, leading to unsafe and inefficient robot paths.
We propose a structured framework to learn robot navigation policies with reinforcement learning.
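A heterogeneous interaction graph distinguishes edges by the kinds of entities they connect (robot-human, human-human, human-obstacle), so a transformer can attend with type-specific parameters. The sketch below only builds such typed edges; the entity names and type labels are illustrative assumptions, not HEIGHT's actual graph construction.

```python
from itertools import combinations

def build_edges(entities: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (src, dst, edge_type) tuples, typing each edge by its endpoints."""
    edges = []
    for a, b in combinations(entities, 2):
        edges.append((a, b, f"{entities[a]}-{entities[b]}"))  # e.g. 'robot-human'
    return edges

entities = {"r0": "robot", "h0": "human", "h1": "human", "wall0": "obstacle"}
for src, dst, kind in build_edges(entities):
    print(src, dst, kind)
```

A transformer over this graph would then use separate attention parameters per edge type, which is what lets the policy treat human-human and robot-obstacle interactions differently.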
arXiv Detail & Related papers (2024-11-19T00:56:35Z)
- DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control [53.80518003412016]
Building a general-purpose intelligent home-assistant agent skilled in diverse tasks specified by human commands is a long-term goal of embodied AI research.
We study primitive mobile manipulations for embodied agents, i.e., how to navigate and interact based on an instructed verb-noun pair.
We propose DISCO, which features non-trivial advancements in contextualized scene modeling and efficient controls.
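The verb-noun interface can be pictured as a two-stage dispatch: navigate to the noun, then apply the verb's manipulation primitive. The toy dispatcher below is a hypothetical illustration of that dual-level split, not DISCO's API.

```python
def navigate_to(noun: str) -> None:
    print(f"navigating to the nearest {noun}")            # coarse level

def execute(verb: str, noun: str) -> None:
    primitives = {"pick": lambda n: print(f"picking {n} up"),
                  "open": lambda n: print(f"opening the {n}")}
    navigate_to(noun)          # first navigate to the noun,
    primitives[verb](noun)     # then run the verb's interaction primitive

execute("pick", "cup")
```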
arXiv Detail & Related papers (2024-07-20T05:39:28Z)
- GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation [65.71524410114797]
GOAT-Bench is a benchmark for the universal navigation task GO to AnyThing (GOAT).
In GOAT, the agent is directed to navigate to a sequence of targets specified by the category name, language description, or image.
We benchmark monolithic RL and modular methods on the GOAT task, analyzing their performance across modalities.
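One way to represent GOAT's three goal modalities in code is a tagged union, sketched below with illustrative dataclass names that are not the benchmark's API.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class CategoryGoal:
    category: str                  # e.g. "chair"

@dataclass
class LanguageGoal:
    description: str               # e.g. "the red armchair next to the lamp"

@dataclass
class ImageGoal:
    image_path: str                # a photo of the exact target instance

Goal = Union[CategoryGoal, LanguageGoal, ImageGoal]

episode: list[Goal] = [CategoryGoal("chair"),
                       LanguageGoal("the red armchair next to the lamp"),
                       ImageGoal("targets/armchair_042.png")]
for goal in episode:
    print(type(goal).__name__, goal)
```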
arXiv Detail & Related papers (2024-04-09T20:40:00Z)
- Interactive Semantic Map Representation for Skill-based Visual Object Navigation [43.71312386938849]
This paper introduces a new representation of a scene semantic map formed during the embodied agent's interaction with the indoor environment.
We have implemented this representation into a full-fledged navigation approach called SkillTron.
The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation.
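The split between intermediate exploration goals and a final object goal can be pictured as a simple mode switch: explore frontiers until the target appears in the semantic map, then navigate to it. The sketch below makes that assumption explicit with a simplified map format; SkillTron's actual goal selection is more involved.

```python
def choose_goal(semantic_map: dict[str, tuple[int, int]],
                frontiers: list[tuple[int, int]],
                target: str) -> tuple[str, tuple[int, int]]:
    if target in semantic_map:              # final goal: go to the object
        return "navigate", semantic_map[target]
    return "explore", frontiers[0]          # intermediate goal: a frontier

print(choose_goal({}, [(4, 7)], "tv"))                  # ('explore', (4, 7))
print(choose_goal({"tv": (12, 3)}, [(4, 7)], "tv"))     # ('navigate', (12, 3))
```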
arXiv Detail & Related papers (2023-11-07T16:30:12Z)
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
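Goal masking can be reduced to a single binary flag that zeroes the goal conditioning, letting one policy serve both goal-directed and exploratory modes. The snippet below illustrates the idea with placeholder feature vectors and dimensions, not NoMaD's actual architecture.

```python
import numpy as np

def condition(obs_feat: np.ndarray, goal_feat: np.ndarray,
              goal_mask: int) -> np.ndarray:
    # goal_mask = 1 -> goal-directed navigation; 0 -> goal-agnostic exploration
    return np.concatenate([obs_feat, goal_mask * goal_feat])

obs, goal = np.random.rand(8), np.random.rand(8)
print(condition(obs, goal, 1)[8:])   # goal features pass through
print(condition(obs, goal, 0)[8:])   # goal features masked to zeros
```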
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
- SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs [81.15889805560333]
We present SG-Bot, a novel rearrangement framework.
SG-Bot exemplifies lightweight, real-time, and user-controllable characteristics.
Experimental results demonstrate that SG-Bot outperforms competitors by a large margin.
arXiv Detail & Related papers (2023-09-21T15:54:33Z)
- LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on large, unannotated datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
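The pipeline composes three frozen models; the stubs below mark where each pre-trained component (the language model for landmark extraction, the image-language model for grounding, the navigation model for control) would be called. The function bodies and the toy scores are placeholders, not the real model interfaces.

```python
def extract_landmarks(instruction: str) -> list[str]:
    # Stub for the language model: parse free-form instructions into
    # an ordered list of landmarks to visit.
    return ["stop sign", "picnic table"]

def ground_landmark(landmark: str, node_images: dict[int, str]) -> int:
    # Stub for the image-language model: score each graph node's image
    # against the landmark text and return the best match.
    fake_scores = {("stop sign", 1): 0.9, ("picnic table", 2): 0.8}
    return max(node_images, key=lambda n: fake_scores.get((landmark, n), 0.1))

def navigate(node_sequence: list[int]) -> None:
    # Stub for the navigation model: drive through the grounded nodes.
    print("visiting nodes:", node_sequence)

node_images = {0: "obs/0.png", 1: "obs/1.png", 2: "obs/2.png"}
landmarks = extract_landmarks("go past the stop sign to the picnic table")
navigate([ground_landmark(lm, node_images) for lm in landmarks])  # -> [1, 2]
```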
arXiv Detail & Related papers (2022-07-10T10:41:50Z)
- Pushing it out of the Way: Interactive Visual Navigation [62.296686176988125]
We study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
We introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions.
By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities.
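The core idea is a learned forward model queried during planning: predict how an action will displace an object, then prefer actions whose predicted outcome clears the path. The toy model below substitutes a hand-coded displacement table and corridor width for the learned network, purely as an illustration.

```python
def predict_displacement(action: str,
                         obj_pos: tuple[float, float]) -> tuple[float, float]:
    # Stand-in for the learned forward model: map an action to the
    # object's predicted new position.
    push = {"push_forward": (0.0, 0.3), "push_left": (-0.3, 0.0)}
    dx, dy = push.get(action, (0.0, 0.0))
    return (obj_pos[0] + dx, obj_pos[1] + dy)

def clears_path(pos: tuple[float, float]) -> bool:
    return abs(pos[0]) > 0.25      # hypothetical corridor half-width

obj = (0.0, 1.0)                   # a chair blocking the corridor
for action in ("push_forward", "push_left"):
    predicted = predict_displacement(action, obj)
    print(action, predicted, "clears" if clears_path(predicted) else "blocks")
```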
arXiv Detail & Related papers (2021-04-28T22:46:41Z)
- Learning Synthetic to Real Transfer for Localization and Navigational Tasks [7.019683407682642]
Navigation lies at the crossroads of multiple disciplines; it combines notions from computer vision, robotics, and control.
This work aimed at creating, in simulation, a navigation pipeline whose transfer to the real world requires as little effort as possible.
Designing the navigation pipeline raises four main challenges: environment, localization, navigation, and planning.
arXiv Detail & Related papers (2020-11-20T08:37:03Z)