Related papers: Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot

URL: http://arxiv.org/abs/2411.15027v1
Date: Fri, 22 Nov 2024 15:58:26 GMT
Title: Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot
Authors: Simone Colombani, Luca Brini, Dimitri Ognibene, Giuseppe Boccignone
Abstract summary: This paper presents a robot control architecture that addresses key challenges in human-robot interaction. The architecture uses Large Language Models to integrate diverse information sources, including natural language commands. The architecture enhances adaptability, task efficiency, and human-robot collaboration in dynamic environments.
Score: 0.8515309662618664
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Robots are increasingly being used in dynamic environments like workplaces, hospitals, and homes. As a result, interactions with robots must be simple and intuitive, with robots' perception adapting efficiently to human-induced changes. This paper presents a robot control architecture that addresses key challenges in human-robot interaction, with a particular focus on the dynamic creation and continuous update of the robot state representation. The architecture uses Large Language Models to integrate diverse information sources, including natural language commands, robotic skills representations, and real-time dynamic semantic mapping of the perceived scene. This enables flexible and adaptive robotic behavior in complex, dynamic environments. Traditional robotic systems often rely on static, pre-programmed instructions and settings, limiting their adaptability to dynamic environments and real-time collaboration. In contrast, this architecture uses LLMs to interpret complex, high-level instructions and generate actionable plans that enhance human-robot collaboration. At its core, the system's Perception Module generates and continuously updates a semantic scene graph using RGB-D sensor data, providing a detailed and structured representation of the environment. A particle filter is employed to ensure accurate object localization in dynamic, real-world settings. The Planner Module leverages this up-to-date semantic map to break down high-level tasks into sub-tasks and link them to robotic skills such as navigation, object manipulation (e.g., PICK and PLACE), and movement (e.g., GOTO). By combining real-time perception, state tracking, and LLM-driven communication and task planning, the architecture enhances adaptability, task efficiency, and human-robot collaboration in dynamic environments.
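The abstract says a particle filter keeps object localization accurate while people move things around, but it does not give the filter's details. The following is a minimal sketch, assuming each object's state is only its 3D centroid, a random-walk motion model (objects may be displaced by humans between frames), and a Gaussian likelihood on the centroid detected from RGB-D data. The class name `ObjectParticleFilter`, its parameters, and the noise values are hypothetical and are not taken from the paper.

```python
import numpy as np


class ObjectParticleFilter:
    """Hypothetical per-object particle filter over a 3D centroid (illustrative only)."""

    def __init__(self, init_pos, n_particles=500, init_std=0.05,
                 motion_std=0.02, meas_std=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        # Spread initial particles around the first detected centroid.
        self.particles = np.asarray(init_pos, dtype=float) + self.rng.normal(
            0.0, init_std, size=(n_particles, 3))
        self.weights = np.full(n_particles, 1.0 / n_particles)
        self.motion_std = motion_std  # assumed random-walk noise per step (m)
        self.meas_std = meas_std      # assumed detection noise (m)

    def predict(self):
        # Random-walk motion model: the object may have been moved by a person.
        self.particles += self.rng.normal(0.0, self.motion_std, size=self.particles.shape)

    def update(self, measurement):
        # Gaussian likelihood of the detected centroid, computed in log space
        # so that far-away detections do not underflow all weights to zero.
        d2 = np.sum((self.particles - measurement) ** 2, axis=1)
        log_w = np.log(self.weights + 1e-300) - 0.5 * d2 / self.meas_std ** 2
        log_w -= log_w.max()
        w = np.exp(log_w)
        self.weights = w / w.sum()

    def effective_sample_size(self):
        return 1.0 / np.sum(self.weights ** 2)

    def resample(self):
        # Systematic resampling: one random offset, n evenly spaced pointers.
        n = len(self.weights)
        positions = (self.rng.random() + np.arange(n)) / n
        cumsum = np.cumsum(self.weights)
        cumsum[-1] = 1.0  # guard against floating-point round-off
        idx = np.searchsorted(cumsum, positions)
        self.particles = self.particles[idx]
        self.weights = np.full(n, 1.0 / n)

    def estimate(self):
        # Weighted mean of the particles as the position estimate.
        return np.average(self.particles, axis=0, weights=self.weights)

    def step(self, measurement=None):
        # One filter cycle; measurement is None when the object is not detected.
        self.predict()
        if measurement is not None:
            self.update(np.asarray(measurement, dtype=float))
            if self.effective_sample_size() < 0.5 * len(self.weights):
                self.resample()
        return self.estimate()
```

For example, if an object first seen at `[1.0, 0.0, 0.8]` is later detected repeatedly at `[1.5, 0.2, 0.8]` (someone moved it), calling `step` with each new detection pulls the estimate to the new location, while calling `step()` with no argument only widens the uncertainty. Resampling only when the effective sample size drops keeps particle diversity when detections are consistent.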

Related papers

Hi-Dyna Graph: Hierarchical Dynamic Scene Graph for Robotic Autonomy in Human-Centric Environments [41.80879866951797]
Hi-Dyna Graph is a hierarchical dynamic scene graph architecture that integrates persistent global layouts with localized dynamic semantics for embodied robotic autonomy. An agent powered by large language models (LLMs) is employed to interpret the unified graph, infer latent task triggers, and generate executable instructions grounded in robotic affordances.
arXiv Detail & Related papers (2025-05-30T03:35:29Z)
One to rule them all: natural language to bind communication, perception and action [0.9302364070735682]
This paper presents an advanced architecture for robotic action planning that integrates communication, perception, and planning with Large Language Models (LLMs). The Planner Module is the core of the system, where LLMs embedded in a modified ReAct framework are employed to interpret and carry out user commands. The modified ReAct framework further enhances the execution space by providing real-time environmental perception and the outcomes of physical actions.
arXiv Detail & Related papers (2024-11-22T16:05:54Z)
GRAPPA: Generalizing and Adapting Robot Policies via Online Agentic Guidance [15.774237279917594]
We propose an agentic framework for robot self-guidance and self-improvement. Our framework iteratively grounds a base robot policy to relevant objects in the environment. We demonstrate that our approach can effectively guide manipulation policies to achieve significantly higher success rates.
arXiv Detail & Related papers (2024-10-09T02:00:37Z)
Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models [53.22792173053473]
We introduce an interactive robotic manipulation framework called Polaris. Polaris integrates perception and interaction by utilizing GPT-4 alongside grounded vision models. We propose a novel Synthetic-to-Real (Syn2Real) pose estimation pipeline.
arXiv Detail & Related papers (2024-08-15T06:40:38Z)
Flow as the Cross-Domain Manipulation Interface [73.15952395641136]
Im2Flow2Act enables robots to acquire real-world manipulation skills without the need for real-world robot training data. Im2Flow2Act comprises two components: a flow generation network and a flow-conditioned policy. We demonstrate Im2Flow2Act's capabilities in a variety of real-world tasks, including the manipulation of rigid, articulated, and deformable objects.
arXiv Detail & Related papers (2024-07-21T16:15:02Z)
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation [77.41969287400977]
This paper presents RoboScript, a platform for a deployable robot manipulation pipeline powered by code generation. We also present a code generation benchmark for robot manipulation tasks specified in free-form natural language. We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z)
InCoRo: In-Context Learning for Robotics Control with Feedback Loops [4.702566749969133]
InCoRo is a system that uses a classical robotic feedback loop composed of an LLM controller, a scene understanding unit, and a robot. We highlight the generalization capabilities of our system and show that InCoRo surpasses the prior art in terms of the success rate. This research paves the way towards building reliable, efficient, intelligent autonomous systems that adapt to dynamic environments.
arXiv Detail & Related papers (2024-02-07T19:01:11Z)
Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
Prompt a Robot to Walk with Large Language Models [18.214609570837403]
Large language models (LLMs) pre-trained on vast internet-scale data have showcased remarkable capabilities across diverse domains. We introduce a novel paradigm in which we use few-shot prompts collected from the physical environment. Experiments across various robots and environments validate that our method can effectively prompt a robot to walk.
arXiv Detail & Related papers (2023-09-18T17:50:17Z)
SEAL: Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions [5.522839151632667]
We extend the semantic frame representation for robot manipulation actions and introduce the problem of Semantic Frame Execution And Localization for Perceiving Afforded Robot Actions (SEAL) as a graphical model. For the SEAL problem, we describe our nonparametric Semantic Frame Mapping (SeFM) algorithm for maintaining belief over a finite set of semantic frames as the locations of actions afforded to the robot.
arXiv Detail & Related papers (2023-03-24T15:25:41Z)
Synthesis and Execution of Communicative Robotic Movements with Generative Adversarial Networks [59.098560311521034]
We focus on how to transfer to two different robotic platforms the same kinematic modulation that humans adopt when manipulating delicate objects. We choose to modulate the velocity profile adopted by the robots' end-effector, inspired by what humans do when transporting objects with different characteristics. We exploit a novel Generative Adversarial Network architecture, trained with human kinematics examples, to generalize over them and generate new and meaningful velocity profiles.
arXiv Detail & Related papers (2022-03-29T15:03:05Z)
HARPS: An Online POMDP Framework for Human-Assisted Robotic Planning and Sensing [1.3678064890824186]
The Human Assisted Robotic Planning and Sensing (HARPS) framework is presented for active semantic sensing and planning in human-robot teams. This approach lets humans opportunistically impose model structure and extend the range of semantic soft data in uncertain environments. Simulations of a UAV-enabled target search application in a large-scale partially structured environment show significant improvements in time and belief state estimates.
arXiv Detail & Related papers (2021-10-20T00:41:57Z)
SAPIEN: A SimulAted Part-based Interactive ENvironment [77.4739790629284]
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects. We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.