Unifying Large Language Model and Deep Reinforcement Learning for Human-in-Loop Interactive Socially-aware Navigation
- URL: http://arxiv.org/abs/2403.15648v3
- Date: Fri, 07 Mar 2025 20:03:06 GMT
- Title: Unifying Large Language Model and Deep Reinforcement Learning for Human-in-Loop Interactive Socially-aware Navigation
- Authors: Weizheng Wang, Ike Obi, Aniket Bera, Byung-Cheol Min
- Abstract summary: Social robot navigation planners face two major challenges: managing real-time user inputs and ensuring socially compliant behaviors. We introduce SALM, an interactive, human-in-loop Socially-Aware navigation Large Language Model framework. A memory mechanism archives temporal data for continuous refinement, while a multi-step graph-of-thoughts inference-based large language feedback model adaptively fuses the strengths of both planning approaches.
- Score: 16.789333617628138
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Navigating human-filled spaces is crucial for interactive social robots that support advanced services, such as cooperative carrying, which enables service provision in complex and crowded environments while adapting behavior based on real-time human language commands or feedback. However, existing social robot navigation planners face two major challenges: managing real-time user inputs and ensuring socially compliant behaviors in unfamiliar, zero-shot environments. In response, we introduce SALM, an interactive, human-in-loop Socially-Aware navigation Large Language Model framework that dynamically integrates deep reinforcement learning (DRL) with large language model (LLM) capabilities. SALM leverages contextual semantic understanding from real-time human-robot interactions to convert high-level user commands into precise, low-level control actions. A high-level LLM module parses user input and guides the simultaneous generation of navigation commands by both a large language navigation model (LNM) and a DRL-based navigation model (RLNM). A memory mechanism archives temporal data for continuous refinement, while a multi-step graph-of-thoughts inference-based large language feedback model adaptively fuses the strengths of both planning approaches. Experimental evaluations demonstrate that SALM not only enhances navigational precision in crowded, dynamic environments but also significantly improves system adaptability, offering tailored behaviors that align with individual user preferences and real-time feedback. More details and videos about this work are available at: https://sites.google.com/view/navi-salm.
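As a rough illustration of the dual-planner loop sketched in the abstract, the fragment below shows how a high-level LLM parser, the two navigation models (LNM and RLNM), a memory archive, and a fusing feedback model could fit together. Every name and interface below is an assumption for exposition, not the authors' implementation.

```python
# Hypothetical sketch of the SALM control loop described in the abstract.
# All names and interfaces are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Archives temporal data (goals, actions, feedback) for refinement."""
    episodes: list = field(default_factory=list)

    def archive(self, record: dict) -> None:
        self.episodes.append(record)

def salm_step(user_input: str, observation, memory: Memory,
              llm_parse, lnm_plan, rlnm_plan, feedback_fuse):
    """One control step: parse the command, let both planners propose an
    action, then let the graph-of-thoughts feedback model fuse them."""
    goal = llm_parse(user_input)                # high-level intent from the LLM
    action_lnm = lnm_plan(goal, observation)    # language navigation model
    action_rlnm = rlnm_plan(goal, observation)  # DRL navigation model
    action = feedback_fuse(action_lnm, action_rlnm, memory.episodes)
    memory.archive({"goal": goal, "action": action})
    return action
```

The planners and the fusion model are injected as plain callables so the loop stays agnostic to how each component is actually realized.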
Related papers
- HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard [63.54109142085327]
Vision-and-Language Navigation (VLN) systems often focus on either discrete (panoramic) or continuous (free-motion) paradigms alone.
We introduce a unified Human-Aware VLN benchmark that merges these paradigms under explicit social-awareness constraints.
arXiv Detail & Related papers (2025-03-18T13:05:55Z) - Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues [54.81155589931697]
Collaborative Instance object Navigation (CoIN) is a new task setting where the agent actively resolves uncertainties about the target instance.
We propose a novel training-free method, Agent-user Interaction with UncerTainty Awareness (AIUTA).
First, upon object detection, a Self-Questioner model initiates a self-dialogue within the agent to obtain a complete and accurate observation description.
An Interaction Trigger module determines whether to ask the human a question, continue navigation, or halt.
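As a loose sketch of the AIUTA flow summarized above, the fragment below pairs a Self-Questioner that refines an observation through self-dialogue with an Interaction Trigger that decides between asking, continuing, and halting. The questions, threshold, and interfaces are assumptions, not the paper's specification.

```python
# Hypothetical AIUTA-style decision flow; names and threshold are assumed.
from enum import Enum

class Decision(Enum):
    ASK_HUMAN = "ask"
    CONTINUE = "continue"
    HALT = "halt"

def self_questioner(detection: str, vlm_answer) -> str:
    """Refine a raw detection into a fuller description by querying the
    agent's own vision-language model (`vlm_answer`: question -> answer)."""
    description = detection
    for question in ("What are its color and shape?", "What is it next to?"):
        description += " " + vlm_answer(question)
    return description

def interaction_trigger(uncertainty: float, target_found: bool) -> Decision:
    """Ask the human only when the agent remains uncertain (assumed cutoff)."""
    if target_found:
        return Decision.HALT
    return Decision.ASK_HUMAN if uncertainty > 0.5 else Decision.CONTINUE
```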
arXiv Detail & Related papers (2024-12-02T08:16:38Z) - GSON: A Group-based Social Navigation Framework with Large Multimodal Model [9.94576166903495]
This paper introduces GSON, a novel group-based social navigation framework.
GSON uses visual prompting to enable zero-shot extraction of social relationships among pedestrians.
We validate GSON through extensive real-world mobile robot navigation experiments.
arXiv Detail & Related papers (2024-09-26T17:27:15Z) - Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue [25.89926022671521]
We generate a large-scale dataset of 100,000 paired LLM-LLM and human-LLM dialogues from the WildChat dataset.
We find relatively low alignment between simulations and human interactions, demonstrating systematic divergence along multiple textual properties.
arXiv Detail & Related papers (2024-09-12T18:00:18Z) - Sparse Rewards Can Self-Train Dialogue Agents [22.799506097310008]
We introduce a novel self-improvement paradigm that empowers LLM agents to autonomously enhance their performance without external human feedback.
We present ToolWOZ, a sparse reward tool-calling simulation environment derived from MultiWOZ.
We demonstrate that both small and frontier models trained with JOSH, the proposed self-training approach, significantly improve tool-based interactions while preserving general model capabilities across diverse benchmarks.
arXiv Detail & Related papers (2024-09-06T21:00:57Z) - Large Language Models for Base Station Siting: Intelligent Deployment based on Prompt or Agent [62.16747639440893]
Large language models (LLMs) and their associated technologies are advancing, particularly in the realms of prompt engineering and agent engineering.
This approach entails the strategic use of well-crafted prompts to infuse human experience and knowledge into these sophisticated LLMs.
This integration represents a future paradigm of artificial intelligence (AI) as a service and easier-to-use AI.
arXiv Detail & Related papers (2024-08-07T08:43:32Z) - Enhancing the LLM-Based Robot Manipulation Through Human-Robot Collaboration [4.2460673279562755]
Large Language Models (LLMs) are gaining popularity in the field of robotics.
This paper proposes a novel approach to enhance the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC).
The approach involves using a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that can be executed by the robot.
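A minimal sketch of that decomposition step might look as follows; the prompt wording, JSON schema, motion primitives, and `chat` callable are all illustrative assumptions rather than the paper's exact setup.

```python
import json

# Assumed prompt asking the LLM for a machine-readable motion sequence.
PROMPT = (
    "Decompose the command into a JSON list of robot motions, each with "
    '"primitive" (move_to | grasp | release) and "target". Command: {cmd}'
)

def decompose(command: str, chat) -> list[dict]:
    """`chat` is any function mapping a prompt string to the model's reply."""
    reply = chat(PROMPT.format(cmd=command))
    return json.loads(reply)  # e.g. [{"primitive": "move_to", "target": "cup"}]

def execute(motions: list[dict], robot) -> None:
    """Dispatch each decomposed motion to a matching method on the robot."""
    for m in motions:
        getattr(robot, m["primitive"])(m["target"])
```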
arXiv Detail & Related papers (2024-06-20T08:23:49Z) - Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent).
It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation.
The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
arXiv Detail & Related papers (2024-06-09T21:58:32Z) - ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback [86.87638927637005]
ChatGLM is a free-to-use AI service powered by large language models (LLMs).
We present the ChatGLM-RLHF pipeline, designed to enhance ChatGLM's alignment with human preferences.
arXiv Detail & Related papers (2024-04-01T05:39:36Z) - ST-LLM: Large Language Models Are Effective Temporal Learners [58.79456373423189]
Large Language Models (LLMs) have showcased impressive capabilities in text comprehension and generation.
How to effectively encode and understand videos in video-based dialogue systems remains to be solved.
We propose ST-LLM, an effective video-LLM baseline with spatial-temporal sequence modeling inside LLM.
arXiv Detail & Related papers (2024-03-30T10:11:26Z) - An Embarrassingly Simple Approach for LLM with Strong ASR Capacity [56.30595787061546]
We focus on solving one of the most important tasks in speech processing, automatic speech recognition (ASR), with speech foundation encoders and large language models (LLMs).
Recent works have complex designs such as compressing the output temporally for the speech encoder, tackling modal alignment for the projector, and utilizing parameter-efficient fine-tuning for the LLM.
We found that delicate designs are not necessary, while an embarrassingly simple composition of off-the-shelf speech encoder, LLM, and the only trainable linear projector is competent for the ASR task.
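The composition that summary describes can be sketched in a few lines: a frozen speech encoder, a frozen LLM, and a single trainable linear projector between them. Dimensions and the `inputs_embeds` interface are assumptions (the latter mirrors common Hugging Face-style decoders), not the paper's verified code.

```python
import torch
import torch.nn as nn

class SpeechLLMSketch(nn.Module):
    """Minimal sketch: frozen encoder -> trainable linear projector -> frozen LLM."""

    def __init__(self, speech_encoder: nn.Module, llm: nn.Module,
                 enc_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.speech_encoder = speech_encoder.eval()
        self.llm = llm.eval()
        for p in list(self.speech_encoder.parameters()) + list(self.llm.parameters()):
            p.requires_grad = False  # only the projector below is trained
        self.projector = nn.Linear(enc_dim, llm_dim)

    def forward(self, speech: torch.Tensor) -> torch.Tensor:
        feats = self.speech_encoder(speech)    # (batch, frames, enc_dim)
        tokens = self.projector(feats)         # (batch, frames, llm_dim)
        # Assumes an HF-style decoder that accepts projected embeddings.
        return self.llm(inputs_embeds=tokens)
```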
arXiv Detail & Related papers (2024-02-13T23:25:04Z) - Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation [50.01551945190676]
Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning.
We propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures.
We demonstrate its effectiveness for multi-agent trajectory prediction and social robot navigation.
arXiv Detail & Related papers (2024-01-22T18:58:22Z) - LLM A*: Human in the Loop Large Language Models Enabled A* Search for Robotics [3.567107449359775]
This research focuses on how Large Language Models (LLMs) can help with (path) planning for mobile embodied agents such as robots.
A novel framework, LLM A*, is proposed to leverage the commonsense of LLMs together with the utility-optimal A* algorithm to facilitate few-shot near-optimal path planning.
This approach takes human feedback on board and renders the entire planning process transparent (akin to a 'white box') to humans.
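One plausible reading of "LLM A*" is standard A* whose heuristic is biased toward waypoints the LLM (optionally corrected by human feedback) considers sensible; the weighting trades strict optimality for commonsense guidance, matching the "near-optimal" claim. The sketch below implements that reading; it is an assumption about the mechanism, not the paper's published algorithm.

```python
import heapq
from itertools import count

def llm_a_star(start, goal, neighbors, cost, base_h, llm_waypoints, w=0.5):
    """A* search with an assumed LLM-informed heuristic bias.

    base_h(a, b): admissible distance estimate between states.
    llm_waypoints: states the LLM (or human feedback) suggests passing near;
    being far from all of them inflates the heuristic.
    """
    def h(n):
        bias = min((base_h(n, wp) for wp in llm_waypoints), default=0)
        return base_h(n, goal) + w * bias

    tie = count()  # tie-breaker so the heap never compares raw states
    frontier = [(h(start), 0, next(tie), start, [start])]
    best_g = {}
    while frontier:
        _, g, _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if best_g.get(node, float("inf")) <= g:
            continue
        best_g[node] = g
        for nb in neighbors(node):
            g2 = g + cost(node, nb)
            heapq.heappush(frontier, (g2 + h(nb), g2, next(tie), nb, path + [nb]))
    return None  # no route found
```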
arXiv Detail & Related papers (2023-12-04T10:37:58Z) - Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation [17.279875204729553]
Zero-Shot Object Navigation (ZSON) enables agents to navigate towards open-vocabulary objects in unknown environments.
We introduce ZIPON, where robots need to navigate to personalized goal objects while engaging in conversations with users.
We propose Open-woRld Interactive persOnalized Navigation (ORION) to make sequential decisions to manipulate different modules for perception, navigation and communication.
arXiv Detail & Related papers (2023-10-12T01:17:56Z) - User Simulation with Large Language Models for Evaluating Task-Oriented Dialogue [10.336443286833145]
We propose a novel user simulator built using recently developed large pretrained language models (LLMs).
Unlike previous work, which sought to maximize goal success rate (GSR) as the primary metric of simulator performance, our goal is a system which achieves a GSR similar to that observed in human interactions with TOD systems.
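The evaluation target is then a gap rather than a maximum, as in this tiny sketch (variable names are illustrative):

```python
def gsr(outcomes: list[bool]) -> float:
    """Goal success rate over a batch of dialogues."""
    return sum(outcomes) / len(outcomes)

def gsr_alignment_gap(sim_outcomes: list[bool], human_outcomes: list[bool]) -> float:
    """Smaller is better: the simulator should match, not beat, human GSR."""
    return abs(gsr(sim_outcomes) - gsr(human_outcomes))
```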
arXiv Detail & Related papers (2023-09-23T02:04:57Z) - ChatGPT as your Personal Data Scientist [0.9689893038619583]
This paper introduces a ChatGPT-based conversational data-science framework to act as a "personal data scientist".
Our model pivots around four dialogue states: Data Visualization, Task Formulation, Prediction Engineering, and Result Summary and Recommendation.
In summary, we developed an end-to-end system that not only proves the viability of the novel concept of conversational data science but also underscores the potency of LLMs in solving complex tasks.
arXiv Detail & Related papers (2023-05-23T04:00:16Z) - SocNavGym: A Reinforcement Learning Gym for Social Navigation [0.0]
SocNavGym is an advanced simulation environment for social navigation.
It can generate different types of social navigation scenarios.
It can also be configured to work with different hand-crafted and data-driven social reward signals.
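Assuming SocNavGym follows the usual Gymnasium interface, a configured scenario might be driven roughly as below; the environment id and config argument are unverified assumptions based on this summary.

```python
import gymnasium as gym

# Assumed Gymnasium-style usage of SocNavGym; id/config are illustrative.
env = gym.make("SocNavGym-v1", config="path/to/scenario_config.yaml")
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # placeholder for a trained policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```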
arXiv Detail & Related papers (2023-04-27T11:29:02Z) - Low-code LLM: Graphical User Interface over Large Language Models [115.08718239772107]
This paper introduces a novel human-LLM interaction framework, Low-code LLM.
It incorporates six types of simple low-code visual programming interactions to achieve more controllable and stable responses.
We highlight three advantages of the low-code LLM: user-friendly interaction, controllable generation, and wide applicability.
arXiv Detail & Related papers (2023-04-17T09:27:40Z) - Can Large Language Models Transform Computational Social Science? [79.62471267510963]
Large Language Models (LLMs) are capable of performing many language processing tasks zero-shot (without training data).
This work provides a road map for using LLMs as Computational Social Science tools.
arXiv Detail & Related papers (2023-04-12T17:33:28Z) - LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action [76.71101507291473]
We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories.
We show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data.
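A compact sketch of that composition: the LLM extracts landmark phrases, CLIP grounds each phrase at the best-matching node image in the topological graph, and the navigation model turns the grounded waypoints into a route. The interfaces below are assumptions for exposition.

```python
def lm_nav_sketch(instruction: str, graph_nodes, llm_extract, clip_score, plan):
    """graph_nodes: (node_id, image) pairs from the topological graph.
    llm_extract: instruction -> ordered landmark phrases (GPT-3's role).
    clip_score: (image, phrase) -> similarity (CLIP's role).
    plan: ordered node ids -> executable route (ViNG's role)."""
    waypoints = []
    for phrase in llm_extract(instruction):
        best_id, _ = max(graph_nodes, key=lambda n: clip_score(n[1], phrase))
        waypoints.append(best_id)
    return plan(waypoints)
```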
arXiv Detail & Related papers (2022-07-10T10:41:50Z) - Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration [83.96729205383501]
We introduce prompt-based learning to achieve fast adaptation for language embeddings.
Our model can adapt to diverse vision-language navigation tasks, including VLN and REVERIE.
arXiv Detail & Related papers (2022-03-08T11:01:24Z)