Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces
- URL: http://arxiv.org/abs/2501.09024v1
- Date: Mon, 30 Dec 2024 23:59:30 GMT
- Title: Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces
- Authors: Amirreza Payandeh, Daeun Song, Mohammad Nazeri, Jing Liang, Praneel Mukherjee, Amir Hossain Raj, Yangzhe Kong, Dinesh Manocha, Xuesu Xiao
- Abstract summary: We propose using language to bridge the gap in human-like reasoning between perception and robot actions. We create a vision-language dataset, Social robot Navigation via Explainable Interactions (SNEI), featuring 40K human-annotated Visual Question Answers (VQAs). We fine-tune a VLM, Social-LLaVA, using SNEI to demonstrate the practical application of our dataset.
- Score: 40.44502415484082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most existing social robot navigation techniques either leverage hand-crafted rules or human demonstrations to connect robot perception to socially compliant actions. However, there remains a significant gap in effectively translating perception into socially compliant actions, much like how human reasoning naturally occurs in dynamic environments. Considering the recent success of Vision-Language Models (VLMs), we propose using language to bridge the gap in human-like reasoning between perception and socially aware robot actions. We create a vision-language dataset, Social robot Navigation via Explainable Interactions (SNEI), featuring 40K human-annotated Visual Question Answers (VQAs) based on 2K human-robot social interactions in unstructured, crowded public spaces, spanning perception, prediction, chain-of-thought reasoning, action, and explanation. We fine-tune a VLM, Social-LLaVA, using SNEI to demonstrate the practical application of our dataset. Social-LLaVA outperforms state-of-the-art models like GPT-4V and Gemini, based on the average of fifteen different human-judge scores across 50 VQAs. Deployed onboard a mobile robot, Social-LLaVA enables human-like reasoning, marking a promising step toward socially compliant robot navigation in dynamic public spaces through language reasoning.
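To make the dataset's structure concrete, the sketch below shows how one VQA record spanning the five annotation stages (perception, prediction, chain-of-thought reasoning, action, explanation) might be represented; the field names and example text are illustrative assumptions, not the released SNEI schema.

```python
# Hypothetical sketch of one SNEI-style VQA record; field names are
# assumptions for illustration, not the actual dataset format.
from dataclasses import dataclass

@dataclass
class SocialNavVQA:
    image_path: str        # frame from a human-robot interaction scene
    perception: str        # what/who the robot observes around it
    prediction: str        # where nearby pedestrians are likely to move
    chain_of_thought: str  # step-by-step social reasoning over the scene
    action: str            # recommended navigation action, in language
    explanation: str       # why the action is socially compliant

example = SocialNavVQA(
    image_path="scene_0042.jpg",
    perception="Two people are chatting in the middle of the corridor.",
    prediction="They are likely to stay in place for the next few seconds.",
    chain_of_thought="Passing between them would interrupt their conversation, "
                     "so the robot should use the open space on the left.",
    action="Slow down and veer left, keeping about one meter of clearance.",
    explanation="This avoids splitting the group while staying on a clear path.",
)
```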
Related papers
- The Human Robot Social Interaction (HSRI) Dataset: Benchmarking Foundational Models' Social Reasoning [49.32390524168273]
Our work aims to advance the social reasoning of embodied artificial intelligence (AI) agents in real-world social interactions.
We introduce a large-scale real-world Human Robot Social Interaction (HSRI) dataset to benchmark the capabilities of language models (LMs) and foundational models (FMs).
Our dataset consists of 400 real-world human social robot interaction videos and over 10K annotations, detailing the robot's social errors, competencies, rationale, and corrective actions.
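As a minimal sketch, one such annotation might be represented as below, assuming hypothetical keys that mirror the categories listed above (the actual HSRI format may differ).

```python
# Illustrative sketch of an HSRI-style video annotation; keys and values are
# assumptions made for clarity, not the dataset's released schema.
annotation = {
    "video_id": "hsri_0001",
    "social_errors": ["cut between two people in conversation"],
    "competencies": ["maintained a comfortable passing distance"],
    "rationale": "The robot's path intersected an ongoing face-to-face interaction.",
    "corrective_action": "Wait or route around the group instead of between them.",
}
```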
arXiv Detail & Related papers (2025-04-07T06:27:02Z) - Socially Integrated Navigation: A Social Acting Robot with Deep Reinforcement Learning [0.7864304771129751]
Mobile robots are being used on a large scale in various crowded situations and are becoming part of our society.
Socially acceptable navigation behavior of a mobile robot with individual human consideration is an essential requirement for scalable applications and human acceptance.
We propose a novel socially integrated navigation approach where the robot's social behavior is adaptive and emerges from the interaction with humans.
arXiv Detail & Related papers (2024-03-14T18:25:40Z) - Robot Interaction Behavior Generation based on Social Motion Forecasting for Human-Robot Interaction [9.806227900768926]
We propose to model social motion forecasting in a shared human-robot representation space.
ECHO operates in the aforementioned shared space to predict the future motions of the agents encountered in social scenarios.
We evaluate our model in multi-person and human-robot motion forecasting tasks and obtain state-of-the-art performance by a large margin.
arXiv Detail & Related papers (2024-02-07T11:37:14Z) - HandMeThat: Human-Robot Communication in Physical and Social Environments [73.91355172754717]
HandMeThat is a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments.
HandMeThat contains 10,000 episodes of human-robot interactions.
We show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat.
arXiv Detail & Related papers (2023-10-05T16:14:46Z) - Developing Social Robots with Empathetic Non-Verbal Cues Using Large Language Models [2.5489046505746704]
We design and label four types of empathetic non-verbal cues, abbreviated as SAFE: Speech, Action (gesture), Facial expression, and Emotion, in a social robot.
Preliminary results show distinct patterns in the robot's responses, such as a preference for calm and positive social emotions like 'joy' and 'lively', and frequent nodding gestures.
Our work lays the groundwork for future studies on human-robot interactions, emphasizing the essential role of both verbal and non-verbal cues in creating social and empathetic robots.
arXiv Detail & Related papers (2023-08-31T08:20:04Z) - Principles and Guidelines for Evaluating Social Robot Navigation Algorithms [44.51586279645062]
Social robot navigation is difficult to evaluate because it involves dynamic human agents and their perceptions of the appropriateness of robot behavior.
Our contributions include (a) a definition of a socially navigating robot as one that respects the principles of safety, comfort, legibility, politeness, social competency, agent understanding, proactivity, and responsiveness to context, (b) guidelines for the use of metrics, development of scenarios, benchmarks, datasets, and simulators to evaluate social navigation, and (c) a social navigation metrics framework to make it easier to compare results from different simulators, robots and datasets.
arXiv Detail & Related papers (2023-06-29T07:31:43Z) - SACSoN: Scalable Autonomous Control for Social Navigation [62.59274275261392]
We develop methods for training policies for socially unobtrusive navigation.
By minimizing this counterfactual perturbation, we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space.
We collect a large dataset where an indoor mobile robot interacts with human bystanders.
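The core idea, penalizing how much the robot's presence changes what humans would otherwise have done, can be sketched as below; the function name and the exact objective are assumptions for illustration, not SACSoN's actual formulation.

```python
# A minimal sketch of a counterfactual-perturbation penalty: compare the human
# trajectory observed with the robot present against a predicted trajectory
# had the robot not been there. Both inputs are (T, 2) arrays of xy positions.
import numpy as np

def counterfactual_perturbation(observed_traj: np.ndarray,
                                counterfactual_traj: np.ndarray) -> float:
    """Mean displacement between observed and counterfactual human motion."""
    return float(np.mean(np.linalg.norm(observed_traj - counterfactual_traj, axis=-1)))

# Training would add this penalty to the navigation objective, so policies that
# leave human behavior unchanged are preferred, e.g. (lambda_social is assumed):
# total_cost = goal_cost + lambda_social * counterfactual_perturbation(obs, cf)
```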
arXiv Detail & Related papers (2023-06-02T19:07:52Z) - Gesture2Path: Imitation Learning for Gesture-aware Navigation [54.570943577423094]
We present Gesture2Path, a novel social navigation approach that combines image-based imitation learning with model-predictive control.
We deploy our method on real robots and showcase the effectiveness of our approach in four gesture-navigation scenarios.
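A rough sketch of how an image-conditioned, imitation-learned path predictor could feed a model-predictive controller in this kind of pipeline is shown below; the interfaces are placeholders, not the paper's actual API.

```python
# Hypothetical coupling of an imitation-learned path predictor with MPC;
# predict_path and track_with_mpc are illustrative stubs, not Gesture2Path code.
import numpy as np

def predict_path(image: np.ndarray) -> np.ndarray:
    """An imitation-learned network would map the camera image (including
    observed gestures) to a short reference path; stubbed as a straight line."""
    return np.stack([np.linspace(0.0, 1.0, 10), np.zeros(10)], axis=-1)

def track_with_mpc(reference_path: np.ndarray, horizon: int = 5) -> np.ndarray:
    """A real MPC would optimize controls over the horizon against robot
    dynamics and constraints; here we simply take the first few waypoints."""
    return reference_path[:horizon]

camera_image = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder frame
plan = track_with_mpc(predict_path(camera_image))
```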
arXiv Detail & Related papers (2022-09-19T23:05:36Z) - Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation [92.66286342108934]
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.
Our dataset contains 8.7 hours, 138 trajectories, and 25 miles of socially compliant, human-teleoperated driving demonstrations.
arXiv Detail & Related papers (2022-03-28T19:09:11Z) - From Learning to Relearning: A Framework for Diminishing Bias in Social Robot Navigation [3.3511723893430476]
We argue that social navigation models can replicate, promote, and amplify societal unfairness such as discrimination and segregation.
Our proposed framework consists of two components: learning, which incorporates social context into the learning process to account for safety and comfort, and relearning, which detects and corrects potentially harmful outcomes before they occur.
arXiv Detail & Related papers (2021-01-07T17:42:35Z)