Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces
- URL: http://arxiv.org/abs/2501.09024v1
- Date: Mon, 30 Dec 2024 23:59:30 GMT
- Title: Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces
- Authors: Amirreza Payandeh, Daeun Song, Mohammad Nazeri, Jing Liang, Praneel Mukherjee, Amir Hossain Raj, Yangzhe Kong, Dinesh Manocha, Xuesu Xiao
- Abstract summary: We propose using language to bridge the gap in human-like reasoning between perception and robot actions.
We create a vision-language dataset, Social robot Navigation via Explainable Interactions (SNEI), featuring 40K human-annotated Visual Question Answers (VQAs).
We fine-tune a VLM, Social-LLaVA, using SNEI to demonstrate the practical application of our dataset.
- Abstract: Most existing social robot navigation techniques either leverage hand-crafted rules or human demonstrations to connect robot perception to socially compliant actions. However, there remains a significant gap in effectively translating perception into socially compliant actions, much like how human reasoning naturally occurs in dynamic environments. Considering the recent success of Vision-Language Models (VLMs), we propose using language to bridge the gap in human-like reasoning between perception and socially aware robot actions. We create a vision-language dataset, Social robot Navigation via Explainable Interactions (SNEI), featuring 40K human-annotated Visual Question Answers (VQAs) based on 2K human-robot social interactions in unstructured, crowded public spaces, spanning perception, prediction, chain-of-thought reasoning, action, and explanation. We fine-tune a VLM, Social-LLaVA, using SNEI to demonstrate the practical application of our dataset. Social-LLaVA outperforms state-of-the-art models like GPT-4V and Gemini, based on the average of fifteen different human-judge scores across 50 VQA. Deployed onboard a mobile robot, Social-LLaVA enables human-like reasoning, marking a promising step toward socially compliant robot navigation in dynamic public spaces through language reasoning.
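To make the dataset description concrete, the sketch below shows what a single SNEI-style VQA record might look like. The field names and example values are illustrative assumptions only; the abstract specifies the five annotation types (perception, prediction, chain-of-thought reasoning, action, explanation) but not the actual schema.

```python
# Hypothetical sketch of one SNEI-style VQA record. The schema is an
# assumption for illustration, not the published dataset format.

# The abstract lists five annotation types spanning the reasoning pipeline:
TASK_TYPES = ["perception", "prediction", "chain_of_thought", "action", "explanation"]

snei_example = {
    "image": "scene_0042.jpg",   # hypothetical frame from a human-robot interaction
    "task": "chain_of_thought",  # one of the five annotation types
    "question": "How should the robot proceed past the group ahead?",
    "answer": (
        "Two pedestrians are chatting and blocking the left side of the "
        "walkway, so the robot should slow down and pass on the right, "
        "keeping a comfortable distance."
    ),
}

def validate(record: dict) -> bool:
    """Check that a record has the expected fields and a known task type."""
    required = {"image", "task", "question", "answer"}
    return required <= record.keys() and record["task"] in TASK_TYPES

print(validate(snei_example))  # True
```

A corpus organized this way would pair each of the 2K social interactions with multiple VQAs, one per reasoning stage, which is consistent with the 40K-VQA / 2K-interaction ratio the abstract reports.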
Related papers
- Socially Integrated Navigation: A Social Acting Robot with Deep Reinforcement Learning [0.7864304771129751]
Mobile robots are increasingly deployed in crowded environments and are becoming part of our society.
Socially acceptable navigation behavior that accounts for individual humans is essential for scalable deployment and human acceptance.
We propose a novel socially integrated navigation approach where the robot's social behavior is adaptive and emerges from the interaction with humans.
arXiv Detail & Related papers (2024-03-14T18:25:40Z)
- Robot Interaction Behavior Generation based on Social Motion Forecasting for Human-Robot Interaction [9.806227900768926]
We propose to model social motion forecasting in a shared human-robot representation space.
ECHO operates in the aforementioned shared space to predict the future motions of the agents encountered in social scenarios.
We evaluate our model in multi-person and human-robot motion forecasting tasks and obtain state-of-the-art performance by a large margin.
arXiv Detail & Related papers (2024-02-07T11:37:14Z)
- HandMeThat: Human-Robot Communication in Physical and Social Environments [73.91355172754717]
HandMeThat is a benchmark for a holistic evaluation of instruction understanding and following in physical and social environments.
HandMeThat contains 10,000 episodes of human-robot interactions.
We show that both offline and online reinforcement learning algorithms perform poorly on HandMeThat.
arXiv Detail & Related papers (2023-10-05T16:14:46Z)
- Developing Social Robots with Empathetic Non-Verbal Cues Using Large Language Models [2.5489046505746704]
We design and label four types of empathetic non-verbal cues, abbreviated as SAFE: Speech, Action (gesture), Facial expression, and Emotion, in a social robot.
Preliminary results show distinct patterns in the robot's responses, such as a preference for calm and positive social emotions like 'joy' and 'lively', and frequent nodding gestures.
Our work lays the groundwork for future studies on human-robot interactions, emphasizing the essential role of both verbal and non-verbal cues in creating social and empathetic robots.
arXiv Detail & Related papers (2023-08-31T08:20:04Z)
- SACSoN: Scalable Autonomous Control for Social Navigation [62.59274275261392]
We develop methods for training policies for socially unobtrusive navigation.
By minimizing this counterfactual perturbation, we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space.
We collect a large dataset where an indoor mobile robot interacts with human bystanders.
arXiv Detail & Related papers (2023-06-02T19:07:52Z)
- Gesture2Path: Imitation Learning for Gesture-aware Navigation [54.570943577423094]
We present Gesture2Path, a novel social navigation approach that combines image-based imitation learning with model-predictive control.
We deploy our method on real robots and demonstrate its effectiveness across four gesture-navigation scenarios.
arXiv Detail & Related papers (2022-09-19T23:05:36Z)
- Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of Demonstrations for Social Navigation [92.66286342108934]
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a 'socially compliant' manner in the presence of other intelligent agents such as humans.
Our dataset contains 8.7 hours, 138 trajectories, and 25 miles of socially compliant, human-teleoperated driving demonstrations.
arXiv Detail & Related papers (2022-03-28T19:09:11Z)
- From Learning to Relearning: A Framework for Diminishing Bias in Social Robot Navigation [3.3511723893430476]
We argue that social navigation models can replicate, promote, and amplify societal unfairness such as discrimination and segregation.
Our proposed framework consists of two components: *learning*, which incorporates social context into the learning process to account for safety and comfort, and *relearning*, which detects and corrects potentially harmful outcomes before they occur.
arXiv Detail & Related papers (2021-01-07T17:42:35Z)
- Joint Mind Modeling for Explanation Generation in Complex Human-Robot Collaborative Tasks [83.37025218216888]
We propose a novel explainable AI (XAI) framework for achieving human-like communication in human-robot collaborations.
The robot builds a hierarchical mind model of the human user and generates explanations of its own mind as a form of communication.
Results show that the explanations generated by our approach significantly improve collaboration performance and user perception of the robot.
arXiv Detail & Related papers (2020-07-24T23:35:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.