GSON: A Group-based Social Navigation Framework with Large Multimodal Model
- URL: http://arxiv.org/abs/2409.18084v2
- Date: Tue, 08 Apr 2025 06:45:53 GMT
- Title: GSON: A Group-based Social Navigation Framework with Large Multimodal Model
- Authors: Shangyi Luo, Ji Zhu, Peng Sun, Yuhong Deng, Cunjun Yu, Anxing Xiao, Xueqian Wang,
- Abstract summary: This paper introduces GSON, a novel group-based social navigation framework.<n>GSON uses visual prompting to enable zero-shot extraction of social relationships among pedestrians.<n>We validate GSON through extensive real-world mobile robot navigation experiments.
- Score: 9.94576166903495
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing presence of service robots and autonomous vehicles in human environments, navigation systems need to evolve beyond simple destination reach to incorporate social awareness. This paper introduces GSON, a novel group-based social navigation framework that leverages Large Multimodal Models (LMMs) to enhance robots' social perception capabilities. Our approach uses visual prompting to enable zero-shot extraction of social relationships among pedestrians and integrates these results with robust pedestrian detection and tracking pipelines to overcome the inherent inference speed limitations of LMMs. The planning system incorporates a mid-level planner that sits between global path planning and local motion planning, effectively preserving both global context and reactive responsiveness while avoiding disruption of the predicted social group. We validate GSON through extensive real-world mobile robot navigation experiments involving complex social scenarios such as queuing, conversations, and photo sessions. Comparative results show that our system significantly outperforms existing navigation approaches in minimizing social perturbations while maintaining comparable performance on traditional navigation metrics.
Related papers
- HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments [8.974071308749007]
We study the problem of robot navigation in dense and interactive crowds with environmental constraints such as corridors and furniture.
Previous methods fail to consider all types of interactions among agents and obstacles, leading to unsafe and inefficient robot paths.
We propose a structured framework to learn robot navigation policies with reinforcement learning.
arXiv Detail & Related papers (2024-11-19T00:56:35Z) - A Meta-Engine Framework for Interleaved Task and Motion Planning using Topological Refinements [51.54559117314768]
Task And Motion Planning (TAMP) is the problem of finding a solution to an automated planning problem.
We propose a general and open-source framework for modeling and benchmarking TAMP problems.
We introduce an innovative meta-technique to solve TAMP problems involving moving agents and multiple task-state-dependent obstacles.
arXiv Detail & Related papers (2024-08-11T14:57:57Z) - Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation [50.01551945190676]
Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning.
We propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures.
We demonstrate its effectiveness for multi-agent trajectory prediction and social robot navigation.
arXiv Detail & Related papers (2024-01-22T18:58:22Z) - A Study on Learning Social Robot Navigation with Multimodal Perception [6.052803245103173]
We present a study on learning social robot navigation with multimodal perception using a large-scale real-world dataset.
We compare unimodal and multimodal learning approaches against a set of classical navigation approaches in different social scenarios.
The results show that multimodal learning has a clear advantage over unimodal learning in both dataset and human studies.
arXiv Detail & Related papers (2023-09-22T01:47:47Z) - Principles and Guidelines for Evaluating Social Robot Navigation
Algorithms [44.51586279645062]
Social robot navigation is difficult to evaluate because it involves dynamic human agents and their perceptions of the appropriateness of robot behavior.
Our contributions include (a) a definition of a socially navigating robot as one that respects the principles of safety, comfort, legibility, politeness, social competency, agent understanding, proactivity, and responsiveness to context, (b) guidelines for the use of metrics, development of scenarios, benchmarks, datasets, and simulators to evaluate social navigation, and (c) a social navigation metrics framework to make it easier to compare results from different simulators, robots and datasets.
arXiv Detail & Related papers (2023-06-29T07:31:43Z) - SG-LSTM: Social Group LSTM for Robot Navigation Through Dense Crowds [17.842560410411387]
We introduce a new Social Group Long Short-term Memory (SG-LSTM) model that models human groups and interactions in dense environments.
Our approach enables navigation algorithms to calculate collision-free paths faster and more accurately in crowded environments.
arXiv Detail & Related papers (2023-03-08T01:38:20Z) - Multi-robot Social-aware Cooperative Planning in Pedestrian Environments
Using Multi-agent Reinforcement Learning [2.7716102039510564]
We propose a novel multi-robot social-aware efficient cooperative planner that on the basis of off-policy multi-agent reinforcement learning (MARL)
We adopt temporal-spatial graph (TSG)-based social encoder to better extract the importance of social relation between each robot and the pedestrians in its field of view (FOV)
arXiv Detail & Related papers (2022-11-29T03:38:47Z) - SoLo T-DIRL: Socially-Aware Dynamic Local Planner based on
Trajectory-Ranked Deep Inverse Reinforcement Learning [4.008601554204486]
This work proposes a new framework for a socially-aware dynamic local planner in crowded environments by building on the recently proposed Trajectory-ranked Entropy Deep Inverse Reinforcement Learning (T-MEDIRL)
To address the social navigation problem, our multi-modal learning planner explicitly considers social interaction factors, as well as social-awareness factors into T-MEDIRL pipeline to learn a reward function from human demonstrations.
Our evaluation shows that this method can successfully make a robot navigate in a crowded social environment and outperforms the state-of-art social navigation methods in terms of the success rate, navigation
arXiv Detail & Related papers (2022-09-16T15:13:33Z) - Conditioned Human Trajectory Prediction using Iterative Attention Blocks [70.36888514074022]
We present a simple yet effective pedestrian trajectory prediction model aimed at pedestrians positions prediction in urban-like environments.
Our model is a neural-based architecture that can run several layers of attention blocks and transformers in an iterative sequential fashion.
We show that without explicit introduction of social masks, dynamical models, social pooling layers, or complicated graph-like structures, it is possible to produce on par results with SoTA models.
arXiv Detail & Related papers (2022-06-29T07:49:48Z) - Socially Compliant Navigation Dataset (SCAND): A Large-Scale Dataset of
Demonstrations for Social Navigation [92.66286342108934]
Social navigation is the capability of an autonomous agent, such as a robot, to navigate in a'socially compliant' manner in the presence of other intelligent agents such as humans.
Our dataset contains 8.7 hours, 138 trajectories, 25 miles of socially compliant, human teleoperated driving demonstrations.
arXiv Detail & Related papers (2022-03-28T19:09:11Z) - SABER: Data-Driven Motion Planner for Autonomously Navigating
Heterogeneous Robots [112.2491765424719]
We present an end-to-end online motion planning framework that uses a data-driven approach to navigate a heterogeneous robot team towards a global goal.
We use model predictive control (SMPC) to calculate control inputs that satisfy robot dynamics, and consider uncertainty during obstacle avoidance with chance constraints.
recurrent neural networks are used to provide a quick estimate of future state uncertainty considered in the SMPC finite-time horizon solution.
A Deep Q-learning agent is employed to serve as a high-level path planner, providing the SMPC with target positions that move the robots towards a desired global goal.
arXiv Detail & Related papers (2021-08-03T02:56:21Z) - Simultaneous Navigation and Construction Benchmarking Environments [73.0706832393065]
We need intelligent robots for mobile construction, the process of navigating in an environment and modifying its structure according to a geometric design.
In this task, a major robot vision and learning challenge is how to exactly achieve the design without GPS.
We benchmark the performance of a handcrafted policy with basic localization and planning, and state-of-the-art deep reinforcement learning methods.
arXiv Detail & Related papers (2021-03-31T00:05:54Z) - PHASE: PHysically-grounded Abstract Social Events for Machine Social
Perception [50.551003004553806]
We create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions.
Phase is validated with human experiments demonstrating that humans perceive rich interactions in the social events.
As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE, which outperforms state-of-the-art feed-forward neural networks.
arXiv Detail & Related papers (2021-03-02T18:44:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.