DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
- URL: http://arxiv.org/abs/2406.03008v2
- Date: Tue, 15 Oct 2024 04:50:40 GMT
- Title: DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
- Authors: Yidong Huang, Jacob Sansom, Ziqiao Ma, Felix Gervits, Joyce Chai,
- Abstract summary: We introduce DriVLMe, a video-language-model-based agent to facilitate communication between humans and autonomous vehicles.
We demonstrate competitive performance in both open-loop benchmarks and closed-loop human studies.
- Score: 12.51538076211772
- License:
- Abstract: Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are preliminary, over-simplified, and fail to capture the complexity of real-world driving scenarios in human environments. It remains under-explored whether FM agents can handle long-horizon navigation tasks with free-from dialogue and deal with unexpected situations caused by environmental dynamics or task changes. To explore the capabilities and boundaries of FMs faced with the challenges above, we introduce DriVLMe, a video-language-model-based agent to facilitate natural and effective communication between humans and autonomous vehicles that perceive the environment and navigate. We develop DriVLMe from both embodied experiences in a simulated environment and social experiences from real human dialogue. While DriVLMe demonstrates competitive performance in both open-loop benchmarks and closed-loop human studies, we reveal several limitations and challenges, including unacceptable inference time, imbalanced training data, limited visual understanding, challenges with multi-turn interactions, simplified language generation from robotic experiences, and difficulties in handling on-the-fly unexpected situations like environmental dynamics and task changes.
Related papers
- Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models [81.55156507635286]
Legged robots are physically capable of navigating a diverse variety of environments and overcoming a wide range of obstructions.
Current learning methods often struggle with generalization to the long tail of unexpected situations without heavy human supervision.
We propose a system, VLM-Predictive Control (VLM-PC), combining two key components that we find to be crucial for eliciting on-the-fly, adaptive behavior selection.
arXiv Detail & Related papers (2024-07-02T21:00:30Z) - Probing Multimodal LLMs as World Models for Driving [72.18727651074563]
We look at the application of Multimodal Large Language Models (MLLMs) in autonomous driving.
Despite advances in models like GPT-4o, their performance in complex driving environments remains largely unexplored.
arXiv Detail & Related papers (2024-05-09T17:52:42Z) - RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model [22.25903116720301]
explainability plays a critical role in trustworthy autonomous decision-making.
Recent advancements in Multi-Modal Large Language models (MLLMs) have shown promising potential in enhancing the explainability as a driving agent.
We present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving.
arXiv Detail & Related papers (2024-02-16T16:57:18Z) - HAZARD Challenge: Embodied Decision Making in Dynamically Changing
Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z) - Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic
Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z) - Model-Based Reinforcement Learning with Isolated Imaginations [61.67183143982074]
We propose Iso-Dream++, a model-based reinforcement learning approach.
We perform policy optimization based on the decoupled latent imaginations.
This enables long-horizon visuomotor control tasks to benefit from isolating mixed dynamics sources in the wild.
arXiv Detail & Related papers (2023-03-27T02:55:56Z) - Chat with the Environment: Interactive Multimodal Perception Using Large
Language Models [19.623070762485494]
Large Language Models (LLMs) have shown remarkable reasoning ability in few-shot robotic planning.
Our study demonstrates that LLMs can provide high-level planning and reasoning skills and control interactive robot behavior in a multimodal environment.
arXiv Detail & Related papers (2023-03-14T23:01:27Z) - DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in
Interactive Autonomous Driving Agents [6.639872461610685]
We introduce Dialogue On the ROad To Handle Irregular Events (DOROTHIE), a novel interactive simulation platform.
Based on this platform, we created the Situated Dialogue Navigation (SDN), a navigation benchmark of 183 trials.
SDN is developed to evaluate the agent's ability to predict dialogue moves from humans as well as generate its own dialogue moves and physical navigation actions.
arXiv Detail & Related papers (2022-10-22T17:52:46Z) - COCOI: Contact-aware Online Context Inference for Generalizable
Non-planar Pushing [87.7257446869134]
General contact-rich manipulation problems are long-standing challenges in robotics.
Deep reinforcement learning has shown great potential in solving robot manipulation tasks.
We propose COCOI, a deep RL method that encodes a context embedding of dynamics properties online.
arXiv Detail & Related papers (2020-11-23T08:20:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.