Benchmark Real-time Adaptation and Communication Capabilities of Embodied Agent in Collaborative Scenarios
- URL: http://arxiv.org/abs/2412.00435v1
- Date: Sat, 30 Nov 2024 11:17:17 GMT
- Title: Benchmark Real-time Adaptation and Communication Capabilities of Embodied Agent in Collaborative Scenarios
- Authors: Shipeng Liu, Boshen Zhang, Zhehui Huang,
- Abstract summary: Real-time human-AI collaboration requires agents to adapt to unseen human behaviors while maintaining effective communication dynamically.
We propose a Monitor-then-Adapt framework (MonTA) combining strong adaptability and communication with real-time execution.
Our results demonstrate that MonTA outperforms other baseline agents on our proposed benchmark.
- Score: 2.930915232771767
- License:
- Abstract: Advancements in Large Language Models (LLMs) have opened transformative possibilities for human-robot interaction, especially in collaborative environments. However, Real-time human-AI collaboration requires agents to adapt to unseen human behaviors while maintaining effective communication dynamically. Existing benchmarks fall short in evaluating such adaptability for embodied agents, focusing mostly on the task performance of the agent itself. To address this gap, we propose a novel benchmark that assesses agents' reactive adaptability and instantaneous communication capabilities at every step. Based on this benchmark, we propose a Monitor-then-Adapt framework (MonTA), combining strong adaptability and communication with real-time execution. MonTA contains three key LLM modules, a lightweight \textit{Monitor} for monitoring the need for adaptation in high frequency, and two proficient \textit{Adapters} for subtask and path adaptation reasoning in low frequency. Our results demonstrate that MonTA outperforms other baseline agents on our proposed benchmark. Further user studies confirm the high reasonability adaptation plan and consistent language instruction provided by our framework.
Related papers
- Dynamic benchmarking framework for LLM-based conversational data capture [0.0]
This paper introduces a benchmarking framework to assess large language models (LLMs)
It integrates generative agent simulation to evaluate performance on key dimensions: information extraction, context awareness, and adaptive engagement.
Results show that adaptive strategies improve data extraction accuracy, especially when handling ambiguous responses.
arXiv Detail & Related papers (2025-02-04T15:47:47Z) - Efficient Policy Adaptation with Contrastive Prompt Ensemble for Embodied Agents [6.402396836189286]
We present a novel contrastive prompt ensemble (ConPE) framework for embodied reinforcement learning.
We devise a guided-attention-based ensemble approach with multiple visual prompts on the vision-language model to construct robust state representations.
In experiments, we show that ConPE outperforms other state-of-the-art algorithms for several embodied agent tasks.
arXiv Detail & Related papers (2024-12-16T06:53:00Z) - The BrowserGym Ecosystem for Web Agent Research [151.90034093362343]
BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents.
We conduct the first large-scale, multi-benchmark web agent experiment.
Results highlight a large discrepancy between OpenAI and Anthropic's latests models.
arXiv Detail & Related papers (2024-12-06T23:43:59Z) - Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input [54.81155589931697]
We propose a new task, Collaborative Instance Navigation (CoIN), with dynamic agent-human interaction during navigation.
To address CoIN, we propose a novel method, Agent-user Interaction with UncerTainty Awareness (AIUTA)
AIUTA achieves competitive performance in instance navigation against state-of-the-art methods, demonstrating great flexibility in handling user inputs.
arXiv Detail & Related papers (2024-12-02T08:16:38Z) - Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with duplex ability, requiring minimal training.
Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z) - Benchmarking Mobile Device Control Agents across Diverse Configurations [19.01954948183538]
B-MoCA is a benchmark for evaluating and developing mobile device control agents.
We benchmark diverse agents, including agents employing large language models (LLMs) or multi-modal LLMs.
While these agents demonstrate proficiency in executing straightforward tasks, their poor performance on complex tasks highlights significant opportunities for future research to improve effectiveness.
arXiv Detail & Related papers (2024-04-25T14:56:32Z) - CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models [8.123272461141815]
We introduce the TinyAgent model, trained on a meticulously curated high-quality dataset.
We also present the Collaborative Multi-Agent Tuning (CMAT) framework, an innovative system designed to augment language agent capabilities.
In this research, we propose a new communication agent framework that integrates multi-agent systems with environmental feedback mechanisms.
arXiv Detail & Related papers (2024-04-02T06:07:35Z) - Large Language Model-based Human-Agent Collaboration for Complex Task
Solving [94.3914058341565]
We introduce the problem of Large Language Models (LLMs)-based human-agent collaboration for complex task-solving.
We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC.
This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z) - AgentCF: Collaborative Learning with Autonomous Language Agents for
Recommender Systems [112.76941157194544]
We propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering.
We creatively consider not only users but also items as agents, and develop a collaborative learning approach that optimize both kinds of agents together.
Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions.
arXiv Detail & Related papers (2023-10-13T16:37:14Z) - Centralized Training with Hybrid Execution in Multi-Agent Reinforcement
Learning [7.163485179361718]
We introduce hybrid execution in multi-agent reinforcement learning (MARL)
MARL is a new paradigm in which agents aim to successfully complete cooperative tasks with arbitrary communication levels at execution time.
We contribute MARO, an approach that makes use of an auto-regressive predictive model, trained in a centralized manner, to estimate missing agents' observations.
arXiv Detail & Related papers (2022-10-12T14:58:32Z) - Lifelong Unsupervised Domain Adaptive Person Re-identification with
Coordinated Anti-forgetting and Adaptation [127.6168183074427]
We propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID.
This is challenging because it requires the model to continuously adapt to unlabeled data of the target environments.
We design an effective scheme for this task, dubbed CLUDA-ReID, where the anti-forgetting is harmoniously coordinated with the adaptation.
arXiv Detail & Related papers (2021-12-13T13:19:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.