Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions
- URL: http://arxiv.org/abs/2408.02544v1
- Date: Mon, 5 Aug 2024 15:16:22 GMT
- Title: Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions
- Authors: Xinbei Ma, Yiting Wang, Yao Yao, Tongxin Yuan, Aston Zhang, Zhuosheng Zhang, Hai Zhao
- Abstract summary: This paper investigates the faithfulness of multimodal large language model (MLLM) agents in the graphical user interface (GUI) environment.
A general setting is proposed where both the user and the agent are benign, and the environment, while not malicious, contains unrelated content.
Experimental results reveal that even the most powerful models, whether generalist agents or specialist GUI agents, are susceptible to distractions.
- Score: 68.92637077909693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the faithfulness of multimodal large language model (MLLM) agents in the graphical user interface (GUI) environment, aiming to address the research question of whether multimodal GUI agents can be distracted by environmental context. A general setting is proposed where both the user and the agent are benign, and the environment, while not malicious, contains unrelated content. A wide range of MLLMs are evaluated as GUI agents using our simulated dataset, following three working patterns with different levels of perception. Experimental results reveal that even the most powerful models, whether generalist agents or specialist GUI agents, are susceptible to distractions. While recent studies predominantly focus on the helpfulness (i.e., action accuracy) of multimodal agents, our findings indicate that these agents are prone to environmental distractions, resulting in unfaithful behaviors. Furthermore, we switch to the adversarial perspective and implement environment injection, demonstrating that such unfaithfulness can be exploited, leading to unexpected risks.
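To make the evaluated setting concrete, below is a minimal, hypothetical sketch of a faithfulness check in this spirit: a benign user goal is paired with a screen whose elements mix goal-relevant content with an unrelated distractor, and an agent's action counts as faithful only if it targets a goal-relevant element. All names (`Element`, `build_screen`, `is_faithful`) and the example elements are illustrative assumptions, not the paper's actual dataset, agents, or working patterns.

```python
# Hypothetical harness in the spirit of the paper's setting: benign user,
# benign agent, and a non-malicious environment containing unrelated content.
from dataclasses import dataclass

@dataclass
class Element:
    elem_id: str
    text: str
    relevant: bool  # ground truth: does this element serve the user's goal?

def build_screen(goal_elements, distractors):
    """Compose a GUI state mixing goal-relevant and unrelated elements."""
    return goal_elements + distractors

def is_faithful(predicted_elem_id, screen):
    """Score an action as faithful iff it targets a goal-relevant element."""
    target = next(e for e in screen if e.elem_id == predicted_elem_id)
    return target.relevant

# Example: the user wants to send an email; a pop-up advertises an update.
screen = build_screen(
    goal_elements=[Element("btn_send", "Send email", relevant=True)],
    distractors=[Element("btn_popup", "Update now to continue!", relevant=False)],
)
# A distracted agent clicks the pop-up instead of the send button.
print(is_faithful("btn_popup", screen))  # False -> unfaithful behavior
print(is_faithful("btn_send", screen))   # True  -> faithful behavior
```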
Related papers
- R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models [50.19174067263255]
We introduce prior preference learning techniques and self-revision schedules to help the agent excel in sparse-reward, continuous-action, goal-based robotic control POMDP environments.
We show that our agents offer improved performance over state-of-the-art models in terms of cumulative rewards, relative stability, and success rate.
arXiv Detail & Related papers (2024-09-21T18:32:44Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Agent AI: Surveying the Horizons of Multimodal Interaction [83.18367129924997]
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
- INTAGS: Interactive Agent-Guided Simulation [4.04638613278729]
In many applications involving multi-agent systems (MAS), it is imperative to test an experimental (Exp) autonomous agent in a high-fidelity simulator prior to its deployment to production.
We propose a metric to distinguish between real and synthetic multi-agent systems, which is evaluated through the live interaction between the Exp agent and real-world background (BG) agents.
We show that using INTAGS to calibrate the simulator can generate more realistic market data compared to the state-of-the-art conditional Wasserstein Generative Adversarial Network approach.
arXiv Detail & Related papers (2023-09-04T19:56:18Z)
- AgentBench: Evaluating LLMs as Agents [88.45506148281379]
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks.
We present AgentBench, a benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities.
arXiv Detail & Related papers (2023-08-07T16:08:11Z)
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments (a minimal sketch of the regret objective appears after this list).
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
- Heterogeneous Multi-Agent Reinforcement Learning for Unknown Environment Mapping [0.0]
We present an actor-critic algorithm that allows a team of heterogeneous agents to learn decentralized control policies for covering an unknown environment.
This task is of interest to national security and emergency response organizations that would like to enhance situational awareness in hazardous areas by deploying teams of unmanned aerial vehicles.
arXiv Detail & Related papers (2020-10-06T12:23:05Z)
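As referenced in the PAIRED entry above, the following is a minimal sketch of the regret objective from Unsupervised Environment Design, assuming the standard formulation: the environment designer is rewarded by the gap between an antagonist's best return and the protagonist's average return, so it favors environments that are solvable yet hard. The `rollout` helper and the episode count are hypothetical stand-ins, not the paper's implementation.

```python
# Minimal sketch of PAIRED's regret signal (standard UED formulation):
# regret(env) = antagonist return - protagonist return. The adversary that
# proposes env_params is trained to maximize this quantity, which yields
# environments that are solvable (the antagonist succeeds) but still
# challenging for the protagonist.
def paired_regret(env_params, protagonist, antagonist, rollout, episodes=8):
    """Estimate regret for one generated environment.

    `rollout(agent, env_params)` is a hypothetical helper that runs one
    episode and returns its cumulative reward.
    """
    # The best antagonist episode approximates the achievable return.
    ant = max(rollout(antagonist, env_params) for _ in range(episodes))
    # The average protagonist episode approximates its expected return.
    pro = sum(rollout(protagonist, env_params) for _ in range(episodes)) / episodes
    return ant - pro
```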
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.