Related papers: ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions

URL: http://arxiv.org/abs/2505.14668v2
Date: Mon, 27 Oct 2025 07:17:51 GMT
Title: ContextAgent: Context-Aware Proactive LLM Agents with Open-World Sensory Perceptions
Authors: Bufang Yang, Lilin Xu, Liekang Zeng, Kaiwei Liu, Siyang Jiang, Wenrui Lu, Hongkai Chen, Xiaofan Jiang, Guoliang Xing, Zhenyu Yan
Abstract summary: We introduce ContextAgent, the first context-aware proactive agent. ContextAgent extracts multi-dimensional contexts from sensory perceptions on wearables to understand user intentions. When proactive assistance is needed, ContextAgent automatically calls the necessary tools to assist users unobtrusively.
Score: 8.300084456561171
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in Large Language Models (LLMs) have propelled intelligent agents from reactive responses to proactive support. While promising, existing proactive agents either rely exclusively on observations from enclosed environments (e.g., desktop UIs) with direct LLM inference or employ rule-based proactive notifications, leading to suboptimal user intent understanding and limited functionality for proactive service. In this paper, we introduce ContextAgent, the first context-aware proactive agent that incorporates extensive sensory contexts surrounding humans to enhance the proactivity of LLM agents. ContextAgent first extracts multi-dimensional contexts from massive sensory perceptions on wearables (e.g., video and audio) to understand user intentions. ContextAgent then leverages the sensory contexts and personas from historical data to predict the necessity for proactive services. When proactive assistance is needed, ContextAgent further automatically calls the necessary tools to assist users unobtrusively. To evaluate this new task, we curate ContextAgentBench, the first benchmark for evaluating context-aware proactive LLM agents, covering 1,000 samples across nine daily scenarios and twenty tools. Experiments on ContextAgentBench show that ContextAgent outperforms baselines by achieving up to 8.5% and 6.0% higher accuracy in proactive predictions and tool calling, respectively. We hope our research can inspire the development of more advanced, human-centric, proactive AI assistants. The code and dataset are publicly available at https://github.com/openaiotlab/ContextAgent.
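The abstract describes three stages: extracting context from sensory input, deciding whether proactive help is needed, and calling tools. The following is a minimal Python sketch of that control flow only, not the authors' implementation. Every name in it (`SensoryContext`, `extract_context`, `run_agent`, the `triggers` mapping, the `transit_lookup` tool) is hypothetical, and a simple keyword match stands in for the LLM reasoning the paper relies on.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SensoryContext:
    """Text summaries of wearable streams (hypothetical structure)."""
    visual: str
    audio: str


@dataclass
class Decision:
    """Outcome of one agent step."""
    proactive: bool
    tool_results: List[str] = field(default_factory=list)


def extract_context(video_caption: str, transcript: str) -> SensoryContext:
    # Stage 1 (assumption): perception models are replaced here by
    # ready-made text descriptions of the video and audio streams.
    return SensoryContext(visual=video_caption.strip(), audio=transcript.strip())


def run_agent(
    ctx: SensoryContext,
    persona: str,
    triggers: Dict[str, str],
    tools: Dict[str, Callable[[SensoryContext], str]],
) -> Decision:
    # Stage 2 (assumption): a keyword match over context and persona
    # stands in for the LLM's prediction of whether help is needed.
    text = f"{ctx.visual} {ctx.audio} {persona}".lower()
    selected = [name for keyword, name in triggers.items() if keyword in text]
    if not selected:
        # No trigger matched, so the agent stays silent.
        return Decision(proactive=False)
    # Stage 3: call each selected tool with the extracted context.
    results = [tools[name](ctx) for name in selected]
    return Decision(proactive=True, tool_results=results)


if __name__ == "__main__":
    tools = {"transit_lookup": lambda c: "next bus: lookup requested"}
    triggers = {"bus": "transit_lookup"}
    ctx = extract_context(" User stands at a bus stop ", "when is the next bus?")
    print(run_agent(ctx, "commutes daily", triggers, tools))
```

In this sketch the proactive decision and the tool selection are collapsed into one matching step for brevity; the abstract presents them as separate stages.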

Related papers

AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios [49.90735676070039]
The capacity of AI agents to effectively handle tasks of increasing duration and complexity continues to grow. We argue that current evaluations prioritize increasing task difficulty without sufficiently addressing the diversity of agentic tasks. We propose AgentIF-OneDay, aimed at determining whether general users can utilize natural language instructions and AI agents to complete a diverse array of daily tasks.
arXiv Detail & Related papers (2026-01-28T13:49:18Z)
ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems [7.591337469415894]
ProAgent is an end-to-end proactive agent system that harnesses massive sensory contexts and LLM reasoning to deliver proactive assistance. We implement ProAgent on Augmented Reality (AR) glasses with an edge server and extensively evaluate it on a real-world testbed, a public dataset, and through a user study. Results show that ProAgent achieves up to 33.4% higher proactive prediction accuracy, 16.8% higher tool-calling F1 score, and notable improvements in user satisfaction.
arXiv Detail & Related papers (2025-12-07T08:21:07Z)
InfoAgent: Advancing Autonomous Information-Seeking Agents [143.15973604285304]
We introduce InfoAgent, a deep research agent powered by an innovative data synthesis pipeline and orchestrated web search tools. With our methods, InfoAgent achieves 15.3% accuracy on BrowseComp, 29.2% on BrowseComp-ZH, and 40.4% on Xbench-DS.
arXiv Detail & Related papers (2025-09-29T17:59:57Z)
AgentXploit: End-to-End Redteaming of Black-Box AI Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentXploit, to automatically discover and exploit indirect prompt injection vulnerabilities. We evaluate AgentXploit on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o. We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z)
AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents [75.85554113398626]
We introduce a new benchmark, AgentDAM, that measures whether AI web-navigation agents follow the privacy principle of "data minimization". Our benchmark simulates realistic web interaction scenarios end-to-end and is adaptable to all existing web navigation agents.
arXiv Detail & Related papers (2025-03-12T19:30:31Z)
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space. AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance [95.03771007780976]
We tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. First, we collect real-world human activities to generate proactive task predictions. These predictions are labeled by human annotators as either accepted or rejected. The labeled data is used to train a reward model that simulates human judgment.
arXiv Detail & Related papers (2024-10-16T08:24:09Z)
Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems. This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process. We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z)
ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions [68.81939215223818]
ProductAgent is a conversational information seeking agent equipped with abilities of strategic clarification question generation and dynamic product retrieval. We develop the agent with strategies for product feature summarization, query generation, and product retrieval. Experiments show that ProductAgent interacts positively with the user and enhances retrieval performance with increasing dialogue turns.
arXiv Detail & Related papers (2024-07-01T03:50:23Z)
CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only [21.054681757006385]
We propose an agent that perceives its environment solely through screenshot images. By leveraging the reasoning capability of Large Language Models, we eliminate the need for large-scale human demonstration data. The agent achieves an average success rate of 94.5% on MiniWoB++ and an average task score of 62.3 on WebShop.
arXiv Detail & Related papers (2024-06-11T05:21:20Z)
Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects [32.91556128291915]
This paper surveys current research to provide an in-depth overview of intelligent agents within single and multi-agent systems. It covers their definitions, research frameworks, and foundational components such as their composition, cognitive and planning methods, tool utilization, and responses to environmental feedback. We conclude by envisioning prospects for LLM-based agents, considering the evolving landscape of AI and natural language processing.
arXiv Detail & Related papers (2024-01-07T09:08:24Z)
KwaiAgents: Generalized Information-seeking Agent System with Large Language Models [33.59597020276034]
Humans excel in critical thinking, planning, reflection, and harnessing available tools to interact with and interpret the world. Recent advancements in large language models (LLMs) suggest that machines might also possess the aforementioned human-like capabilities. We introduce KwaiAgents, a generalized information-seeking agent system based on LLMs.
arXiv Detail & Related papers (2023-12-08T08:11:11Z)
Improving Knowledge Extraction from LLMs for Task Learning through Agent Analysis [4.055489363682198]
Large language models (LLMs) offer significant promise as a knowledge source for task learning. Prompt engineering has been shown to be effective for eliciting knowledge from an LLM, but alone it is insufficient for acquiring relevant, situationally grounded knowledge for an embodied agent learning novel tasks. We describe a cognitive-agent approach, STARS, that extends and complements prompt engineering, mitigating its limitations and thus enabling an agent to acquire new task knowledge matched to its native language capabilities, embodiment, environment, and user preferences.
arXiv Detail & Related papers (2023-06-11T20:50:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.