Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement
- URL: http://arxiv.org/abs/2508.04025v1
- Date: Wed, 06 Aug 2025 02:38:02 GMT
- Title: Uncertainty-Aware GUI Agent: Adaptive Perception through Component Recommendation and Human-in-the-Loop Refinement
- Authors: Chao Hao, Shuai Wang, Kaiwen Zhou,
- Abstract summary: We present textbfRecAgent, an uncertainty-aware agent that addresses these issues through adaptive perception.<n>To reduce perceptual uncertainty, RecAgent employs a component recommendation mechanism that identifies and focuses on the most relevant UI elements.<n>For decision uncertainty, it uses an interactive module to request user feedback in ambiguous situations, enabling intent-aware decisions.
- Score: 11.63498742723335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Graphical user interface (GUI) agents have shown promise in automating mobile tasks but still struggle with input redundancy and decision ambiguity. In this paper, we present \textbf{RecAgent}, an uncertainty-aware agent that addresses these issues through adaptive perception. We distinguish two types of uncertainty in GUI navigation: (1) perceptual uncertainty, caused by input redundancy and noise from comprehensive screen information, and (2) decision uncertainty, arising from ambiguous tasks and complex reasoning. To reduce perceptual uncertainty, RecAgent employs a component recommendation mechanism that identifies and focuses on the most relevant UI elements. For decision uncertainty, it uses an interactive module to request user feedback in ambiguous situations, enabling intent-aware decisions. These components are integrated into a unified framework that proactively reduces input complexity and reacts to high-uncertainty cases via human-in-the-loop refinement. Additionally, we propose a dataset called \textbf{ComplexAction} to evaluate the success rate of GUI agents in executing specified single-step actions within complex scenarios. Extensive experiments validate the effectiveness of our approach. The dataset and code will be available at https://github.com/Fanye12/RecAgent.
Related papers
- GTA1: GUI Test-time Scaling Agent [77.60727242084971]
This paper investigates the two main challenges with our GUI Test-time Scaling Agent, GTA1.<n>First, to select the most appropriate action proposal, we introduce a test-time scaling method.<n>Second, we propose a model that achieves improved accuracy when grounding the selected action proposal to its corresponding visual elements.
arXiv Detail & Related papers (2025-07-08T08:52:18Z) - Less is More: Empowering GUI Agent with Context-Aware Simplification [62.02157661751793]
We propose a context-aware framework for building an efficient and effective GUI Agent, termed SimpAgent.<n>With the above components, SimpAgent reduces 27% FLOPs and achieves superior GUI navigation performances.
arXiv Detail & Related papers (2025-07-04T17:37:15Z) - Reward-Driven Interaction: Enhancing Proactive Dialogue Agents through User Satisfaction Prediction [22.105598216923706]
We propose two auxiliary tasks to improve the representation learning of user utterances and sessions that enhance user satisfaction prediction.<n>The proposed method is evaluated on DuerOS, demonstrating significant improvements in the accuracy of error recognition on rare user utterances and long-tailed domains.
arXiv Detail & Related papers (2025-05-24T15:01:30Z) - CLEAR-KGQA: Clarification-Enhanced Ambiguity Resolution for Knowledge Graph Question Answering [13.624962763072899]
KGQA systems typically assume user queries are unambiguous, which is an assumption that rarely holds in real-world applications.<n>We propose a novel framework that dynamically handles both entity ambiguity (e.g., distinguishing between entities with similar names) and intent ambiguity (e.g., clarifying different interpretations of user queries) through interactive clarification.
arXiv Detail & Related papers (2025-04-13T17:34:35Z) - Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.<n>Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes.<n>We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance.
arXiv Detail & Related papers (2025-02-18T17:12:26Z) - Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models [0.3495246564946556]
This work presents an accountability and explainability architecture implemented for ROS-based mobile robots.<n>The proposed solution consists of two main components. Firstly, a black box-like element to provide accountability, featuring anti-tampering properties achieved through blockchain technology.<n> Secondly, a component in charge of generating natural language explanations by harnessing the capabilities of Large Language Models (LLMs) over the data contained within the previously mentioned black box.
arXiv Detail & Related papers (2024-03-14T16:57:18Z) - Tell Me More! Towards Implicit User Intention Understanding of Language
Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries.
We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z) - Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior.
In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z) - Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for
Test-Time Policy Adaptation [20.266695694005943]
Policies often fail due to distribution shift -- changes in the state and reward that occur when a policy is deployed in new environments.
Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation.
We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts.
arXiv Detail & Related papers (2023-07-12T17:55:08Z) - Task-Oriented Over-the-Air Computation for Multi-Device Edge AI [57.50247872182593]
6G networks for supporting edge AI features task-oriented techniques that focus on effective and efficient execution of AI task.
Task-oriented over-the-air computation (AirComp) scheme is proposed in this paper for multi-device split-inference system.
arXiv Detail & Related papers (2022-11-02T16:35:14Z) - Dirichlet uncertainty wrappers for actionable algorithm accuracy
accountability and auditability [0.5156484100374058]
We propose a wrapper that enriches its output prediction with a measure of uncertainty.
Based on the resulting uncertainty measure, we advocate for a rejection system that selects the more confident predictions.
Results demonstrate the effectiveness of the uncertainty computed by the wrapper.
arXiv Detail & Related papers (2019-12-29T11:05:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.