Related papers: Overseeing Agents Without Constant Oversight: Challenges and Opportunities

URL: http://arxiv.org/abs/2602.16844v1
Date: Wed, 18 Feb 2026 20:16:24 GMT
Title: Overseeing Agents Without Constant Oversight: Challenges and Opportunities
Authors: Madeleine Grunde-McLaughlin, Hussein Mozannar, Maya Murad, Jingya Chen, Saleema Amershi, Adam Fourney,
Abstract summary: We investigate the utility of basic action traces for verification, explore three alternatives via design probes, and test a novel interface's impact on error finding. Our study surfaces challenges for human verification of agentic systems, including managing built-in assumptions, users' subjective and changing correctness criteria, and the shortcomings, yet importance, of communicating the agent's process.
Score: 18.59016735781908
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To enable human oversight, agentic AI systems often provide a trace of reasoning and action steps. Designing traces to have an informative, but not overwhelming, level of detail remains a critical challenge. In three user studies on a Computer User Agent, we investigate the utility of basic action traces for verification, explore three alternatives via design probes, and test a novel interface's impact on error finding in question-answering tasks. As expected, we find that current practices are cumbersome, limiting their efficacy. Conversely, our proposed design reduced the time participants spent finding errors. However, although participants reported higher levels of confidence in their decisions, their final accuracy was not meaningfully improved. To this end, our study surfaces challenges for human verification of agentic systems, including managing built-in assumptions, users' subjective and changing correctness criteria, and the shortcomings, yet importance, of communicating the agent's process.

Related papers

DeepSearchQA: Bridging the Comprehensiveness Gap for Deep Research Agents [10.197402632091551]
DeepSearchQA is a 900-prompt benchmark for evaluating agents on difficult multi-step information-seeking tasks. This dataset is designed to evaluate an agent's ability to execute complex search plans to generate exhaustive answer lists.
arXiv Detail & Related papers (2026-01-28T19:20:47Z)
Agentic Metacognition: Designing a "Self-Aware" Low-Code Agent for Failure Prediction and Human Handoff [0.0]
The non-deterministic nature of autonomous agents presents reliability challenges. A secondary, "metacognitive" layer actively monitors the primary LCNC agent. Inspired by human introspection, this layer is designed to predict impending task failures.
arXiv Detail & Related papers (2025-09-24T06:10:23Z)
Dark Patterns Meet GUI Agents: LLM Agent Susceptibility to Manipulative Interfaces and the Role of Human Oversight [51.53020962098759]
This study examines how agents, human participants, and human-AI teams respond to 16 types of dark patterns across diverse scenarios. Phase 1 highlights that agents often fail to recognize dark patterns, and even when aware, prioritize task completion over protective action. Phase 2 revealed divergent failure modes: humans succumb due to cognitive shortcuts and habitual compliance, while agents falter from procedural blind spots.
arXiv Detail & Related papers (2025-09-12T22:26:31Z)
Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions. Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes. We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance.
arXiv Detail & Related papers (2025-02-18T17:12:26Z)
Challenges in Human-Agent Communication [55.53932430345333]
We identify and analyze twelve key communication challenges that these systems pose. These include challenges in conveying information from the agent to the user, challenges in enabling the user to convey information to the agent, and overarching challenges that need to be considered across all human-agent communication. Our findings serve as an urgent call for new design patterns, principles, and guidelines to support transparency and control in these systems.
arXiv Detail & Related papers (2024-11-28T01:21:26Z)
Understanding How Blind Users Handle Object Recognition Errors: Strategies and Challenges [10.565823004989817]
This paper presents a study aimed at understanding blind users' interaction with object recognition systems for identifying and avoiding errors. We conducted a user study involving 12 blind and low-vision participants. We gained insights into users' experiences, challenges, and strategies for identifying errors in camera-based assistive technologies and object recognition systems.
arXiv Detail & Related papers (2024-08-06T17:09:56Z)
Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z)
Learning to Break: Knowledge-Enhanced Reasoning in Multi-Agent Debate System [16.830182915504555]
Multi-agent debate system (MAD) imitates the process of human discussion in pursuit of truth. It is challenging to make various agents perform right and highly consistent cognition due to their limited and different knowledge backgrounds. We propose a novel Multi-Agent Debate with Knowledge-Enhanced framework to promote the system to find the solution.
arXiv Detail & Related papers (2023-12-08T06:22:12Z)
Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior. In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z)
Unsupervised Person Re-Identification: A Systematic Survey of Challenges and Solutions [64.68497473454816]
Unsupervised person Re-ID has drawn increasing attention for its potential to address the scalability issue in person Re-ID. Unsupervised person Re-ID is challenging primarily due to lacking identity labels to supervise person feature learning. This survey reviews recent works on unsupervised person Re-ID from the perspective of challenges and solutions.
arXiv Detail & Related papers (2021-09-01T00:01:35Z)
Improving Playtesting Coverage via Curiosity Driven Reinforcement Learning Agents [0.4129225533930966]
This paper addresses the problem of automatically exploring and testing a given scenario using reinforcement learning agents trained to maximize game state coverage. The curious agents are able to learn the complex navigation mechanics required to reach the different areas around the map, thus providing the necessary data to identify potential issues.
arXiv Detail & Related papers (2021-03-25T12:51:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.