Interaction-Driven Browsing: A Human-in-the-Loop Conceptual Framework Informed by Human Web Browsing for Browser-Using Agents
- URL: http://arxiv.org/abs/2509.12049v1
- Date: Mon, 15 Sep 2025 15:31:53 GMT
- Title: Interaction-Driven Browsing: A Human-in-the-Loop Conceptual Framework Informed by Human Web Browsing for Browser-Using Agents
- Authors: Hyeonggeun Yun, Jinkyu Jang,
- Abstract summary: We present a human-in-the-loop conceptual framework informed by theories of human web browsing behavior.<n>The framework centers on an iterative loop in which the BUA proactively proposes next actions and the user steers the browsing process through feedback.
- Score: 0.6445605125467574
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although browser-using agents (BUAs) show promise for web tasks and automation, most BUAs terminate after executing a single instruction, failing to support users' complex, nonlinear browsing with ambiguous goals, iterative decision-making, and changing contexts. We present a human-in-the-loop (HITL) conceptual framework informed by theories of human web browsing behavior. The framework centers on an iterative loop in which the BUA proactively proposes next actions and the user steers the browsing process through feedback. It also distinguishes between exploration and exploitation actions, enabling users to control the breadth and depth of their browsing. Consequently, the framework aims to reduce users' physical and cognitive effort while preserving users' traditional browsing mental model and supporting users in achieving satisfactory outcomes. We illustrate how the framework operates with hypothetical use cases and discuss the shift from manual browsing to interaction-driven browsing. We contribute a theoretically informed conceptual framework for BUAs.
Related papers
- Nested Browser-Use Learning for Agentic Information Seeking [60.775556172513014]
Information-seeking (IS) agents have achieved strong performance across a range of wide and deep search tasks, yet their tool use remains largely restricted to API-level snippet retrieval and URL-based page fetching.<n>We propose Nested Browser-Use Learning (NestBrowse), which introduces a minimal and complete browser-action framework that decouples interaction control from page exploration through a nested structure.
arXiv Detail & Related papers (2025-12-29T17:59:14Z) - Building the Web for Agents: A Declarative Framework for Agent-Web Interaction [0.7116403133334644]
We introduce VOIX, a web-native framework that enables websites to expose reliable, auditable, and privacy-preserving capabilities for AI agents.<n> VOIX introduces tool> and context> tags, allowing developers to explicitly define available actions and relevant state.<n>We evaluated the framework's practicality, learnability, and expressiveness in a three-day hackathon study with 16 developers.
arXiv Detail & Related papers (2025-11-14T13:23:34Z) - Evaluating Node-tree Interfaces for AI Explainability [0.5437050212139087]
This study evaluates user experiences with two distinct AI interfaces - node-tree interfaces and chatbots.<n>Our node-tree interface visually structures AI-generated responses into hierarchically organized, interactive nodes.<n>Our findings suggest that AI interfaces capable of switching between structured visualizations and conversational formats can significantly enhance transparency and user confidence in AI-powered systems.
arXiv Detail & Related papers (2025-10-07T20:48:08Z) - Agency Among Agents: Designing with Hypertextual Friction in the Algorithmic Web [0.29465623430708904]
We show that hypertext systems emphasize provenance, associative thinking, and user-driven meaning-making.<n>We show that algorithmic systems tend to obscure process and flatten participation.
arXiv Detail & Related papers (2025-07-31T14:18:28Z) - Orca: Browsing at Scale Through User-Driven and AI-Facilitated Orchestration Across Malleable Webpages [18.25019078938821]
We present novel interactions with our prototype web browser, Orca.<n>Orca supports user-driven exploration, operation, organization, and synthesis of web content at scale.<n>Our evaluation revealed an increased "appetite" for information foraging, enhanced user control, and more flexibility in sensemaking across a broader information landscape on the web.
arXiv Detail & Related papers (2025-05-28T20:13:39Z) - WebCoT: Enhancing Web Agent Reasoning by Reconstructing Chain-of-Thought in Reflection, Branching, and Rollback [74.82886755416949]
We identify key reasoning skills essential for effective web agents.<n>We reconstruct the agent's reasoning algorithms into chain-of-thought rationales.<n>Our approach yields significant improvements across multiple benchmarks.
arXiv Detail & Related papers (2025-05-26T14:03:37Z) - R2D2: Remembering, Replaying and Dynamic Decision Making with a Reflective Agentic Memory [53.94879482534949]
Current models often struggle with efficient navigation and action execution due to limited visibility and understanding of web structures.<n>Our proposed R2D2 framework addresses these challenges by integrating two paradigms: Remember and Reflect.<n>Our findings suggest that a combination of memory-enhanced navigation and reflective learning promisingly advances the capabilities of web agents.
arXiv Detail & Related papers (2025-01-21T20:21:58Z) - Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues [54.81155589931697]
Collaborative Instance object Navigation (CoIN) is a new task setting where the agent actively resolve uncertainties about the target instance.<n>We propose a novel training-free method, Agent-user Interaction with UncerTainty Awareness (AIUTA)<n>First, upon object detection, a Self-Questioner model initiates a self-dialogue within the agent to obtain a complete and accurate observation description.<n>An Interaction Trigger module determines whether to ask a question to the human, continue or halt navigation.
arXiv Detail & Related papers (2024-12-02T08:16:38Z) - Representing Web Applications As Knowledge Graphs [0.0]
The proposed method models each node as a structured representation of the application's current state, with edges reflecting user-initiated actions or transitions.
This structured representation enables a more comprehensive and functional understanding of web applications, offering valuable insights for downstream tasks such as automated testing and behavior analysis.
arXiv Detail & Related papers (2024-10-06T02:50:41Z) - Constraining Participation: Affordances of Feedback Features in Interfaces to Large Language Models [49.74265453289855]
Large language models (LLMs) are now accessible to anyone with a computer, a web browser, and an internet connection via browser-based interfaces.
This paper examines the affordances of interactive feedback features in ChatGPT's interface, analysing how they shape user input and participation in iteration.
arXiv Detail & Related papers (2024-08-27T13:50:37Z) - Tell Me More! Towards Implicit User Intention Understanding of Language
Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries.
We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z) - Think, Act, and Ask: Open-World Interactive Personalized Robot Navigation [17.279875204729553]
Zero-Shot Object Navigation (ZSON) enables agents to navigate towards open-vocabulary objects in unknown environments.
We introduce ZIPON, where robots need to navigate to personalized goal objects while engaging in conversations with users.
We propose Open-woRld Interactive persOnalized Navigation (ORION) to make sequential decisions to manipulate different modules for perception, navigation and communication.
arXiv Detail & Related papers (2023-10-12T01:17:56Z) - Signifiers as a First-class Abstraction in Hypermedia Multi-Agent
Systems [0.6595290783361959]
We build on concepts and methods from Affordance Theory and Human-Computer Interaction to introduce signifiers as a first-class abstraction in Web-based Multi-Agent Systems.
We define a formal model for the contextual exposure of signifiers in hypermedia environments that aims to drive affordance exploitation.
arXiv Detail & Related papers (2023-02-14T10:54:46Z) - Bringing Cognitive Augmentation to Web Browsing Accessibility [69.62988485669146]
We explore opportunities brought by cognitive augmentation to provide a more natural and accessible web browsing experience.
We develop a conceptual framework for supporting BVIP conversational web browsing needs.
We describe our early work and prototype that leverages that consider structural and content features only.
arXiv Detail & Related papers (2020-12-07T14:40:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.