Related papers: Orca: Browsing at Scale Through User-Driven and AI-Facilitated Orchestration Across Malleable Webpages

URL: http://arxiv.org/abs/2505.22831v1
Date: Wed, 28 May 2025 20:13:39 GMT
Title: Orca: Browsing at Scale Through User-Driven and AI-Facilitated Orchestration Across Malleable Webpages
Authors: Peiling Jiang, Haijun Xia
Abstract summary: We present novel interactions with our prototype web browser, Orca. Orca supports user-driven exploration, operation, organization, and synthesis of web content at scale. Our evaluation revealed an increased "appetite" for information foraging, enhanced user control, and more flexibility in sensemaking across a broader information landscape on the web.
Score: 18.25019078938821
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Web-based activities are fundamentally distributed across webpages. However, conventional browsers with stacks of tabs fail to support operating and synthesizing large volumes of information across pages. While recent AI systems enable fully automated web browsing and information synthesis, they often diminish user agency and hinder contextual understanding. Therefore, we explore how AI could instead augment users' interactions with content across webpages and mitigate cognitive and manual efforts. Through literature on information tasks and web browsing challenges, and an iterative design process, we present a rich set of novel interactions with our prototype web browser, Orca. Leveraging AI, Orca supports user-driven exploration, operation, organization, and synthesis of web content at scale. To enable browsing at scale, webpages are treated as malleable materials that humans and AI can collaboratively manipulate and compose into a malleable, dynamic, and browser-level workspace. Our evaluation revealed an increased "appetite" for information foraging, enhanced user control, and more flexibility in sensemaking across a broader information landscape on the web.

Related papers

Agentic Web: Weaving the Next Web with AI Agents [109.13815627467514]
The emergence of AI agents powered by large language models (LLMs) marks a pivotal shift toward the Agentic Web. In this paradigm, agents interact directly with one another to plan, coordinate, and execute complex tasks on behalf of users. We present a structured framework for understanding and building the Agentic Web.
arXiv Detail & Related papers (2025-07-28T17:58:12Z)
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence [109.32705135051486]
Embodied Web Agents is a novel paradigm for AI agents that fluidly bridge embodiment and web-scale reasoning. We release the Embodied Web Agents Benchmark, which encompasses a diverse suite of tasks. Results reveal significant performance gaps between state-of-the-art AI systems and human capabilities.
arXiv Detail & Related papers (2025-06-18T17:58:17Z)
Build the web for agents, not agents for the web [27.969222950526703]
We introduce the concept of an Agentic Web Interface (AWI), an interface specifically designed for agents to navigate a website. We establish six guiding principles for AWI design, emphasizing safety, efficiency, and standardization. This reframing aims to overcome fundamental limitations of existing interfaces, paving the way for more efficient, reliable, and transparent web agent design.
arXiv Detail & Related papers (2025-06-12T17:53:58Z)
WebThinker: Empowering Large Reasoning Models with Deep Research Capability [60.81964498221952]
WebThinker is a deep research agent that empowers large reasoning models to autonomously search the web, navigate web pages, and draft research reports during the reasoning process. It also employs an Autonomous Think-Search-and-Draft strategy, allowing the model to seamlessly interleave reasoning, information gathering, and report writing in real time. Our approach enhances LRM reliability and applicability in complex scenarios, paving the way for more capable and versatile deep research systems.
arXiv Detail & Related papers (2025-04-30T16:25:25Z)
WebNav: An Intelligent Agent for Voice-Controlled Web Navigation [0.0]
WebNav is a novel agent for multi-modal web navigation. The system combines vision-based context from screenshots with a dynamic DOM-labeling browser extension.
arXiv Detail & Related papers (2025-03-18T02:33:27Z)
WebGames: Challenging General-Purpose Web-Browsing AI Agents [11.320069795732058]
WebGames is a comprehensive benchmark suite designed to evaluate general-purpose web-browsing AI agents. We evaluate leading vision-language models including GPT-4o, Claude Computer-Use, Gemini-1.5-Pro, and Qwen2-VL against human performance. Results reveal a substantial capability gap, with the best AI system achieving only 43.1% success rate compared to human performance of 95.7%.
arXiv Detail & Related papers (2025-02-25T16:45:08Z)
Biotic Browser: Applying StreamingLLM as a Persistent Web Browsing Co-Pilot [0.0]
"Biotic Browser" is an innovative AI assistant leveraging StreamingLLM to transform web navigation and task execution. Characterized by its ability to simulate the experience of a passenger in an autonomous vehicle, the Biotic Browser excels in managing extended interactions and complex, multi-step web-based tasks.
arXiv Detail & Related papers (2024-10-31T16:12:02Z)
Survey of User Interface Design and Interaction Techniques in Generative AI Applications [79.55963742878684]
We aim to create a compendium of different user-interaction patterns that can be used as a reference for designers and developers alike. We also strive to lower the entry barrier for those attempting to learn more about the design of generative AI applications.
arXiv Detail & Related papers (2024-10-28T23:10:06Z)
AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation [0.0]
AUTONODE stands for Autonomous User-interface Transformation through Online Neuro-graphic Operations and Deep Exploration. Our engine empowers agents to comprehend and implement complex workflows, adapting to dynamic web environments with unparalleled efficiency. The versatility and efficacy of AUTONODE are demonstrated through a series of experiments, highlighting its proficiency in managing a diverse array of web-based tasks.
arXiv Detail & Related papers (2024-03-15T10:27:17Z)
VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks [93.85005277463802]
VisualWebArena is a benchmark designed to assess the performance of multimodal web agents on realistic tasks. To perform on this benchmark, agents need to accurately process image-text inputs, interpret natural language instructions, and execute actions on websites to accomplish user-defined objectives.
arXiv Detail & Related papers (2024-01-24T18:35:21Z)
Exploring the Potential of Generative AI for the World Wide Web [0.94491536689161]
We explore the potential of generative AI within the realm of the World Wide Web. Web developers already harness generative AI to help craft text and images. Web browsers might use it in the future to locally generate images for tasks like repairing broken webpages, conserving bandwidth, and enhancing privacy.
arXiv Detail & Related papers (2023-10-26T13:02:45Z)
Adversarial Environment Generation for Learning to Navigate the Web [107.99759923626242]
One of the bottlenecks of training web navigation agents is providing a learnable curriculum of training environments. We propose using Adversarial Environment Generation (AEG) to generate challenging web environments in which to train reinforcement learning (RL) agents. We show that the navigator agent trained with our proposed Flexible b-PAIRED technique significantly outperforms competitive automatic curriculum generation baselines.
arXiv Detail & Related papers (2021-03-02T19:19:30Z)
Bringing Cognitive Augmentation to Web Browsing Accessibility [69.62988485669146]
We explore opportunities brought by cognitive augmentation to provide a more natural and accessible web browsing experience. We develop a conceptual framework for supporting BVIP conversational web browsing needs. We describe our early work and a prototype that leverages heuristics considering structural and content features only.
arXiv Detail & Related papers (2020-12-07T14:40:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.