Biotic Browser: Applying StreamingLLM as a Persistent Web Browsing Co-Pilot
- URL: http://arxiv.org/abs/2411.10454v1
- Date: Thu, 31 Oct 2024 16:12:02 GMT
- Title: Biotic Browser: Applying StreamingLLM as a Persistent Web Browsing Co-Pilot
- Authors: Kevin F. Dunnell, Andrew P. Stoddard
- Abstract summary: "Biotic Browser" is an innovative AI assistant leveraging StreamingLLM to transform web navigation and task execution.
Characterized by its ability to simulate the experience of a passenger in an autonomous vehicle, the Biotic Browser excels in managing extended interactions and complex, multi-step web-based tasks.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This paper presents "Biotic Browser," an innovative AI assistant leveraging StreamingLLM to transform web navigation and task execution. Characterized by its ability to simulate the experience of a passenger in an autonomous vehicle, the Biotic Browser excels in managing extended interactions and complex, multi-step web-based tasks. It marks a significant advancement in AI technology, particularly in the realm of long-term context management, and offers promising applications for enhancing productivity and efficiency in both personal and professional settings.
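StreamingLLM's long-term context management rests on a simple KV-cache policy: retain a handful of initial "attention sink" tokens plus a rolling window of the most recent tokens, evicting everything in between. The sketch below models only that cache bookkeeping, not the attention computation itself; the class name, parameter names, and sizes are illustrative assumptions, not the paper's implementation.

```python
from collections import deque

class StreamingKVCache:
    """Bookkeeping sketch of StreamingLLM-style cache eviction."""

    def __init__(self, n_sinks: int = 4, window: int = 1020):
        self.n_sinks = n_sinks            # always-retained initial tokens
        self.sinks: list[int] = []        # positions kept as attention sinks
        self.recent: deque[int] = deque(maxlen=window)  # rolling recent window

    def append(self, pos: int) -> None:
        """Register a new token position, evicting per the streaming policy."""
        if len(self.sinks) < self.n_sinks:
            self.sinks.append(pos)        # earliest tokens become sinks
        else:
            self.recent.append(pos)       # deque drops the oldest automatically

    def visible(self) -> list[int]:
        """Token positions the model can still attend to."""
        return self.sinks + list(self.recent)

cache = StreamingKVCache(n_sinks=4, window=8)
for t in range(20):
    cache.append(t)
# Positions 0-3 survive as sinks; of the rest, only the 8 most recent remain.
```

Because eviction cost is constant per token, this is what lets a browsing co-pilot sustain arbitrarily long sessions without the cache growing with history length.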
Related papers
- Build the web for agents, not agents for the web [27.969222950526703]
We introduce the concept of an Agentic Web Interface (AWI), an interface specifically designed for agents to navigate a website. We establish six guiding principles for AWI design, emphasizing safety, efficiency, and standardization. This reframing aims to overcome fundamental limitations of existing interfaces, paving the way for more efficient, reliable, and transparent web agent design.
arXiv Detail & Related papers (2025-06-12T17:53:58Z)
- WebUIBench: A Comprehensive Benchmark for Evaluating Multimodal Large Language Models in WebUI-to-Code [57.45181837786448]
Multimodal Large Language Models (MLLMs) have the potential to act as AI software engineers capable of executing complex web application development. Existing benchmarks usually fail to provide an assessment of sub-capabilities and focus solely on webpage generation outcomes. We propose WebUIBench, a benchmark systematically designed to evaluate MLLMs in four key areas: WebUI Perception, HTML Programming, WebUI-HTML Understanding, and WebUI-to-Code.
arXiv Detail & Related papers (2025-06-09T14:46:02Z)
- Orca: Browsing at Scale Through User-Driven and AI-Facilitated Orchestration Across Malleable Webpages [18.25019078938821]
We present novel interactions with our prototype web browser, Orca. Orca supports user-driven exploration, operation, organization, and synthesis of web content at scale. Our evaluation revealed an increased "appetite" for information foraging, enhanced user control, and more flexibility in sensemaking across a broader information landscape on the web.
arXiv Detail & Related papers (2025-05-28T20:13:39Z)
- A Survey of WebAgents: Towards Next-Generation AI Agents for Web Automation with Large Foundation Models [45.12763718252896]
In the context of the web, leveraging AI Agents -- WebAgents -- to automatically assist people in handling tedious daily tasks can dramatically enhance productivity and efficiency.
To fully explore the potential of LFMs, extensive research has emerged on WebAgents designed to complete daily web tasks according to user instructions.
arXiv Detail & Related papers (2025-03-30T08:15:44Z)
- TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks [52.46737975742287]
We build a self-contained environment with data that mimics a small software company environment.
We find that with the most competitive agent, 24% of the tasks can be completed autonomously.
This paints a nuanced picture on task automation with LM agents.
arXiv Detail & Related papers (2024-12-18T18:55:40Z)
- Steward: Natural Language Web Automation [19.301371856154965]
Large language models (LLMs) have demonstrated exceptional capabilities in serving as the foundation for AI assistants.
We introduce Steward, a novel LLM-powered web automation tool designed to serve as a cost-effective, scalable, end-to-end solution for automating web interactions.
We discuss various design and implementation challenges, including state representation, action sequence selection, system responsiveness, detecting task completion, and caching implementation.
arXiv Detail & Related papers (2024-09-23T18:06:32Z)
- ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
We present a framework for intuitive robot programming by non-experts.
We leverage natural language prompts and contextual information from the Robot Operating System (ROS)
Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface.
arXiv Detail & Related papers (2024-06-28T08:28:38Z)
- AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation [0.0]
AUTONODE stands for Autonomous User-interface Transformation through Online Neuro-graphic Operations and Deep Exploration.
Our engine empowers agents to comprehend and execute complex workflows, adapting to dynamic web environments with unparalleled efficiency.
The versatility and efficacy of AUTONODE are demonstrated through a series of experiments, highlighting its proficiency in managing a diverse array of web-based tasks.
arXiv Detail & Related papers (2024-03-15T10:27:17Z)
- On the Multi-turn Instruction Following for Conversational Web Agents [83.51251174629084]
We introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment.
We propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques.
arXiv Detail & Related papers (2024-02-23T02:18:12Z)
- WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models [65.18602126334716]
Existing web agents typically only handle one input modality and are evaluated only in simplified web simulators or static web snapshots.
We introduce WebVoyager, an innovative Large Multimodal Model (LMM) powered web agent that can complete user instructions end-to-end by interacting with real-world websites.
We show that WebVoyager achieves a 59.1% task success rate on our benchmark, significantly surpassing the performance of both GPT-4 (All Tools) and the WebVoyager (text-only) setups.
arXiv Detail & Related papers (2024-01-25T03:33:18Z)
- AutoML-GPT: Automatic Machine Learning with GPT [74.30699827690596]
We propose developing task-oriented prompts and automatically utilizing large language models (LLMs) to automate the training pipeline.
We present AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyperparameters.
This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas.
arXiv Detail & Related papers (2023-05-04T02:09:43Z)
- Arena-Web -- A Web-based Development and Benchmarking Platform for Autonomous Navigation Approaches [2.4937400423177767]
We present Arena-Web, a web-based development and evaluation suite for developing, training, and testing DRL-based navigation planners.
The interface is designed to be intuitive and engaging to appeal to non-experts and make the technology accessible to a wider audience.
arXiv Detail & Related papers (2023-02-06T16:06:07Z)
- Adversarial Environment Generation for Learning to Navigate the Web [107.99759923626242]
One of the bottlenecks of training web navigation agents is providing a learnable curriculum of training environments.
We propose using Adversarial Environment Generation (AEG) to generate challenging web environments in which to train reinforcement learning (RL) agents.
We show that the navigator agent trained with our proposed Flexible b-PAIRED technique significantly outperforms competitive automatic curriculum generation baselines.
arXiv Detail & Related papers (2021-03-02T19:19:30Z)
- APPLD: Adaptive Planner Parameter Learning from Demonstration [48.63930323392909]
We introduce APPLD, Adaptive Planner Parameter Learning from Demonstration, which allows existing navigation systems to be successfully applied to new, complex environments.
APPLD is verified on two robots running different navigation systems in different environments.
Experimental results show that APPLD can outperform navigation systems with both default and expert-tuned parameters, and even the human demonstrators themselves.
arXiv Detail & Related papers (2020-03-31T21:15:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.