InSTA: Towards Internet-Scale Training For Agents
- URL: http://arxiv.org/abs/2502.06776v2
- Date: Thu, 22 May 2025 17:59:11 GMT
- Title: InSTA: Towards Internet-Scale Training For Agents
- Authors: Brandon Trabucco, Gunnar Sigurdsson, Robinson Piramuthu, Ruslan Salakhutdinov
- Abstract summary: We develop a pipeline to facilitate internet-scale training for agents without laborious human annotations. We train agents based on Qwen 3 1.7B that are competitive with frontier LLMs as web agents, while being smaller and faster. Our top agent reaches a success rate of 56.9%, outperforming the data collection policy Qwen 3 235B, a 235 times larger Llama 4 Maverick, and reaching 94.7% of the performance of Gemini 2.5 Flash.
- Score: 49.763517682308766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The predominant approach for training web navigation agents is to gather human demonstrations for a set of popular websites and hand-written tasks, but it is becoming clear that human data is an inefficient resource. We develop a pipeline to facilitate internet-scale training for agents without laborious human annotations. In the first stage, an LLM annotates 150k sites with agentic tasks. In the next stage, LLM agents complete tasks and produce trajectories. In the final stage, an LLM filters trajectories by judging their success. Language models are powerful data curation tools, identifying harmful content with an accuracy of 97%, judging successful trajectories with an accuracy of 82.6%, and producing effective data. We train agents based on Qwen 3 1.7B that are competitive with frontier LLMs as web agents, while being smaller and faster. Our top agent reaches a success rate of 56.9%, outperforming the data collection policy Qwen 3 235B, a 235 times larger Llama 4 Maverick, and reaching 94.7% of the performance of Gemini 2.5 Flash. We are releasing code, models and data at: https://data-for-agents.github.io.
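The abstract outlines a three-stage pipeline: an LLM annotates sites with agentic tasks (screening out harmful content at 97% accuracy), LLM agents roll the tasks out in a browser to produce trajectories, and an LLM judge keeps only the trajectories it scores as successful (82.6% judging accuracy). A minimal Python sketch of that loop follows; `call_llm` and `run_browser_agent` are hypothetical placeholders standing in for a chat-completion call and a browser rollout, not the released InSTA code (see https://data-for-agents.github.io for the actual pipeline).

```python
# Hypothetical sketch of the three-stage pipeline described in the abstract.
# `call_llm` and `run_browser_agent` are assumed placeholders, not InSTA's code.

def call_llm(prompt: str) -> str:
    """Stand-in for a chat-completion call (the paper uses Qwen 3 235B
    as the data-collection policy and curation model)."""
    raise NotImplementedError

def propose_task(site_url: str) -> str | None:
    """Stage 1: annotate a site with an agentic task, skipping harmful sites."""
    verdict = call_llm(f"Is {site_url} safe and suitable for a user task? yes/no")
    if verdict.strip().lower() != "yes":
        return None  # the paper reports ~97% accuracy at flagging harmful content
    return call_llm(f"Write one realistic task a user could complete on {site_url}")

def run_browser_agent(site_url: str, task: str) -> list[dict]:
    """Stage 2: roll out an LLM agent in a browser and record
    (observation, action) steps, e.g. a Playwright loop driven by call_llm."""
    raise NotImplementedError

def judge_success(task: str, trajectory: list[dict]) -> bool:
    """Stage 3: an LLM judge filters trajectories (~82.6% judging accuracy)."""
    transcript = "\n".join(str(step["action"]) for step in trajectory)
    answer = call_llm(
        f"Task: {task}\nActions taken:\n{transcript}\nDid the agent succeed? yes/no"
    )
    return answer.strip().lower() == "yes"

def build_dataset(sites: list[str]) -> list[tuple[str, list[dict]]]:
    """Run all three stages, keeping only judged-successful trajectories."""
    data = []
    for url in sites:
        task = propose_task(url)
        if task is None:
            continue
        trajectory = run_browser_agent(url, task)
        if judge_success(task, trajectory):
            data.append((task, trajectory))  # fine-tuning data for Qwen 3 1.7B
    return data
```

The notable design choice is that language models fill every role that would otherwise require human labor (task writer, policy, and judge), so the only manual input is the list of site URLs.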
Related papers
- WebDancer: Towards Autonomous Information Seeking Agency [69.33360019344083]
Recent progress in agentic systems underscores the potential for autonomous multi-step research. We present a cohesive paradigm for building end-to-end agentic information-seeking agents from a data-centric and training-stage perspective. We instantiate this framework in a web agent built on ReAct, called WebDancer.
arXiv Detail & Related papers (2025-05-28T17:57:07Z) - APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay [86.01901238059261]
APIGen-MT is a framework that generates verifiable and diverse multi-turn agent data. We train a family of models -- the xLAM-2-fc-r series with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on the τ-bench and BFCL benchmarks.
arXiv Detail & Related papers (2025-04-04T17:13:57Z) - Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments [33.83610929282721]
Learn-by-interact is a data-centric framework to adapt large language models (LLMs) to any given environment without human annotations. We assess the quality of our synthetic data by using it in both training-based scenarios and training-free in-context learning (ICL). Experiments on SWE-bench, WebArena, OSWorld, and Spider2-V, spanning realistic coding, web, and desktop environments, show the effectiveness of Learn-by-interact.
arXiv Detail & Related papers (2025-01-18T22:34:41Z) - ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data [18.129300915372415]
Large Language Model (LLM) agents are rapidly improving to handle increasingly complex web-based tasks. General-purpose LLMs are not specifically trained to understand specialized web contexts such as HTML. We explore an alternative approach that fine-tunes open-source LLMs using production-scale workflow data collected from over 250 domains, corresponding to 6 billion tokens.
arXiv Detail & Related papers (2024-11-22T15:26:23Z) - AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space. AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points, respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z) - Agent Workflow Memory [71.81385627556398]
We introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines.
AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate.
Online AWM robustly generalizes in cross-task, website, and domain evaluations.
arXiv Detail & Related papers (2024-09-11T17:21:00Z) - An Extremely Data-efficient and Generative LLM-based Reinforcement Learning Agent for Recommenders [1.0154385852423122]
Reinforcement learning (RL) algorithms have been instrumental in maximizing long-term customer satisfaction and avoiding short-term, myopic goals in industrial recommender systems.
The goal is to train an RL agent to maximize the purchase reward given a detailed human instruction describing a desired product.
This report also evaluates the RL agents trained using generative trajectories.
arXiv Detail & Related papers (2024-08-28T10:31:50Z) - BPO: Staying Close to the Behavior LLM Creates Better Online LLM Alignment [64.39433316922148]
Direct alignment from preferences (DAP) has emerged as a promising paradigm for aligning large language models (LLMs) to human desiderata from pre-collected, offline preference datasets.
We highlight the need to develop specific online DAP algorithms to fully harness the power of online training.
arXiv Detail & Related papers (2024-06-18T00:41:40Z) - Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models [56.00992369295851]
Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks; however, they are still far inferior to API-based models when acting as agents.
This paper delivers three key observations: (1) the current agent training corpus is entangled with both format following and agent reasoning, which significantly shifts from the distribution of its pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches have side effects when improving agent abilities by introducing hallucinations.
We propose Agent-FLAN to effectively Fine-tune LANguage models for Agents.
arXiv Detail & Related papers (2024-03-19T16:26:10Z) - AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning [98.26836657967162]
AgentOhana aggregates agent trajectories from distinct environments, spanning a wide array of scenarios.
xLAM-v0.1, a large action model tailored for AI agents, demonstrates exceptional performance across various benchmarks.
arXiv Detail & Related papers (2024-02-23T18:56:26Z) - Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents [41.14201835950814]
Large language models (LLMs) have achieved success in acting as agents, which interact with environments through tools such as search engines.
Previous work has first collected interaction trajectories between LLMs and environments, using only trajectories that successfully finished the task to fine-tune smaller models.
We argue that unsuccessful trajectories offer valuable insights, and LLMs can learn from these trajectories through appropriate quality control and fine-tuning strategies.
arXiv Detail & Related papers (2024-02-18T17:10:07Z) - How to Train Data-Efficient LLMs [56.41105687693619]
We study data-efficient approaches for pre-training large language models (LLMs).
In our comparison of 19 samplers, involving hundreds of evaluation tasks and pre-training runs, we find that Ask-LLM and Density sampling are the best methods in their respective categories.
arXiv Detail & Related papers (2024-02-15T02:27:57Z) - DiffClone: Enhanced Behaviour Cloning in Robotics with Diffusion-Driven Policy Learning [1.1242503819703258]
We introduce DiffClone, an offline algorithm for enhanced behaviour cloning with diffusion-based policy learning.
This paper is an official submission to the Train-Offline-Test-Online (TOTO) Benchmark Challenge organized at NeurIPS 2023.
arXiv Detail & Related papers (2024-01-17T14:43:59Z) - QUERT: Continual Pre-training of Language Model for Query Understanding in Travel Domain Search [15.026682829320261]
We propose QUERT, a Continual Pre-trained Language Model for QUERy Understanding in Travel Domain Search.
QUERT is jointly trained on four pre-training tasks tailored to the characteristics of queries in travel domain search.
To verify QUERT's improvement to online business, we deploy QUERT and perform A/B testing on the Fliggy app.
arXiv Detail & Related papers (2023-06-11T15:39:59Z) - Multimodal Web Navigation with Instruction-Finetuned Foundation Models [99.14209521903854]
We study data-driven offline training for web agents with vision-language foundation models.
We propose an instruction-following multimodal agent, WebGUM, that observes both webpage screenshots and HTML pages.
We empirically demonstrate that this recipe improves the agent's grounded multimodal perception, HTML comprehension, and multi-step reasoning.
arXiv Detail & Related papers (2023-05-19T17:44:34Z) - Generative Conversational Networks [67.13144697969501]
We propose a framework called Generative Conversational Networks, in which conversational agents learn to generate their own labelled training data.
We show an average improvement of 35% in intent detection and 21% in slot tagging over a baseline model trained from the seed data.
arXiv Detail & Related papers (2021-06-15T23:19:37Z)