WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
- URL: http://arxiv.org/abs/2411.02337v3
- Date: Mon, 27 Jan 2025 11:56:15 GMT
- Title: WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
- Authors: Zehan Qi, Xiao Liu, Iat Long Iong, Hanyu Lai, Xueqiao Sun, Wenyi Zhao, Yu Yang, Xinyue Yang, Jiadai Sun, Shuntian Yao, Tianjie Zhang, Wei Xu, Jie Tang, Yuxiao Dong
- Abstract summary: Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. We apply WebRL to transform open Llama-3.1 and GLM-4 models into proficient web agents.
- Score: 30.42084844801606
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. However, existing LLM web agents heavily rely on expensive proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities. This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. WebRL addresses three key challenges in building LLM web agents, including the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. Specifically, WebRL incorporates 1) a self-evolving curriculum that generates new tasks from unsuccessful attempts, 2) a robust outcome-supervised reward model (ORM), and 3) adaptive reinforcement learning strategies to ensure consistent improvements. We apply WebRL to transform open Llama-3.1 and GLM-4 models into proficient web agents. On WebArena-Lite, WebRL improves the success rate of Llama-3.1-8B from 4.8% to 42.4%, and from 6.1% to 43% for GLM-4-9B. These open models significantly surpass the performance of GPT-4-Turbo (17.6%) and GPT-4o (13.9%) and outperform previous state-of-the-art web agents trained on open LLMs (AutoWebGLM, 18.2%). Our findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, paving the way for more accessible and powerful autonomous web interaction systems.
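To make the three components concrete, here is a minimal Python sketch of the training loop the abstract describes. Every name in it (`agent_rollout`, `orm_score`, `generate_variants`, `policy_update`) is a hypothetical stand-in for the policy, the outcome-supervised reward model, the curriculum generator, and the adaptive update; it illustrates the idea, not the authors' released code.

```python
import random

# Hypothetical stand-ins for WebRL's components; not the authors' API.

def agent_rollout(task: str) -> list[str]:
    """Run the current policy on a web task and return its action trajectory."""
    return [f"action_{i}" for i in range(random.randint(1, 5))]

def orm_score(task: str, trajectory: list[str]) -> float:
    """Outcome-supervised reward model (ORM): sparse, task-level feedback."""
    return 1.0 if random.random() < 0.3 else 0.0  # stub success signal

def generate_variants(failed_task: str, n: int = 2) -> list[str]:
    """Self-evolving curriculum: spawn new tasks from an unsuccessful attempt."""
    return [f"{failed_task} (variant {i})" for i in range(n)]

def policy_update(batch: list[tuple[str, list[str], float]]) -> None:
    """Adaptive RL step; per the abstract, constrained so the policy does not
    drift too far across online iterations. Placeholder here."""
    pass

task_pool = ["book the cheapest flight", "file an issue on the repo"]
for iteration in range(3):
    batch, failures = [], []
    for task in task_pool:
        trajectory = agent_rollout(task)
        reward = orm_score(task, trajectory)
        batch.append((task, trajectory, reward))
        if reward == 0.0:
            failures.append(task)
    policy_update(batch)
    for task in failures:                      # failed tasks seed new ones
        task_pool.extend(generate_variants(task))
```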
Related papers
- Controlling Large Language Model with Latent Actions [27.0292050543406]
Adapting Large Language Models to downstream tasks using Reinforcement Learning has proven to be an effective approach.
This paper studies learning a compact latent action space to enhance the controllability and exploration of RL for LLMs.
We propose Controlling Large Language Models with Latent Actions (CoLA), a framework that integrates a latent action space into pre-trained LLMs.
arXiv Detail & Related papers (2025-03-27T11:25:22Z)
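As a rough illustration of the latent-action idea (the codebook, `decode`, and `policy` below are invented for this sketch, not CoLA's interface): instead of exploring in token space, an RL policy picks one of a few discrete latent codes that condition generation.

```python
import random

LATENT_CODES = ["be_concise", "ask_question", "give_example", "refuse"]

def decode(prompt: str, code: str) -> str:
    """Stand-in for an LLM whose output is conditioned on a latent action code."""
    return f"[{code}] response to: {prompt}"

def policy(prompt: str) -> str:
    """RL policy acting over K latent codes rather than raw token sequences."""
    return random.choice(LATENT_CODES)

prompt = "How do I revert a git commit?"
code = policy(prompt)          # explore a 4-way action space, not token space
print(decode(prompt, code))
```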
- SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks [110.20297293596005]
Large language model (LLM) agents need to perform multi-turn interactions in real-world tasks.
Existing multi-turn RL algorithms for optimizing LLM agents fail to perform effective credit assignment over multiple turns while leveraging the generalization capabilities of LLMs.
We propose a novel RL algorithm, SWEET-RL, that uses a carefully designed optimization objective to train a critic model with access to additional training-time information.
Our experiments demonstrate that SWEET-RL achieves a 6% absolute improvement in success and win rates on ColBench compared to other state-of-the-art multi-turn RL algorithms.
arXiv Detail & Related papers (2025-03-19T17:55:08Z)
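A toy rendering of that idea: a critic that is given a training-time-only reference solution scores partial dialogues, and the change in its value after each agent turn serves as a per-turn advantage. The `critic` heuristic below is invented for illustration, not the paper's objective.

```python
def critic(history: list[str], reference: str) -> float:
    """Score a partial dialogue; the reference is privileged, training-time-only
    input the deployed agent never sees (toy overlap heuristic)."""
    return sum(turn in reference for turn in history) / max(len(history), 1)

def turn_advantages(turns: list[str], reference: str) -> list[float]:
    """Per-turn advantage = change in critic value after each agent turn."""
    advantages, prev = [], critic([], reference)
    for i in range(1, len(turns) + 1):
        value = critic(turns[:i], reference)
        advantages.append(value - prev)
        prev = value
    return advantages

turns = ["ask clarifying question", "write helper", "write final answer"]
print(turn_advantages(turns, "write helper then write final answer"))
```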
- Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents [89.98593996816186]
We introduce LCoW, a framework for Learning language models to Contextualize complex Web pages into a more comprehensible form.
LCoW decouples web page understanding from decision making by training a separate contextualization module.
We demonstrate that our contextualization module effectively integrates with LLM agents of various scales to significantly enhance their decision-making capabilities.
arXiv Detail & Related papers (2025-03-12T01:33:40Z)
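The decoupling is easy to picture: a contextualization step compresses the raw page into a task-relevant observation, and only then does the agent choose an action. The heuristic `contextualize` and stub `agent_act` below are illustrative placeholders, not LCoW's trained module.

```python
RAW_PAGE = """
<nav>Home | Cart | Login</nav>
<div id="product"><h1>Laptop</h1><button id="buy">Buy now</button></div>
<footer>terms, privacy, careers</footer>
"""

def contextualize(html: str, task: str) -> str:
    """Keep only elements plausibly relevant to the task (toy heuristic)."""
    kept = [ln.strip() for ln in html.splitlines() if "button" in ln or "h1" in ln]
    return f"task: {task}\n" + "\n".join(kept)

def agent_act(observation: str) -> str:
    """Stub decision step that operates on the contextualized page only."""
    return "click #buy" if "Buy now" in observation else "scroll"

obs = contextualize(RAW_PAGE, "purchase the laptop")
print(agent_act(obs))  # the agent never sees the full raw HTML
```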
- SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution [46.5893728376551]
This paper introduces SWE-RL, the first approach to scale RL-based large language models (LLMs) for real-world software engineering.
Llama3-SWE-RL-70B achieves a 41.0% solve rate on SWE-bench Verified -- a human-verified collection of real-world GitHub issues.
Surprisingly, despite performing RL solely on software evolution data, Llama3-SWE-RL has even developed generalized reasoning skills.
arXiv Detail & Related papers (2025-02-25T18:45:04Z)
- AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space.
AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points, respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
- Robust RL with LLM-Driven Data Synthesis and Policy Adaptation for Autonomous Driving [41.87011820577736]
This paper introduces RAPID, a novel framework for training mix-of-policy Reinforcement Learning agents.
It trains specialized mix-of-policy RL agents using data synthesized by an LLM-based driving agent and online adaptation.
It distills the robustness of LLM knowledge into the RL policy while maintaining adaptability to different tasks.
arXiv Detail & Related papers (2024-10-16T13:43:00Z)
- SAC-GLAM: Improving Online RL for LLM agents with Soft Actor-Critic and Hindsight Relabeling [29.29604779151457]
This paper presents and studies an adaptation of Soft Actor-Critic and hindsight relabeling to LLM agents.
Our method paves the way toward autotelic LLM agents that learn online and can also outperform on-policy methods in more classic multi-goal RL environments.
arXiv Detail & Related papers (2024-10-16T11:59:27Z)
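Hindsight relabeling itself is simple to sketch: a trajectory that failed its commanded goal is stored a second time, relabeled with the goal it actually achieved, so the replay buffer gains positive examples for free. The episode layout below is hypothetical.

```python
def relabel(episode: dict) -> dict:
    """Replace the commanded goal with the achieved one and recompute reward."""
    relabeled = dict(episode)
    relabeled["goal"] = episode["achieved"]
    relabeled["reward"] = 1.0  # by construction, the achieved goal was reached
    return relabeled

episode = {
    "goal": "open settings page",
    "achieved": "open profile page",
    "reward": 0.0,   # failed the original goal
    "actions": ["click menu", "click profile"],
}
replay_buffer = [episode, relabel(episode)]  # keep both views of the episode
```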
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning [61.10299147201369]
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents.
We build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator.
We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement.
arXiv Detail & Related papers (2024-06-14T17:49:55Z)
- LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement [79.31084387589968]
Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks.
We propose LLM2LLM, a data augmentation strategy that uses a teacher LLM to enhance a small seed dataset.
We achieve improvements up to 24.2% on the GSM8K dataset, 32.6% on CaseHOLD, 32.0% on SNIPS, 52.6% on TREC and 39.8% on SST-2 over regular fine-tuning in the low-data regime.
arXiv Detail & Related papers (2024-03-22T08:57:07Z)
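The augmentation loop is straightforward to sketch: after each fine-tuning round, the examples the student misses are handed to a teacher model that synthesizes similar new ones. `student_answers` and `teacher_augment` below are toy stand-ins for the student and teacher LLM calls, not the paper's code.

```python
def student_answers(example: dict) -> bool:
    """Stub correctness check for the fine-tuned student model."""
    return "2" in example["question"]  # arbitrary proxy for 'student got it right'

def teacher_augment(example: dict, n: int = 2) -> list[dict]:
    """Teacher LLM: synthesize n new examples resembling a missed one."""
    return [{"question": f"{example['question']} (rephrased {i})",
             "answer": example["answer"]} for i in range(n)]

dataset = [{"question": "2+2?", "answer": "4"},
           {"question": "5*3?", "answer": "15"}]
for round_ in range(3):
    # fine_tune(student, dataset)  # training step omitted in this sketch
    missed = [ex for ex in dataset if not student_answers(ex)]
    for ex in missed:
        dataset.extend(teacher_augment(ex))   # grow data where the student fails
```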
- Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models [56.00992369295851]
Open-sourced Large Language Models (LLMs) have achieved great success in various NLP tasks; however, they are still far inferior to API-based models when acting as agents.
This paper delivers three key observations: (1) the current agent training corpus is entangled with both format following and agent reasoning, which significantly shifts from the distribution of its pre-training data; (2) LLMs exhibit different learning speeds on the capabilities required by agent tasks; and (3) current approaches introduce hallucinations as a side effect when improving agent abilities.
We propose Agent-FLAN to effectively Fine-tune LANguage models for Agents.
arXiv Detail & Related papers (2024-03-19T16:26:10Z)
- EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents [65.38474102119181]
We propose EnvGen, a framework to adaptively create training environments.
We train a small RL agent in a mixture of the original and LLM-generated environments.
We find that a small RL agent trained with EnvGen can outperform SOTA methods, including a GPT-4 agent, and learns long-horizon tasks significantly faster.
arXiv Detail & Related papers (2024-03-18T17:51:16Z)
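A compressed sketch of that cycle (the environment configs and the feedback step are invented here): the LLM proposes environments targeting the agent's weak skills, the agent trains mostly in those while occasionally revisiting the original environment, and its results steer the next round of generation.

```python
import random

def llm_generate_configs(weak_skills: list[str]) -> list[dict]:
    """Stand-in for prompting an LLM to emit training-environment configs."""
    return [{"focus": skill, "difficulty": "easy"} for skill in weak_skills]

def run_episode(env_config: dict) -> dict:
    """Stub episode; records which skill was exercised and whether it passed."""
    return {"skill": env_config.get("focus", "base"),
            "success": random.random() < 0.4}

original_env = {"focus": "base"}
weak_skills = ["craft tools", "mine iron"]
for cycle in range(3):
    generated = llm_generate_configs(weak_skills)
    results = []
    for _ in range(8):
        # Mix: mostly LLM-generated envs, occasionally the original one.
        env = random.choice(generated) if random.random() < 0.75 else original_env
        results.append(run_episode(env))
    # Feed performance back so the next cycle targets remaining weaknesses.
    weak_skills = [r["skill"] for r in results
                   if not r["success"] and r["skill"] != "base"] or weak_skills
```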
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
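In caricature, the hierarchy looks like this: a high-level critic makes TD-style, off-policy updates over whole utterances, while a low-level step treats the utterance value as the reward for token-level optimization. Both functions below are simplified placeholders, not the paper's implementation.

```python
def update_utterance_critic(replay: list[dict]) -> None:
    """High level: TD-style value update over whole utterances (off-policy)."""
    for transition in replay:
        target = transition["reward"] + 0.99 * transition["next_value"]
        transition["value"] += 0.1 * (target - transition["value"])

def update_token_policy(utterance: str, utterance_value: float) -> float:
    """Low level: spread the utterance value over tokens (a stand-in for a
    token-level policy-gradient step)."""
    return utterance_value / max(len(utterance.split()), 1)

replay = [{"reward": 1.0, "next_value": 0.5, "value": 0.0,
           "utterance": "click the search button"}]
for _ in range(5):            # the two updates run side by side
    update_utterance_critic(replay)
    for tr in replay:
        update_token_policy(tr["utterance"], tr["value"])
```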
- ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling [20.022332182475672]
ARL2 is a retriever learning technique that harnesses large language models as labelers.
ARL2 uses an adaptive self-training strategy for curating high-quality and diverse relevance data.
Experiments demonstrate the effectiveness of ARL2, achieving accuracy improvements of 5.4% on NQ and 4.6% on MMLU.
arXiv Detail & Related papers (2024-02-21T05:41:34Z)
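The curation step can be sketched as follows, with `llm_label` standing in for the black-box LLM labeler (the heuristic is invented): candidate (question, passage) pairs are scored, and only confidently positive or confidently negative pairs survive into the retriever's training set.

```python
def llm_label(question: str, passage: str) -> float:
    """Stand-in relevance score in [0, 1] from the LLM labeler (toy overlap)."""
    overlap = len(set(question.lower().split()) & set(passage.lower().split()))
    return min(overlap / 2, 1.0)

candidates = [
    ("who wrote hamlet", "Hamlet was written by William Shakespeare."),
    ("who wrote hamlet", "The Eiffel Tower is in Paris."),
]
train_pairs = []
for q, p in candidates:
    score = llm_label(q, p)
    if score >= 0.5 or score <= 0.1:      # keep only confident labels
        train_pairs.append((q, p, score >= 0.5))
print(train_pairs)
```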
- The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models [2.5721733711031978]
We review research studies that combine Reinforcement Learning (RL) and Large Language Models (LLMs).
We propose a novel taxonomy of three main classes based on the way that the two model types interact with each other.
arXiv Detail & Related papers (2024-02-02T20:01:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.