Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents
- URL: http://arxiv.org/abs/2503.10689v1
- Date: Wed, 12 Mar 2025 01:33:40 GMT
- Title: Learning to Contextualize Web Pages for Enhanced Decision Making by LLM Agents
- Authors: Dongjun Lee, Juyong Lee, Kyuyoung Kim, Jihoon Tack, Jinwoo Shin, Yee Whye Teh, Kimin Lee
- Abstract summary: We introduce LCoW, a framework for Learning language models to Contextualize complex Web pages into a more comprehensible form. LCoW decouples web page understanding from decision making by training a separate contextualization module. We demonstrate that our contextualization module effectively integrates with LLM agents of various scales to significantly enhance their decision-making capabilities.
- Score: 89.98593996816186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in large language models (LLMs) have led to a growing interest in developing LLM-based agents for automating web tasks. However, these agents often struggle with even simple tasks on real-world websites due to their limited capability to understand and process complex web page structures. In this work, we introduce LCoW, a framework for Learning language models to Contextualize complex Web pages into a more comprehensible form, thereby enhancing decision making by LLM agents. LCoW decouples web page understanding from decision making by training a separate contextualization module to transform complex web pages into a comprehensible format, which is then utilized by the decision-making agent. We demonstrate that our contextualization module effectively integrates with LLM agents of various scales to significantly enhance their decision-making capabilities in web automation tasks. Notably, LCoW improves the success rates of closed-source LLMs (e.g., Gemini-1.5-flash, GPT-4o, Claude-3.5-Sonnet) by an average of 15.6%, and demonstrates a 23.7% average improvement in success rates for open-source LMs (e.g., Llama-3.1-8B, Llama-3.1-70B) on the WorkArena benchmark. Moreover, the Gemini-1.5-flash agent with LCoW achieves state-of-the-art results on the WebShop benchmark, outperforming human experts. The relevant code materials are available at our project page: https://lcowiclr2025.github.io.
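To make the decoupling concrete, the sketch below shows a contextualize-then-decide loop in the spirit of the abstract. It is a minimal illustration, not the paper's released code: the `Observation`, `call_llm`, `contextualize`, and `decide_action` names are hypothetical stand-ins, and LCoW trains its contextualization module rather than merely prompting one as done here.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    raw_html: str   # raw HTML / accessibility tree of the current page
    task: str       # natural-language task instruction


def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API (e.g., Gemini-1.5-flash, GPT-4o)."""
    raise NotImplementedError


def contextualize(obs: Observation) -> str:
    """Contextualization module: rewrites the raw page into a task-relevant,
    comprehensible description. In LCoW this is a separately trained LM;
    here it is only sketched as a prompted call."""
    prompt = (
        f"Task: {obs.task}\n"
        f"Web page:\n{obs.raw_html}\n"
        "Rewrite this page as a concise description of the elements relevant to the task."
    )
    return call_llm(prompt)


def decide_action(obs: Observation) -> str:
    """Decision-making agent: chooses the next action from the contextualized
    page rather than the raw observation."""
    context = contextualize(obs)
    prompt = (
        f"Task: {obs.task}\n"
        f"Contextualized page:\n{context}\n"
        "Output the next action, e.g., click(element_id) or type(element_id, text)."
    )
    return call_llm(prompt)
```

The point of the separation is that the decision-making agent never sees the raw page; any agent (closed- or open-source) can be swapped into `decide_action` while the contextualization module stays fixed.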
Related papers
- Federated In-Context LLM Agent Learning [3.4757641432843487]
Large Language Models (LLMs) have revolutionized intelligent services by enabling logical reasoning, tool use, and interaction with external systems as agents. In this paper, we propose a novel privacy-preserving Federated In-context LLM Agent Learning (FICAL) algorithm. The results show that FICAL has competitive performance compared to other SOTA baselines with a significant communication cost decrease of $\mathbf{3.33 \times 10^{5}}$ times.
arXiv Detail & Related papers (2024-12-11T03:00:24Z)
- CHAI for LLMs: Improving Code-Mixed Translation in Large Language Models through Reinforcement Learning with AI Feedback [11.223762031003671]
Large Language Models (LLMs) have demonstrated remarkable capabilities across various NLP tasks but struggle with code-mixed (or code-switched) language understanding.
This paper proposes CHAI, a novel framework for improving the ability of multilingual LLMs to handle code-mixed languages.
Our analysis shows that CHAI-powered LLMs outperform state-of-the-art open-source LLMs by 25.66% (in terms of win rate adjudicated by human annotators) in code-mixed translation tasks.
arXiv Detail & Related papers (2024-11-13T22:56:00Z)
- WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning [30.42084844801606]
Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. We apply WebRL to transform open Llama-3.1 and GLM-4 models into proficient web agents.
arXiv Detail & Related papers (2024-11-04T17:59:58Z)
- AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents [52.13695464678006]
This study enhances an LLM-based web agent by simply refining its observation and action space.
AgentOccam surpasses the previous state-of-the-art and concurrent work by 9.8 (+29.4%) and 5.9 (+15.8%) absolute points respectively.
arXiv Detail & Related papers (2024-10-17T17:50:38Z)
- SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization [8.121663525764294]
Large language models (LLMs) play a crucial role in our daily lives due to their ability to understand and generate human-like text.
In this report, we design a collaborative inference architecture between a server and its clients to alleviate the throughput limit.
We show in the experiments that we are able to efficiently distribute the workload, allowing for roughly a one-third reduction in the server workload.
arXiv Detail & Related papers (2024-10-14T17:38:41Z)
- Teaching Machines to Code: Smart Contract Translation with LLMs [4.780973517287942]
We present a pioneering approach, which harnesses the synergy of two distinct large language models (LLMs) within a unified framework.
This framework is designed to grasp coding principles and apply this understanding to the translation of code into an unfamiliar language.
Our study delves into the capacity of LLMs to mimic human learning processes, offering an in-depth evaluation of our methodology for converting smart contracts written in Solidity to Move.
arXiv Detail & Related papers (2024-03-13T18:55:20Z)
- Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes to out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
- MAgIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration [98.18244218156492]
Large Language Models (LLMs) have significantly advanced natural language processing.
As their applications expand into multi-agent environments, there arises a need for a comprehensive evaluation framework.
This work introduces a novel competition-based benchmark framework to assess LLMs within multi-agent settings.
arXiv Detail & Related papers (2023-11-14T21:46:27Z)
- Agent Lumos: Unified and Modular Training for Open-Source Language Agents [89.78556964988852]
We introduce LUMOS, one of the first frameworks for training open-source LLM-based agents.
LUMOS features a learnable, unified, and modular architecture with a planning module that learns high-level subgoal generation.
We collect large-scale, unified, and high-quality training annotations derived from diverse ground-truth reasoning rationales.
arXiv Detail & Related papers (2023-11-09T00:30:13Z)
- A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis [69.15016747150868]
We introduce WebAgent, an agent that learns from self-experience to complete tasks on real websites.
WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites.
We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks.
arXiv Detail & Related papers (2023-07-24T14:56:30Z)
- The Web Can Be Your Oyster for Improving Large Language Models [98.72358969495835]
Large language models (LLMs) encode a large amount of world knowledge.
We consider augmenting LLMs with the large-scale web using a search engine.
We present a web-augmented LLM UNIWEB, which is trained over 16 knowledge-intensive tasks in a unified text-to-text format.
arXiv Detail & Related papers (2023-05-18T14:20:32Z)