AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
- URL: http://arxiv.org/abs/2404.12753v1
- Date: Fri, 19 Apr 2024 09:59:44 GMT
- Title: AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
- Authors: Wenhao Huang, Chenghao Peng, Zhixu Li, Jiaqing Liang, Yanghua Xiao, Liqian Wen, Zulong Chen,
- Abstract summary: We introduce a crawler generation task for vertical information web pages.
We propose AutoCrawler, a two-stage framework that leverages the hierarchical structure of HTML for progressive understanding.
- Score: 55.86438100985539
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Web automation is a significant technique that accomplishes complicated web tasks by automating common web actions, enhancing operational efficiency, and reducing the need for manual intervention. Traditional methods, such as wrappers, suffer from limited adaptability and scalability when faced with a new website. On the other hand, generative agents empowered by large language models (LLMs) exhibit poor performance and reusability in open-world scenarios. In this work, we introduce a crawler generation task for vertical information web pages and the paradigm of combining LLMs with crawlers, which helps crawlers handle diverse and changing web environments more efficiently. We propose AutoCrawler, a two-stage framework that leverages the hierarchical structure of HTML for progressive understanding. Through top-down and step-back operations, AutoCrawler can learn from erroneous actions and continuously prune HTML for better action generation. We conduct comprehensive experiments with multiple LLMs and demonstrate the effectiveness of our framework. Resources of this paper can be found at \url{https://github.com/EZ-hwh/AutoCrawler}
Related papers
- ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning [74.58666091522198]
We present a framework for intuitive robot programming by non-experts.
We leverage natural language prompts and contextual information from the Robot Operating System (ROS)
Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface.
arXiv Detail & Related papers (2024-06-28T08:28:38Z) - CAAP: Context-Aware Action Planning Prompting to Solve Computer Tasks with Front-End UI Only [21.054681757006385]
Large Language Models (LLMs) with advanced reasoning capabilities have set the stage for agents to undertake more complex and previously unseen tasks.
We propose an agent that functions solely on the basis of screenshots for recognizing environments.
We achieve a success rate of 94.4% on 67types of MiniWoB++ problems, utilizing only 1.48demonstrations per problem type.
arXiv Detail & Related papers (2024-06-11T05:21:20Z) - AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent [33.55199326570078]
AutoWebGLM is an automated web navigation agent built upon ChatGLM3-6B.
Inspired by human browsing patterns, we design an HTML simplification algorithm to represent webpages.
For testing, we establish a bilingual benchmark -- AutoWebBench -- for real-world web browsing tasks.
arXiv Detail & Related papers (2024-04-04T17:58:40Z) - On the Multi-turn Instruction Following for Conversational Web Agents [83.51251174629084]
We introduce a new task of Conversational Web Navigation, which necessitates sophisticated interactions that span multiple turns with both the users and the environment.
We propose a novel framework, named self-reflective memory-augmented planning (Self-MAP), which employs memory utilization and self-reflection techniques.
arXiv Detail & Related papers (2024-02-23T02:18:12Z) - AllTogether: Investigating the Efficacy of Spliced Prompt for Web
Navigation using Large Language Models [2.234037966956278]
We introduce AllTogether, a standardized prompt template that enhances task context representation.
We evaluate the efficacy of this approach through prompt learning and instruction finetuning based on open-source Llama-2 and API-accessible GPT models.
arXiv Detail & Related papers (2023-10-20T11:10:14Z) - LASER: LLM Agent with State-Space Exploration for Web Navigation [57.802977310392755]
Large language models (LLMs) have been successfully adapted for interactive decision-making tasks like web navigation.
Previous methods implicitly assume a forward-only execution mode for the model, where they only provide oracle trajectories as in-context examples.
We propose to model the interactive task as state space exploration, where the LLM agent transitions among a pre-defined set of states by performing actions to complete the task.
arXiv Detail & Related papers (2023-09-15T05:44:08Z) - A Real-World WebAgent with Planning, Long Context Understanding, and
Program Synthesis [69.15016747150868]
We introduce WebAgent, an agent that learns from self-experience to complete tasks on real websites.
WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites.
We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks.
arXiv Detail & Related papers (2023-07-24T14:56:30Z) - AutoML-GPT: Automatic Machine Learning with GPT [74.30699827690596]
We propose developing task-oriented prompts and automatically utilizing large language models (LLMs) to automate the training pipeline.
We present the AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyper parameters.
This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas.
arXiv Detail & Related papers (2023-05-04T02:09:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.