Language Models can Solve Computer Tasks
- URL: http://arxiv.org/abs/2303.17491v3
- Date: Thu, 16 Nov 2023 20:15:14 GMT
- Title: Language Models can Solve Computer Tasks
- Authors: Geunwoo Kim, Pierre Baldi, Stephen McAleer
- Abstract summary: We show that a pre-trained large language model (LLM) agent can execute computer tasks guided by natural language using a simple prompting scheme.
We compare multiple LLMs and find that RCI with the InstructGPT-3+RLHF LLM is state-of-the-art on MiniWoB++.
- Score: 13.914130729517584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agents capable of carrying out general tasks on a computer can improve
efficiency and productivity by automating repetitive tasks and assisting in
complex problem-solving. Ideally, such agents should be able to solve new
computer tasks presented to them through natural language commands. However,
previous approaches to this problem require large amounts of expert
demonstrations and task-specific reward functions, both of which are
impractical for new tasks. In this work, we show that a pre-trained large
language model (LLM) agent can execute computer tasks guided by natural
language using a simple prompting scheme where the agent Recursively Criticizes
and Improves its output (RCI). The RCI approach significantly outperforms
existing LLM methods for automating computer tasks and surpasses supervised
learning (SL) and reinforcement learning (RL) approaches on the MiniWoB++
benchmark. We compare multiple LLMs and find that RCI with the
InstructGPT-3+RLHF LLM is state-of-the-art on MiniWoB++, using only a handful
of demonstrations per task rather than tens of thousands, and without a
task-specific reward function. Furthermore, we demonstrate RCI prompting's
effectiveness in enhancing LLMs' reasoning abilities on a suite of natural
language reasoning tasks, outperforming chain of thought (CoT) prompting with
external feedback. We find that RCI combined with CoT performs better than
either separately. Our code can be found here:
https://github.com/posgnu/rci-agent.
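The critique-and-improve loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rci`, the `llm` callable, the `max_rounds` parameter, the prompt wording, and the `toy_llm` stand-in are all hypothetical, chosen only so that the sketch runs offline.

```python
from typing import Callable


def rci(task: str, llm: Callable[[str], str], max_rounds: int = 2) -> str:
    """Draft an answer, then repeatedly critique and revise it.

    `llm` is any function mapping a prompt string to a completion string.
    The prompt wording below is illustrative, not taken from the paper.
    """
    answer = llm(f"Task: {task}\nAnswer:")
    for _ in range(max_rounds):
        # Critique step: ask the model to find problems with its own output.
        critique = llm(
            f"Task: {task}\nAnswer: {answer}\n"
            "Review the answer and list any problems with it."
        )
        # Improve step: ask the model to revise the output given the critique.
        answer = llm(
            f"Task: {task}\nPrevious answer: {answer}\nCritique: {critique}\n"
            "Based on the critique, give an improved answer."
        )
    return answer


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model, keyed on the prompt text."""
    if "list any problems" in prompt:
        return "The answer should be lower-case."
    if "improved answer" in prompt:
        return "click the submit button"
    return "CLICK THE SUBMIT BUTTON"


if __name__ == "__main__":
    print(rci("Submit the form", toy_llm))
```

In practice `toy_llm` would be replaced by a call to an actual language model. Each round costs two model calls (one critique, one revision) on top of the initial draft, so `max_rounds` trades answer quality against cost.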
Related papers
- Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks [0.8425561594225592]
This study introduces a novel framework for training smaller language models in function calling.
It focuses on specific logical and mathematical reasoning tasks.
The approach aims to improve the performance of small-scale models on these tasks using function calling.
arXiv Detail & Related papers (2024-10-24T16:27:35Z)
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
- CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models [19.73329768987112]
CurricuLLM is a curriculum learning tool for complex robot control tasks.
It generates subtasks that aid target task learning in natural language form.
It also translates natural language descriptions of subtasks into executable code.
CurricuLLM can aid learning complex robot control tasks.
arXiv Detail & Related papers (2024-09-27T01:48:16Z)
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- Large Language Models as Generalizable Policies for Embodied Tasks [50.870491905776305]
We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks.
Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment.
arXiv Detail & Related papers (2023-10-26T18:32:05Z)
- A Zero-Shot Language Agent for Computer Control with Structured Reflection [19.526676887048662]
Large language models (LLMs) have shown increasing capacity at planning and executing a high-level goal in a live computer environment.
To perform a task, recent works often require a model to learn from trace examples of the task via either supervised learning or few/many-shot prompting.
We approach this problem with a zero-shot agent that requires no given expert traces.
arXiv Detail & Related papers (2023-10-12T21:53:37Z)
- LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z)
- Collaborating with language models for embodied reasoning [30.82976922056617]
Reasoning in a complex and ambiguous environment is a key goal for Reinforcement Learning (RL) agents.
We present a set of tasks that require reasoning, test this system's ability to generalize zero-shot and investigate failure cases.
arXiv Detail & Related papers (2023-02-01T21:26:32Z)
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks [55.42850359286304]
We propose Decomposed Prompting to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks.
This modular structure allows each prompt to be optimized for its specific sub-task.
We show that the flexibility and modularity of Decomposed Prompting allow it to outperform prior work on few-shot prompting.
arXiv Detail & Related papers (2022-10-05T17:28:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.