Related papers: Collaborating with language models for embodied reasoning

URL: http://arxiv.org/abs/2302.00763v1
Date: Wed, 1 Feb 2023 21:26:32 GMT
Title: Collaborating with language models for embodied reasoning
Authors: Ishita Dasgupta, Christine Kaeser-Chen, Kenneth Marino, Arun Ahuja, Sheila Babayan, Felix Hill, Rob Fergus
Abstract summary: Reasoning in a complex and ambiguous environment is a key goal for Reinforcement Learning (RL) agents. We present a set of tasks that require reasoning, test this system's ability to generalize zero-shot and investigate failure cases.
Score: 30.82976922056617
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reasoning in a complex and ambiguous environment is a key goal for Reinforcement Learning (RL) agents. While some sophisticated RL agents can successfully solve difficult tasks, they require a large amount of training data and often struggle to generalize to new unseen environments and new tasks. On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the ability to adapt to new tasks through in-context learning. However, LSLMs do not inherently have the ability to interrogate or intervene on the environment. In this work, we investigate how to combine these complementary abilities in a single system consisting of three parts: a Planner, an Actor, and a Reporter. The Planner is a pre-trained language model that can issue commands to a simple embodied agent (the Actor), while the Reporter communicates with the Planner to inform its next command. We present a set of tasks that require reasoning, test this system's ability to generalize zero-shot and investigate failure cases, and demonstrate how components of this system can be trained with reinforcement-learning to improve performance.
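Illustration: the Planner, Actor, and Reporter interaction described in the abstract can be pictured as a simple loop. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names (`run_episode`, `toy_planner`, `toy_actor`, `toy_reporter`), the string-based history format, and the "done" stop command are all hypothetical, and the Planner here is a rule-based stand-in rather than a pre-trained language model.

```python
from typing import Callable, Dict, List


def run_episode(
    planner: Callable[[List[str]], str],
    actor: Callable[[str], Dict[str, str]],
    reporter: Callable[[Dict[str, str]], str],
    task: str,
    max_steps: int = 5,
) -> List[str]:
    """Run a Planner -> Actor -> Reporter loop and return the dialogue history."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # The Planner reads the dialogue so far and issues the next command.
        command = planner(history)
        history.append(f"Planner: {command}")
        if command == "done":  # hypothetical stop signal
            break
        # The Actor carries out the command in the environment.
        observation = actor(command)
        # The Reporter turns the observation into text for the Planner.
        history.append(f"Reporter: {reporter(observation)}")
    return history


def toy_planner(history: List[str]) -> str:
    # Hypothetical stand-in for a language model: stop once a report mentions the goal.
    reports = [line for line in history if line.startswith("Reporter")]
    if any("red key" in line for line in reports):
        return "done"
    return "examine box"


def toy_actor(command: str) -> Dict[str, str]:
    # Hypothetical environment step: every command reveals the same object.
    return {"command": command, "seen": "red key"}


def toy_reporter(observation: Dict[str, str]) -> str:
    return f"After '{observation['command']}', I see a {observation['seen']}."


history = run_episode(toy_planner, toy_actor, toy_reporter, "find the key")
for line in history:
    print(line)
```

In this sketch the Planner never observes the environment directly; it sees only the text the Reporter appends to the history, which mirrors the division of roles the abstract describes.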

Related papers

Operationalising the Superficial Alignment Hypothesis via Task Complexity [49.93635747700126]
We propose a new metric called task complexity: the length of the shortest program that achieves a target performance on a task. Our results highlight that task adaptation often requires surprisingly little information -- often just a few kilobytes.
arXiv Detail & Related papers (2026-02-17T18:59:39Z)
HELP: Hierarchical Embodied Language Planner for Household Tasks [75.38606213726906]
Embodied agents tasked with complex scenarios rely heavily on robust planning capabilities. Large language models equipped with extensive linguistic knowledge can play this role. We propose a Hierarchical Embodied Language Planner, called HELP, consisting of a set of LLM-based agents.
arXiv Detail & Related papers (2025-12-25T15:54:08Z)
PARL: Prompt-based Agents for Reinforcement Learning [8.465228064780742]
Large language models (LLMs) have demonstrated high performance on tasks expressed in natural language. We study structured, non-linguistic reasoning - such as interpreting positions in a grid world. We introduce PARL (Prompt-based Agent for Reinforcement Learning), a method that uses LLMs as RL agents through prompting.
arXiv Detail & Related papers (2025-10-24T10:04:23Z)
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs [54.59207567677249]
Large language models (LLMs) still struggle across tasks outside of high-resource languages. In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce.
arXiv Detail & Related papers (2025-05-23T20:28:31Z)
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL [62.984693936073974]
Large language models (LLMs) excel in tasks like question answering and dialogue. Complex tasks requiring interaction, such as negotiation and persuasion, require additional long-horizon reasoning and planning. We propose a novel approach that uses goal-conditioned value functions to guide the reasoning of LLM agents.
arXiv Detail & Related papers (2025-05-23T16:51:54Z)
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks [0.8425561594225592]
This study introduces a novel framework for training smaller language models in function calling. It focuses on specific logical and mathematical reasoning tasks. The approach aims to improve performances of small-scale models for these tasks using function calling.
arXiv Detail & Related papers (2024-10-24T16:27:35Z)
Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals. We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess quality of generated outputs. Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability [12.349247962800813]
Large language models (LLMs) have emerged as powerful tools for many AI problems. They exhibit remarkable in-context learning (ICL) capabilities. How they approach composite tasks remains an open and largely underexplored question.
arXiv Detail & Related papers (2024-07-22T15:22:34Z)
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs). Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models [68.18370230899102]
We investigate how to elicit compositional generalization capabilities in large language models (LLMs). We find that demonstrating both foundational skills and compositional examples grounded in these skills within the same prompt context is crucial. We show that fine-tuning LLMs with SKiC-style data can elicit zero-shot weak-to-strong generalization.
arXiv Detail & Related papers (2023-08-01T05:54:12Z)
Coupling Large Language Models with Logic Programming for Robust and General Reasoning from Text [5.532477732693001]
We show that a large language model can serve as a highly effective few-shot semantic parser. It can convert natural language sentences into a logical form that serves as input for answer set programs. We demonstrate that this method achieves state-of-the-art performance on several benchmarks, including bAbI, StepGame, CLUTRR, and gSCAN.
arXiv Detail & Related papers (2023-07-15T03:29:59Z)
Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents [99.17668730578586]
Pre-trained large language models (LLMs) capture procedural knowledge about the world. The Plan, Eliminate, and Track (PET) framework translates a task description into a list of high-level sub-tasks. The PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
arXiv Detail & Related papers (2023-05-03T20:11:22Z)
Multitasking Inhibits Semantic Drift [46.71462510028727]
We study the dynamics of learning in latent language policies (LLPs). LLPs can solve challenging long-horizon reinforcement learning problems. Previous work has found that LLP training is prone to semantic drift.
arXiv Detail & Related papers (2021-04-15T03:42:17Z)
CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment. Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.