Policy Learning with a Language Bottleneck
- URL: http://arxiv.org/abs/2405.04118v2
- Date: Wed, 26 Mar 2025 20:53:57 GMT
- Title: Policy Learning with a Language Bottleneck
- Authors: Megha Srivastava, Cedric Colas, Dorsa Sadigh, Jacob Andreas
- Abstract summary: We introduce Policy Learning with a Language Bottleneck (PLLB), a framework enabling AI agents to generate linguistic rules. PLLB alternates between a *rule generation* step guided by language models, and an *update* step where agents learn new policies guided by rules. We show that PLLB agents are not only able to learn more interpretable and generalizable behaviors, but can also share the learned rules with human users.
- Score: 65.99843627646018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern AI systems such as self-driving cars and game-playing agents achieve superhuman performance, but often lack human-like generalization, interpretability, and interoperability with human users. Inspired by the rich interactions between language and decision-making in humans, we introduce Policy Learning with a Language Bottleneck (PLLB), a framework enabling AI agents to generate linguistic rules that capture the high-level strategies underlying rewarding behaviors. PLLB alternates between a *rule generation* step guided by language models, and an *update* step where agents learn new policies guided by rules, even when a rule is insufficient to describe an entire complex policy. Across five diverse tasks, including a two-player signaling game, maze navigation, image reconstruction, and robot grasp planning, we show that PLLB agents are not only able to learn more interpretable and generalizable behaviors, but can also share the learned rules with human users, enabling more effective human-AI coordination. We provide source code for our experiments at https://github.com/meghabyte/bottleneck .
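As a reading aid, here is a minimal sketch of the alternation the abstract describes: a language model contrasts high- and low-reward episodes to produce rules, and the agent then relearns its policy under those rules. The function names and the episode-ranking heuristic are hypothetical stand-ins, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Episode:
    observations: list
    actions: list
    reward: float

def pllb_loop(
    policy: object,
    collect_episodes: Callable[[object], List[Episode]],            # roll out current policy
    generate_rules: Callable[[List[Episode], List[Episode]], str],  # language-model call
    update_policy: Callable[[object, str], object],                 # rule-guided learning
    iterations: int = 5,
) -> Tuple[object, str]:
    rules = ""
    for _ in range(iterations):
        episodes = sorted(collect_episodes(policy), key=lambda e: e.reward)
        low, high = episodes[: len(episodes) // 2], episodes[len(episodes) // 2 :]
        # Rule generation: the LM summarizes what distinguishes rewarding behavior.
        rules = generate_rules(high, low)
        # Update: learn a new policy guided by the rules; the rules need not
        # describe the entire policy (the language bottleneck is partial).
        policy = update_policy(policy, rules)
    return policy, rules
```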
Related papers
- STEER: Flexible Robotic Manipulation via Dense Language Grounding [16.97343810491996]
STEER is a robot learning framework that bridges high-level, commonsense reasoning with precise, flexible low-level control.
Our approach translates complex situational awareness into actionable low-level behavior through training language-grounded policies with dense annotation.
arXiv Detail & Related papers (2024-11-05T18:48:12Z)
- Interpretable Robotic Manipulation from Language [11.207620790833271]
We introduce an explainable behavior cloning agent, named Ex-PERACT, specifically designed for manipulation tasks.
At the top level, the model is tasked with learning a discrete skill code, while at the bottom level, the policy network translates the problem into a voxelized grid and maps the discretized actions to voxel grids.
We evaluate our method across eight challenging manipulation tasks utilizing the RLBench benchmark, demonstrating that Ex-PERACT not only achieves competitive policy performance but also effectively bridges the gap between human instructions and machine execution in complex environments.
arXiv Detail & Related papers (2024-05-27T11:02:21Z)
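To make the two-level structure in the Ex-PERACT summary concrete, here is a toy PyTorch sketch in which a high-level head picks a discrete skill code and a low-level network scores voxel cells conditioned on that code. The dimensions, layer choices, and the hard argmax are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class TwoLevelPolicy(nn.Module):
    """Top level: a discrete skill code; bottom level: logits over voxel cells."""

    def __init__(self, instr_dim: int = 64, num_skills: int = 8, grid: int = 16):
        super().__init__()
        self.skill_head = nn.Linear(instr_dim, num_skills)  # scores skill codes
        self.skill_emb = nn.Embedding(num_skills, 32)
        self.voxel_net = nn.Sequential(                     # skill-conditioned scorer
            nn.Linear(grid ** 3 + 32, 256), nn.ReLU(), nn.Linear(256, grid ** 3)
        )

    def forward(self, instr: torch.Tensor, voxels: torch.Tensor) -> torch.Tensor:
        skill = self.skill_head(instr).argmax(dim=-1)       # hard code for brevity
        feats = torch.cat([voxels.flatten(1), self.skill_emb(skill)], dim=-1)
        return self.voxel_net(feats)                        # one logit per voxel cell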
- AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents [58.807802111818994]
We propose AnySkill, a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions.
Our approach begins by developing a set of atomic actions via a low-level controller trained via imitation learning.
An important feature of our method is the use of image-based rewards for the high-level policy, which allows the agent to learn interactions with objects without manual reward engineering.
arXiv Detail & Related papers (2024-03-19T15:41:39Z)
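AnySkill's key feature, image-based rewards for the high-level policy, can be illustrated with a CLIP-style similarity score. Here `encode_image` and `encode_text` stand for any pretrained vision-language encoder pair; this is an assumption, not the paper's exact recipe.

```python
import numpy as np

def image_text_reward(encode_image, encode_text, frame, instruction) -> float:
    """Reward = cosine similarity between the rendered frame and the instruction."""
    img, txt = encode_image(frame), encode_text(instruction)
    return float(img @ txt / (np.linalg.norm(img) * np.linalg.norm(txt)))
```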
- Progressively Efficient Learning [58.6490456517954]
We develop a novel learning framework named Communication-Efficient Interactive Learning (CEIL).
CEIL leads to the emergence of a human-like pattern where the learner and the teacher communicate efficiently by exchanging increasingly abstract intentions.
Agents trained with CEIL quickly master new tasks, outperforming non-hierarchical and hierarchical imitation learning by up to 50% and 20% in absolute success rate.
arXiv Detail & Related papers (2023-10-13T07:52:04Z)
- Learning to Model the World with Language [100.76069091703505]
To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world.
Our key idea is that agents should interpret such diverse language as a signal that helps them predict the future.
We instantiate this in Dynalang, an agent that learns a multimodal world model to predict future text and image representations.
arXiv Detail & Related papers (2023-07-31T17:57:49Z)
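A rough sketch of the Dynalang idea above, treating language as a prediction target: a recurrent latent is trained to forecast both the next text representation and the next image representation. The architecture and losses here are illustrative guesses, not the paper's model.

```python
import torch
import torch.nn as nn

class TinyMultimodalWorldModel(nn.Module):
    def __init__(self, img_dim: int = 128, txt_dim: int = 64, latent: int = 256):
        super().__init__()
        self.fuse = nn.GRUCell(img_dim + txt_dim, latent)  # recurrent multimodal latent
        self.predict_img = nn.Linear(latent, img_dim)      # next image representation
        self.predict_txt = nn.Linear(latent, txt_dim)      # next text representation

    def forward(self, img_seq: torch.Tensor, txt_seq: torch.Tensor) -> torch.Tensor:
        """img_seq: (B, T, img_dim); txt_seq: (B, T, txt_dim). Returns prediction loss."""
        h = img_seq.new_zeros(img_seq.size(0), self.fuse.hidden_size)
        loss = img_seq.new_zeros(())
        for t in range(img_seq.size(1) - 1):
            h = self.fuse(torch.cat([img_seq[:, t], txt_seq[:, t]], dim=-1), h)
            loss = loss + ((self.predict_img(h) - img_seq[:, t + 1]) ** 2).mean()
            loss = loss + ((self.predict_txt(h) - txt_seq[:, t + 1]) ** 2).mean()
        return loss
```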
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- LISA: Learning Interpretable Skill Abstractions from Language [85.20587800593293]
We propose a hierarchical imitation learning framework that can learn diverse, interpretable skills from language-conditioned demonstrations.
Our method demonstrates a more natural way to condition on language in sequential decision-making problems.
arXiv Detail & Related papers (2022-02-28T19:43:24Z)
- Natural Language Specification of Reinforcement Learning Policies through Differentiable Decision Trees [10.406631494442683]
Human-AI policy specification is a novel procedure we define in which humans can collaboratively warm-start a robot's reinforcement learning policy.
We develop a novel collaborative framework to allow humans to initialize and interpret an autonomous agent's behavior.
Our approach warm-starts an RL agent by utilizing non-expert natural language specifications without incurring additional domain exploration costs.
arXiv Detail & Related papers (2021-01-18T16:07:00Z)
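The differentiable decision trees the entry above relies on can be sketched with a single soft node: a sigmoid gate routes each observation between two leaves, so the split is trainable by gradient descent and can be warm-started from a human-specified rule. The depth-1 tree and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SoftDecisionNode(nn.Module):
    """One differentiable split: sigmoid(w.x + b) softly routes between leaves."""

    def __init__(self, obs_dim: int, num_actions: int):
        super().__init__()
        self.gate = nn.Linear(obs_dim, 1)                   # learned decision boundary
        self.left = nn.Parameter(torch.zeros(num_actions))  # leaf action logits
        self.right = nn.Parameter(torch.zeros(num_actions))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        p_left = torch.sigmoid(self.gate(obs))              # soft routing in (0, 1)
        return p_left * self.left + (1.0 - p_left) * self.right
```

In a warm-start setting, a parsed natural-language rule would initialize `gate` and the leaf logits before reinforcement learning continues from there.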
- Language-Conditioned Imitation Learning for Robot Manipulation Tasks [39.40937105264774]
We introduce a method for incorporating unstructured natural language into imitation learning.
At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent.
The training process then interrelates these two modalities to encode the correlations between language, perception, and motion.
The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions.
arXiv Detail & Related papers (2020-10-22T21:49:08Z)
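The entry above pairs demonstrations with verbal descriptions; a minimal form of that coupling is behavior cloning on state-plus-language inputs, sketched below with assumed dimensions and a plain MSE imitation loss rather than the paper's full architecture.

```python
import torch
import torch.nn as nn

class LanguageConditionedPolicy(nn.Module):
    def __init__(self, obs_dim: int = 32, lang_dim: int = 64, act_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + lang_dim, 128), nn.ReLU(), nn.Linear(128, act_dim)
        )

    def forward(self, obs: torch.Tensor, lang: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, lang], dim=-1))  # action from state + words

def bc_loss(policy, obs, lang, expert_action):
    # Imitation: reproduce the demonstrated action given perception + description.
    return ((policy(obs, lang) - expert_action) ** 2).mean()
```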
- Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text [12.88819706338837]
Recent work has described neural-network-based agents that are trained with reinforcement learning to execute language-like commands in simulated worlds.
We propose a conceptually simple method for training instruction-following agents with deep RL that are robust to natural human instructions.
arXiv Detail & Related papers (2020-05-19T12:16:58Z)
- On the interaction between supervision and self-play in emergent communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)
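Finally, the ordering result in the last entry (supervised learning on human data first, then self-play) amounts to a two-phase schedule like the one below; the agent interface, helper callables, and step counts are hypothetical.

```python
def supervised_then_selfplay(agent, sample_human_batch, play_selfplay_episode,
                             supervised_steps: int = 1_000, selfplay_steps: int = 5_000):
    # Phase 1: supervised learning grounds the agent in human conventions.
    for _ in range(supervised_steps):
        agent.supervised_update(sample_human_batch())
    # Phase 2: self-play optimizes task success starting from that grounding.
    # (The paper finds this order outperforms self-play followed by supervision.)
    for _ in range(selfplay_steps):
        agent.reinforce_update(play_selfplay_episode(agent))
    return agent
```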
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.