Policy Learning with a Language Bottleneck
- URL: http://arxiv.org/abs/2405.04118v1
- Date: Tue, 7 May 2024 08:40:21 GMT
- Title: Policy Learning with a Language Bottleneck
- Authors: Megha Srivastava, Cedric Colas, Dorsa Sadigh, Jacob Andreas
- Abstract summary: Policy Learning with a Language Bottleneck (PLLB) is a framework enabling AI agents to generate linguistic rules.
PLLB alternates between a rule generation step guided by language models, and an update step where agents learn new policies guided by rules.
In a two-player communication game, a maze solving task, and two image reconstruction tasks, we show that PLLB agents are not only able to learn more interpretable and generalizable behaviors, but can also share the learned rules with human users.
- Score: 65.99843627646018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern AI systems such as self-driving cars and game-playing agents achieve superhuman performance, but often lack human-like features such as generalization, interpretability and human inter-operability. Inspired by the rich interactions between language and decision-making in humans, we introduce Policy Learning with a Language Bottleneck (PLLB), a framework enabling AI agents to generate linguistic rules that capture the strategies underlying their most rewarding behaviors. PLLB alternates between a rule generation step guided by language models, and an update step where agents learn new policies guided by rules. In a two-player communication game, a maze solving task, and two image reconstruction tasks, we show that PLLB agents are not only able to learn more interpretable and generalizable behaviors, but can also share the learned rules with human users, enabling more effective human-AI coordination.
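The alternation described in the abstract (a rule generation step followed by a policy update guided by the rule) can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the four-action environment, the `reward` function, and the helpers `generate_rule`, `update_policy`, and `pllb_sketch` are all hypothetical, and the language model is replaced by a simple string-producing stand-in.

```python
from collections import Counter

# Hypothetical toy action set; not taken from the paper.
ACTIONS = ["left", "right", "up", "down"]


def reward(action):
    # Toy environment (an assumption): only "up" is rewarded.
    return 1.0 if action == "up" else 0.0


def generate_rule(episodes):
    """Stand-in for the language-model step.

    Summarizes the most rewarding behavior as a short text rule.
    A real system would prompt a language model here instead.
    """
    best = max(r for _, r in episodes)
    top_actions = Counter(a for a, r in episodes if r == best)
    action = top_actions.most_common(1)[0][0]
    return f"prefer {action}"


def update_policy(policy, rule, step=0.5):
    """Stand-in for the update step.

    Shifts probability mass toward the action named by the rule.
    """
    target = rule.split()[-1]
    new_policy = {a: (1 - step) * p for a, p in policy.items()}
    new_policy[target] += step
    return new_policy


def pllb_sketch(iterations=3):
    # Start from a uniform policy over the toy actions.
    policy = {a: 1 / len(ACTIONS) for a in ACTIONS}
    rules = []
    for _ in range(iterations):
        # Simplification: try every action once instead of sampling
        # rollouts from the current policy.
        episodes = [(a, reward(a)) for a in ACTIONS]
        rule = generate_rule(episodes)        # rule generation step
        policy = update_policy(policy, rule)  # rule-guided update step
        rules.append(rule)
    return policy, rules


if __name__ == "__main__":
    final_policy, learned_rules = pllb_sketch()
    print(learned_rules[-1], final_policy)
```

The text rule is the only information passed from the generation step to the update step, which is the "bottleneck" idea in miniature; the same rule string is also what could be shown to a human user.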
Related papers
- Interpretable Robotic Manipulation from Language [11.207620790833271]
We introduce an explainable behavior cloning agent, named Ex-PERACT, specifically designed for manipulation tasks.
At the top level, the model is tasked with learning a discrete skill code, while at the bottom level, the policy network translates the problem into a voxelized grid and maps the discretized actions to voxel grids.
We evaluate our method across eight challenging manipulation tasks utilizing the RLBench benchmark, demonstrating that Ex-PERACT not only achieves competitive policy performance but also effectively bridges the gap between human instructions and machine execution in complex environments.
arXiv Detail & Related papers (2024-05-27T11:02:21Z)
- AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents [58.807802111818994]
We propose AnySkill, a novel hierarchical method that learns physically plausible interactions following open-vocabulary instructions.
Our approach begins by developing a set of atomic actions via a low-level controller trained via imitation learning.
An important feature of our method is the use of image-based rewards for the high-level policy, which allows the agent to learn interactions with objects without manual reward engineering.
arXiv Detail & Related papers (2024-03-19T15:41:39Z)
- Progressively Efficient Learning [58.6490456517954]
We develop a novel learning framework named Communication-Efficient Interactive Learning (CEIL).
CEIL leads to emergence of a human-like pattern where the learner and the teacher communicate efficiently by exchanging increasingly more abstract intentions.
Agents trained with CEIL quickly master new tasks, outperforming non-hierarchical and hierarchical imitation learning by up to 50% and 20% in absolute success rate.
arXiv Detail & Related papers (2023-10-13T07:52:04Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- LISA: Learning Interpretable Skill Abstractions from Language [85.20587800593293]
We propose a hierarchical imitation learning framework that can learn diverse, interpretable skills from language-conditioned demonstrations.
Our method demonstrates a more natural way to condition on language in sequential decision-making problems.
arXiv Detail & Related papers (2022-02-28T19:43:24Z)
- Natural Language Specification of Reinforcement Learning Policies through Differentiable Decision Trees [10.406631494442683]
Human-AI policy specification is a novel procedure we define in which humans can collaboratively warm-start a robot's reinforcement learning policy.
We develop a novel collaborative framework to allow humans to initialize and interpret an autonomous agent's behavior.
Our approach warm-starts an RL agent by utilizing non-expert natural language specifications without incurring the additional domain exploration costs.
arXiv Detail & Related papers (2021-01-18T16:07:00Z)
- Language-Conditioned Imitation Learning for Robot Manipulation Tasks [39.40937105264774]
We introduce a method for incorporating unstructured natural language into imitation learning.
At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent.
The training process then interrelates these two modalities to encode the correlations between language, perception, and motion.
The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions.
arXiv Detail & Related papers (2020-10-22T21:49:08Z)
- On the interaction between supervision and self-play in emergent communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.