Natural Language Specification of Reinforcement Learning Policies
through Differentiable Decision Trees
- URL: http://arxiv.org/abs/2101.07140v4
- Date: Sat, 20 May 2023 21:13:00 GMT
- Title: Natural Language Specification of Reinforcement Learning Policies
through Differentiable Decision Trees
- Authors: Pradyumna Tambwekar, Andrew Silva, Nakul Gopalan, Matthew Gombolay
- Abstract summary: Human-AI policy specification is a novel procedure we define in which humans can collaboratively warm-start a robot's reinforcement learning policy.
We develop a novel collaborative framework to allow humans to initialize and interpret an autonomous agent's behavior.
Our approach warm-starts an RL agent from non-expert natural language specifications without incurring additional domain exploration costs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-AI policy specification is a novel procedure we define in which humans
can collaboratively warm-start a robot's reinforcement learning policy. This
procedure comprises two steps: (1) Policy Specification, i.e., humans
specifying the behavior they would like their companion robot to accomplish,
and (2) Policy Optimization, i.e., the robot applying reinforcement learning to
improve the initial policy. Existing approaches to collaborative
policy specification are often unintelligible black-box methods that are not
designed to make the autonomous system accessible to a novice end-user.
In this paper, we develop a novel collaborative framework to allow humans to
initialize and interpret an autonomous agent's behavior. Through our framework,
we enable humans to specify an initial behavior model via unstructured, natural
language (NL), which we convert to lexical decision trees. Next, we leverage
these translated specifications to warm-start reinforcement learning and allow
the agent to further optimize these potentially suboptimal policies. Our
approach warm-starts an RL agent from non-expert natural language
specifications without incurring additional domain exploration costs. We
validate our approach by showing that our model is able to produce >80%
translation accuracy, and that policies initialized by a human can match the
performance of relevant RL baselines in two domains.
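The two-step procedure in the abstract can be sketched in miniature. This is a hypothetical illustration, not the authors' implementation: the toy traffic-light domain, the `tree_policy` rule, and the optimistic-bonus warm start are all assumptions made for the example.

```python
# Step 1: Policy Specification -- a lexical decision tree as might be
# translated from natural language such as "if the light is red, stop;
# otherwise go" (hypothetical rule, for illustration only).
def tree_policy(state):
    if state["light"] == "red":
        return "stop"
    return "go"

# Step 2: Policy Optimization -- a tabular Q-function warm-started by
# seeding the tree-preferred action with an optimistic initial value,
# so early exploration follows the specified behavior.
ACTIONS = ["stop", "go"]

def warm_start_q(states, bonus=1.0):
    q = {}
    for s in states:
        key = tuple(sorted(s.items()))  # hashable state representation
        q[key] = {a: (bonus if a == tree_policy(s) else 0.0) for a in ACTIONS}
    return q

states = [{"light": "red"}, {"light": "green"}]
q = warm_start_q(states)
```

A standard Q-learning loop could then refine `q` with environment reward, which is the sense in which the specified policy is "potentially suboptimal" but improvable.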
Related papers
- Policy Learning with a Language Bottleneck [65.99843627646018]
Policy Learning with a Language Bottleneck (PLLB) is a framework enabling AI agents to generate linguistic rules.
PLLB alternates between a rule-generation step guided by language models and an update step in which agents learn new policies guided by those rules.
In a two-player communication game, a maze-solving task, and two image reconstruction tasks, we show that PLLB agents not only learn more interpretable and generalizable behaviors, but can also share the learned rules with human users.
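The alternation described above can be sketched as a simple loop. Both `gen_rules` and `update_policy` are hypothetical stand-ins for this example, not the PLLB implementation.

```python
def gen_rules(episodes):
    # Stand-in for a language model summarizing successful episodes
    # into linguistic rules (hypothetical behavior).
    return ["prefer shorter paths"] if episodes else []

def update_policy(policy, rules):
    # Stand-in for rule-guided policy learning: here we simply record
    # which rules the policy is following.
    return policy + [("follow", r) for r in rules]

def pllb_loop(n_rounds=2):
    policy, episodes = [], ["episode-0"]
    for _ in range(n_rounds):
        rules = gen_rules(episodes)            # rule-generation step
        policy = update_policy(policy, rules)  # rule-guided update step
    return policy
```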
arXiv Detail & Related papers (2024-05-07T08:40:21Z)
- DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation [57.07295906718989]
Constrained decoding approaches aim to control the meaning or style of text generated by a Pre-trained Language Model (PLM) using specific target words during inference.
We propose a novel decoding framework, DECIDER, which enables us to program rules on how we complete tasks to control a PLM.
arXiv Detail & Related papers (2024-03-04T11:49:08Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions [53.21504989297547]
We propose a new method that combines a language model and reinforcement learning for the task of building objects in a Minecraft-like environment.
Our method first generates a set of consistently achievable sub-goals from the instructions and then completes associated sub-tasks with a pre-trained RL policy.
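The two-stage pipeline above (sub-goal generation, then sub-task execution) can be sketched as follows. The parser and executor are hypothetical placeholders for this example, not the paper's language model or pre-trained RL policy.

```python
def parse_subgoals(instruction):
    # Placeholder for the language model: split an instruction into
    # ordered, achievable sub-goals (toy heuristic for illustration).
    return [g.strip() for g in instruction.split("then")]

def execute(subgoal):
    # Placeholder for the pre-trained RL policy completing one sub-task.
    return f"done:{subgoal}"

def build(instruction):
    # Complete each sub-goal in order, as in the paper's pipeline.
    return [execute(g) for g in parse_subgoals(instruction)]
```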
arXiv Detail & Related papers (2022-11-01T18:30:42Z)
- Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization [73.74371798168642]
We introduce an open-source modular library, RL4LMs, for optimizing language generators with reinforcement learning.
Next, we present the GRUE benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions.
Finally, we introduce an easy-to-use, performant RL algorithm, NLPO, that learns to effectively reduce the action space in language generation.
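The general idea of reducing a language generator's action space can be illustrated with top-p (nucleus) truncation over the token distribution; this is a sketch of that general mechanism, with assumed parameter names, not the NLPO algorithm itself.

```python
def top_p_mask(probs, p=0.9):
    # Keep the smallest set of token indices whose cumulative
    # probability reaches p; RL updates would then be restricted
    # to this reduced action set.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    return sorted(kept)
```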
arXiv Detail & Related papers (2022-10-03T21:38:29Z)
- Language-Conditioned Imitation Learning for Robot Manipulation Tasks [39.40937105264774]
We introduce a method for incorporating unstructured natural language into imitation learning.
At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent.
The training process then interrelates these two modalities to encode the correlations between language, perception, and motion.
The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions.
arXiv Detail & Related papers (2020-10-22T21:49:08Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches, in contrast, can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text [12.88819706338837]
Recent work has described neural-network-based agents that are trained with reinforcement learning to execute language-like commands in simulated worlds.
We propose a conceptually simple method for training instruction-following agents with deep RL that are robust to natural human instructions.
arXiv Detail & Related papers (2020-05-19T12:16:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.