Natural Language Specification of Reinforcement Learning Policies
through Differentiable Decision Trees
- URL: http://arxiv.org/abs/2101.07140v4
- Date: Sat, 20 May 2023 21:13:00 GMT
- Title: Natural Language Specification of Reinforcement Learning Policies
through Differentiable Decision Trees
- Authors: Pradyumna Tambwekar, Andrew Silva, Nakul Gopalan, Matthew Gombolay
- Abstract summary: Human-AI policy specification is a novel procedure we define in which humans can collaboratively warm-start a robot's reinforcement learning policy.
We develop a novel collaborative framework to allow humans to initialize and interpret an autonomous agent's behavior.
Our approach warm-starts an RL agent from non-expert natural language specifications without incurring additional domain exploration costs.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-AI policy specification is a novel procedure we define in which humans
can collaboratively warm-start a robot's reinforcement learning policy. This
procedure comprises two steps: (1) Policy Specification, i.e., humans
specifying the behavior they would like their companion robot to accomplish,
and (2) Policy Optimization, i.e., the robot applying reinforcement learning to
improve the initial policy. Existing approaches to collaborative
policy specification are often unintelligible black-box methods that are not
designed to make the autonomous system accessible to a novice end-user.
In this paper, we develop a novel collaborative framework to allow humans to
initialize and interpret an autonomous agent's behavior. Through our framework,
we enable humans to specify an initial behavior model via unstructured, natural
language (NL), which we convert to lexical decision trees. Next, we leverage
these translated specifications to warm-start reinforcement learning and allow
the agent to further optimize these potentially suboptimal policies. Our
approach warm-starts an RL agent from non-expert natural language
specifications without incurring additional domain exploration costs. We
validate our approach by showing that our model is able to produce >80%
translation accuracy, and that policies initialized by a human can match the
performance of relevant RL baselines in two domains.
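The two-step procedure in the abstract can be sketched in miniature. This is a hypothetical illustration, not the authors' implementation: the toy traffic-light domain, the `tree_policy` rule, and the optimistic-bonus warm start are all assumptions made for the example.

```python
# Step 1: Policy Specification -- a lexical decision tree as might be
# translated from natural language such as "if the light is red, stop;
# otherwise go" (hypothetical rule, for illustration only).
def tree_policy(state):
    if state["light"] == "red":
        return "stop"
    return "go"

# Step 2: Policy Optimization -- a tabular Q-function warm-started by
# seeding the tree-preferred action with an optimistic initial value,
# so early exploration follows the specified behavior.
ACTIONS = ["stop", "go"]

def warm_start_q(states, bonus=1.0):
    q = {}
    for s in states:
        key = tuple(sorted(s.items()))  # hashable state representation
        q[key] = {a: (bonus if a == tree_policy(s) else 0.0) for a in ACTIONS}
    return q

states = [{"light": "red"}, {"light": "green"}]
q = warm_start_q(states)
```

A standard Q-learning loop could then refine `q` with environment reward, which is the sense in which the specified policy is "potentially suboptimal" but improvable.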
Related papers
- Policy Learning with a Language Bottleneck [65.99843627646018]
Policy Learning with a Language Bottleneck (PLLB) is a framework enabling AI agents to generate linguistic rules.
PLLB alternates between a rule-generation step guided by language models and an update step in which agents learn new policies guided by those rules.
In a two-player communication game, a maze-solving task, and two image reconstruction tasks, we show that PLLB agents not only learn more interpretable and generalizable behaviors, but can also share the learned rules with human users.
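The alternation described above can be sketched as a simple loop. Both `gen_rules` and `update_policy` are hypothetical stand-ins for this example, not the PLLB implementation.

```python
def gen_rules(episodes):
    # Stand-in for a language model summarizing successful episodes
    # into linguistic rules (hypothetical behavior).
    return ["prefer shorter paths"] if episodes else []

def update_policy(policy, rules):
    # Stand-in for rule-guided policy learning: here we simply record
    # which rules the policy is following.
    return policy + [("follow", r) for r in rules]

def pllb_loop(n_rounds=2):
    policy, episodes = [], ["episode-0"]
    for _ in range(n_rounds):
        rules = gen_rules(episodes)            # rule-generation step
        policy = update_policy(policy, rules)  # rule-guided update step
    return policy
```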
arXiv Detail & Related papers (2024-05-07T08:40:21Z)
- DECIDER: A Dual-System Rule-Controllable Decoding Framework for Language Generation [57.07295906718989]
Constrained decoding approaches aim to control the meaning or style of text generated by a Pre-trained Language Model (PLM) using specific target words during inference.
We propose a novel decoding framework, DECIDER, which enables us to program rules on how we complete tasks to control a PLM.
arXiv Detail & Related papers (2024-03-04T11:49:08Z)
- "No, to the Right" -- Online Language Corrections for Robotic Manipulation via Shared Autonomy [70.45420918526926]
We present LILAC, a framework for incorporating and adapting to natural language corrections online during execution.
Instead of discrete turn-taking between a human and robot, LILAC splits agency between the human and robot.
We show that our corrections-aware approach obtains higher task completion rates, and is subjectively preferred by users.
arXiv Detail & Related papers (2023-01-06T15:03:27Z)
- Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions [53.21504989297547]
We propose a new method that combines a language model and reinforcement learning for the task of building objects in a Minecraft-like environment.
Our method first generates a set of consistently achievable sub-goals from the instructions and then completes associated sub-tasks with a pre-trained RL policy.
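The two-stage pipeline above (sub-goal generation, then sub-task execution) can be sketched as follows. The parser and executor are hypothetical placeholders for this example, not the paper's language model or pre-trained RL policy.

```python
def parse_subgoals(instruction):
    # Placeholder for the language model: split an instruction into
    # ordered, achievable sub-goals (toy heuristic for illustration).
    return [g.strip() for g in instruction.split("then")]

def execute(subgoal):
    # Placeholder for the pre-trained RL policy completing one sub-task.
    return f"done:{subgoal}"

def build(instruction):
    # Complete each sub-goal in order, as in the paper's pipeline.
    return [execute(g) for g in parse_subgoals(instruction)]
```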
arXiv Detail & Related papers (2022-11-01T18:30:42Z)
- Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization [73.74371798168642]
We introduce an open-source modular library, RL4LMs, for optimizing language generators with reinforcement learning.
Next, we present the GRUE benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions.
Finally, we introduce an easy-to-use, performant RL algorithm, NLPO, that learns to effectively reduce the action space in language generation.
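The general idea of reducing a language generator's action space can be illustrated with top-p (nucleus) truncation over the token distribution; this is a sketch of that general mechanism, with assumed parameter names, not the NLPO algorithm itself.

```python
def top_p_mask(probs, p=0.9):
    # Keep the smallest set of token indices whose cumulative
    # probability reaches p; RL updates would then be restricted
    # to this reduced action set.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    return sorted(kept)
```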
arXiv Detail & Related papers (2022-10-03T21:38:29Z)
- Language-Conditioned Imitation Learning for Robot Manipulation Tasks [39.40937105264774]
We introduce a method for incorporating unstructured natural language into imitation learning.
At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent.
The training process then interrelates these two modalities to encode the correlations between language, perception, and motion.
The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions.
arXiv Detail & Related papers (2020-10-22T21:49:08Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches, in contrast, can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text [12.88819706338837]
Recent work has described neural-network-based agents that are trained with reinforcement learning to execute language-like commands in simulated worlds.
We propose a conceptually simple method for training instruction-following agents with deep RL that are robust to natural human instructions.
arXiv Detail & Related papers (2020-05-19T12:16:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.