Natural Language-conditioned Reinforcement Learning with Inside-out Task Language Development and Translation
- URL: http://arxiv.org/abs/2302.09368v1
- Date: Sat, 18 Feb 2023 15:49:09 GMT
- Title: Natural Language-conditioned Reinforcement Learning with Inside-out Task Language Development and Translation
- Authors: Jing-Cheng Pang, Xin-Yu Yang, Si-Hang Yang, Yang Yu
- Abstract summary: Natural Language-conditioned reinforcement learning (RL) enables agents to follow human instructions.
Previous approaches generally implemented language-conditioned RL by providing human instructions in natural language (NL) and training an instruction-following policy.
We investigate an inside-out scheme for natural language-conditioned RL by developing a task language (TL) that is task-related and unique.
- Score: 14.176720914723127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Natural Language-conditioned reinforcement learning (RL) enables agents to follow human instructions. Previous approaches generally implemented language-conditioned RL by providing human instructions in natural language (NL) and training a policy to follow them. In this outside-in approach, the policy must comprehend the NL and manage the task simultaneously. However, the unbounded space of NL expressions often adds considerable complexity to solving concrete RL tasks, which can distract policy learning from completing the task. To ease the learning burden of the policy, we investigate an inside-out scheme for natural language-conditioned RL by developing a task language (TL) that is task-related and unique. The TL is used in RL to achieve highly efficient and effective policy training. In addition, a translator is trained to translate NL into TL. We implement this scheme as TALAR (TAsk Language with predicAte Representation), which learns multiple predicates to model object relationships as the TL. Experiments indicate that TALAR not only comprehends NL instructions better but also leads to a better instruction-following policy that improves the success rate by 13.4% and adapts to unseen expressions of NL instructions. The TL can also serve as an effective task abstraction, naturally compatible with hierarchical RL.
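To make the inside-out pipeline above concrete, here is a minimal sketch of how an NL-to-TL translator, a predicate-based TL, and a TL-conditioned policy could fit together. The class names, network sizes, vocabulary, and predicate count below are assumptions for illustration, not TALAR's actual implementation.

```python
# Illustrative sketch of the inside-out scheme (assumed names/shapes, not TALAR's code).
# NL instruction --(translator)--> task language (TL) as predicate activations --> policy.
import torch
import torch.nn as nn

NUM_PREDICATES = 8   # assumed number of learned predicates modelling object relations
VOCAB_SIZE = 100     # toy NL vocabulary
OBS_DIM = 16         # toy observation size
ACT_DIM = 4          # toy discrete action count


class Translator(nn.Module):
    """Maps a tokenised NL instruction to a TL vector of predicate truth values."""

    def __init__(self):
        super().__init__()
        self.embed = nn.EmbeddingBag(VOCAB_SIZE, 32)
        self.head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                                  nn.Linear(64, NUM_PREDICATES))

    def forward(self, token_ids):
        logits = self.head(self.embed(token_ids))
        return torch.sigmoid(logits)        # each entry ~ one predicate holding or not


class TLConditionedPolicy(nn.Module):
    """Policy trained with RL; it only ever sees the compact TL, never raw NL."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + NUM_PREDICATES, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM))

    def forward(self, obs, tl):
        return torch.distributions.Categorical(logits=self.net(torch.cat([obs, tl], dim=-1)))


if __name__ == "__main__":
    translator, policy = Translator(), TLConditionedPolicy()
    instruction = torch.randint(0, VOCAB_SIZE, (1, 6))   # toy tokenised instruction
    tl = translator(instruction)                         # NL -> TL translation
    obs = torch.randn(1, OBS_DIM)
    action = policy(obs, tl).sample()                    # TL-conditioned acting
    print(tl.shape, action.item())
```

The point of the split is that the translator absorbs the variability of NL once, while the RL policy conditions only on the bounded, task-specific TL, which is what the abstract credits for the more efficient and effective policy training.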
Related papers
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
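The entry above notes that formal-language tasks admit rewards that can be programmed rather than learned. As a toy illustration only (the function, scoring scheme, and task below are assumptions, not the paper's reward), a programmed reward for an arithmetic generation task could look like this:

```python
# Toy programmed reward for an arithmetic generation task (illustrative assumption,
# not the paper's reward): the model must emit "a+b=c" with a correct sum.
import re

def arithmetic_reward(prompt: str, completion: str) -> float:
    """Return 1.0 for a well-formed, correct equation; partial credit otherwise."""
    match = re.fullmatch(r"\s*(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)\s*", completion)
    if match is None:
        return 0.0                      # not even parseable as an equation
    a, b, c = (int(g) for g in match.groups())
    if f"{a}+{b}" not in prompt.replace(" ", ""):
        return 0.1                      # well formed but ignores the prompt
    return 1.0 if a + b == c else 0.2   # full reward only for a correct sum

if __name__ == "__main__":
    print(arithmetic_reward("compute 3 + 4", "3 + 4 = 7"))   # 1.0
    print(arithmetic_reward("compute 3 + 4", "3 + 4 = 8"))   # 0.2
    print(arithmetic_reward("compute 3 + 4", "hello"))       # 0.0
```

A PPO-style trainer would maximize this scalar; the paper's finding is that even with such an exact signal, pure RL-based training remains difficult.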
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
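The summary above only says that LLM guidance enters value-based RL as a regularization factor. One plausible reading, sketched below in a toy tabular setting with invented names and constants (not necessarily LINVIT's actual objective), is to shift the TD target toward action distributions the LLM prefers:

```python
# Hedged sketch: value-based update with an LLM-guidance regularizer (illustrative only;
# the paper's objective may differ). A fixed action distribution stands in for LLM advice.
import numpy as np

N_STATES, N_ACTIONS, GAMMA, ALPHA, LAMBDA = 5, 3, 0.9, 0.1, 0.5
Q = np.zeros((N_STATES, N_ACTIONS))
llm_prior = np.full((N_STATES, N_ACTIONS), 1.0 / N_ACTIONS)  # stand-in for LLM advice
llm_prior[:, 0] = 0.6                                        # the "LLM" prefers action 0
llm_prior /= llm_prior.sum(axis=1, keepdims=True)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def regularized_update(s, a, r, s_next):
    """TD update whose target is penalized when the value-induced policy ignores the LLM."""
    pi = softmax(Q[s_next])                                   # policy induced by current values
    kl = float(np.sum(pi * np.log(pi / llm_prior[s_next] + 1e-8)))
    target = r + GAMMA * (np.max(Q[s_next]) - LAMBDA * kl)    # KL term as the regularizer
    Q[s, a] += ALPHA * (target - Q[s, a])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for _ in range(200):
        s, a = rng.integers(N_STATES), rng.integers(N_ACTIONS)
        regularized_update(s, a, float(a == 0), rng.integers(N_STATES))
    print(Q.round(2))
```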
- Natural Language Reinforcement Learning [25.165291680493844]
We introduce Natural Language Reinforcement Learning (NLRL), which combines RL principles with natural language representation.
Specifically, NLRL redefines RL concepts like task objectives, policy, value function, Bellman equation, and policy iteration in natural language space.
arXiv Detail & Related papers (2024-02-11T11:03:04Z)
- Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models [36.44404825103045]
Safe reinforcement learning (RL) agents accomplish given tasks while adhering to specific constraints.
We propose to use pre-trained language models (LMs) to facilitate RL agents' comprehension of natural language constraints.
Our method enhances safe policy learning under a diverse set of human-derived free-form natural language constraints.
arXiv Detail & Related papers (2024-01-15T09:37:03Z)
- GLIDE-RL: Grounded Language Instruction through DEmonstration in RL [7.658523833511356]
Training efficient Reinforcement Learning (RL) agents grounded in natural language has been a long-standing challenge.
We present a novel algorithm, Grounded Language Instruction through DEmonstration in RL (GLIDE-RL) that introduces a teacher-instructor-student curriculum learning framework.
In this multi-agent framework, the teacher and the student agents learn simultaneously based on the student's current skill level.
arXiv Detail & Related papers (2024-01-03T17:32:13Z)
- Large Language Models as Generalizable Policies for Embodied Tasks [50.870491905776305]
We show that large language models (LLMs) can be adapted to be generalizable policies for embodied visual tasks.
Our approach, called Large LAnguage model Reinforcement Learning Policy (LLaRP), adapts a pre-trained frozen LLM to take as input text instructions and visual egocentric observations and output actions directly in the environment.
arXiv Detail & Related papers (2023-10-26T18:32:05Z)
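The LLaRP summary describes a frozen pre-trained LLM that takes text instructions plus egocentric visual observations and outputs actions. One generic way to wire such a policy, sketched below with a small Transformer standing in for the frozen LLM and with all module names and sizes assumed, is a trainable visual adapter and action head around a frozen backbone:

```python
# Generic frozen-backbone, instruction-conditioned policy (illustrative; not LLaRP's code).
# A small Transformer stands in for the frozen LLM so the sketch stays self-contained.
import torch
import torch.nn as nn

EMB, ACT_DIM, VOCAB = 64, 6, 200

class FrozenBackbonePolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.token_embed = nn.Embedding(VOCAB, EMB)
        self.visual_adapter = nn.Linear(512, EMB)       # trainable: maps image features in
        layer = nn.TransformerEncoderLayer(EMB, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.backbone.parameters():            # "frozen LLM": no gradient updates
            p.requires_grad_(False)
        self.action_head = nn.Linear(EMB, ACT_DIM)      # trainable: outputs action logits

    def forward(self, instruction_ids, visual_feats):
        tokens = self.token_embed(instruction_ids)                  # (B, T, EMB)
        obs = self.visual_adapter(visual_feats).unsqueeze(1)        # (B, 1, EMB)
        hidden = self.backbone(torch.cat([tokens, obs], dim=1))     # frozen processing
        return self.action_head(hidden[:, -1])                      # act from the last slot

if __name__ == "__main__":
    policy = FrozenBackbonePolicy()
    ids = torch.randint(0, VOCAB, (1, 5))        # toy tokenised instruction
    feats = torch.randn(1, 512)                  # toy egocentric visual features
    print(policy(ids, feats).shape)              # torch.Size([1, 6]) action logits
```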
- Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions [53.21504989297547]
We propose a new method that combines a language model and reinforcement learning for the task of building objects in a Minecraft-like environment.
Our method first generates a set of consistently achievable sub-goals from the instructions and then completes associated sub-tasks with a pre-trained RL policy.
arXiv Detail & Related papers (2022-11-01T18:30:42Z)
- Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization [73.74371798168642]
We introduce an open-source modular library, RL4LMs, for optimizing language generators with reinforcement learning.
Next, we present the GRUE benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions.
Finally, we introduce an easy-to-use, performant RL algorithm, NLPO, that learns to effectively reduce the action space in language generation.
arXiv Detail & Related papers (2022-10-03T21:38:29Z)
- LISA: Learning Interpretable Skill Abstractions from Language [85.20587800593293]
We propose a hierarchical imitation learning framework that can learn diverse, interpretable skills from language-conditioned demonstrations.
Our method demonstrates a more natural way to condition on language in sequential decision-making problems.
arXiv Detail & Related papers (2022-02-28T19:43:24Z)
- LTL2Action: Generalizing LTL Instructions for Multi-Task RL [4.245018630914216]
We address the problem of teaching a deep reinforcement learning (RL) agent to follow instructions in multi-task environments.
We employ a well-known formal language -- linear temporal logic (LTL) -- to specify instructions, using a domain-specific vocabulary.
arXiv Detail & Related papers (2021-02-13T04:05:46Z)
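For readers unfamiliar with LTL instructions like those used in LTL2Action, the snippet below checks two standard temporal operators, eventually (F) and always (G), over a finite trace; the propositions and task are invented for illustration, and the paper's method compiles far richer formulas and learns policies against them.

```python
# Minimal finite-trace check of two LTL operators (illustrative; not LTL2Action's code).
from typing import Callable, List, Set

Trace = List[Set[str]]          # each step: the set of propositions that hold

def eventually(prop: str) -> Callable[[Trace], bool]:
    """F prop: prop holds at some step of the trace."""
    return lambda trace: any(prop in step for step in trace)

def always(prop: str) -> Callable[[Trace], bool]:
    """G prop: prop holds at every step of the trace."""
    return lambda trace: all(prop in step for step in trace)

if __name__ == "__main__":
    # Toy instruction "eventually reach the goal while never touching lava",
    # i.e. F(at_goal) AND G(not touched_lava), checked over a 3-step trace.
    trace: Trace = [{"safe"}, {"safe"}, {"safe", "at_goal"}]
    satisfied = eventually("at_goal")(trace) and not eventually("touched_lava")(trace)
    print(satisfied)   # True: goal reached, lava never touched
```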
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.