Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs
- URL: http://arxiv.org/abs/2301.12950v2
- Date: Wed, 31 May 2023 09:08:07 GMT
- Title: Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs
- Authors: Guan-Ting Liu, En-Pei Hu, Pu-Jen Cheng, Hung-yi Lee, Shao-Hua Sun
- Abstract summary: We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
- Score: 58.94569213396991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aiming to produce reinforcement learning (RL) policies that are
human-interpretable and can generalize better to novel scenarios, Trivedi et
al. (2021) present a method (LEAPS) that first learns a program embedding space
to continuously parameterize diverse programs from a pre-generated program
dataset, and then searches for a task-solving program in the learned program
embedding space when given a task. Despite the encouraging results, the program
policies that LEAPS can produce are limited by the distribution of the program
dataset. Furthermore, during searching, LEAPS evaluates each candidate program
solely based on its return, failing to precisely reward correct parts of
programs and penalize incorrect parts. To address these issues, we propose to
learn a meta-policy that composes a series of programs sampled from the learned
program embedding space. By learning to compose programs, our proposed
hierarchical programmatic reinforcement learning (HPRL) framework can produce
program policies that describe out-of-distributionally complex behaviors and
directly assign credits to programs that induce desired behaviors. The
experimental results in the Karel domain show that our proposed framework
outperforms baselines. The ablation studies confirm the limitations of LEAPS
and justify our design choices.
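The composition loop is easy to picture in code. Below is a minimal, self-contained sketch of the idea, assuming a frozen pretrained program decoder and a Karel-style environment; `StubDecoder` and `StubEnv` are hypothetical stand-ins, not the authors' implementation. The point is that each decoded program is executed to completion and its reward is credited to the latent embedding that produced it, giving per-program rather than per-episode credit assignment.

```python
# Minimal sketch of the compose-programs idea (not the authors' code):
# a meta-policy picks latent program embeddings, a frozen decoder maps
# each embedding to a DSL program, and the reward from executing that
# program is credited to that embedding alone.
import random

class StubDecoder:
    """Stand-in for the pretrained program decoder (assumption)."""
    def decode(self, z):
        # Map a 1-D "embedding" to a toy DSL token.
        return "move" if z > 0 else "turnLeft"

class StubEnv:
    """Stand-in for a Karel-style task environment (assumption)."""
    def __init__(self):
        self.steps = 0
    def execute(self, program):
        # Executing one whole program counts as one meta-level step.
        self.steps += 1
        reward = 1.0 if program == "move" else 0.0
        return reward, self.steps >= 5  # (reward, done)

def rollout(decoder, env, horizon=5):
    total, credits = 0.0, []
    for _ in range(horizon):
        z = random.uniform(-1.0, 1.0)   # meta-policy action: a latent program
        program = decoder.decode(z)     # embedding -> executable program
        reward, done = env.execute(program)
        credits.append((z, reward))     # credit goes to this program alone
        total += reward
        if done:
            break
    return total, credits

print(rollout(StubDecoder(), StubEnv()))
```

Here a random meta-policy stands in for the learned one; in the framework described above, the `credits` pairs would drive a policy-gradient update of the meta-policy, which is exactly the finer-grained credit assignment that return-only search lacks.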
Related papers
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs.
Our results show that pure RL-based training on the two formal language tasks studied is challenging, with success limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
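As a rough illustration of the programmed-reward idea in the entry above, the toy function below scores arithmetic outputs automatically with ordinary code rather than a learned reward model. The prompt format and the evaluator are assumptions, not the paper's setup.

```python
# Hedged sketch of a "programmed reward" for a formal-language task:
# correctness can be checked by a function, so no reward model is needed.
def arithmetic_reward(prompt: str, generated: str) -> float:
    """Reward 1.0 if the generated text equals the correct answer."""
    expression = prompt.strip().rstrip("=").strip()   # e.g. "12 + 7 ="
    try:
        expected = eval(expression, {"__builtins__": {}})  # toy evaluator
        return 1.0 if generated.strip() == str(expected) else 0.0
    except Exception:
        return 0.0  # malformed prompt or output earns no reward

print(arithmetic_reward("12 + 7 =", "19"))  # -> 1.0
```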
- Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search [7.769411917500852]
We introduce a novel LLM-guided search framework (LLM-GS).
Our key insight is to leverage the programming expertise and common sense reasoning of LLMs to enhance the efficiency of assumption-free, random-guessing search methods.
We develop a search algorithm named Scheduled Hill Climbing, designed to efficiently explore the programmatic search space and consistently improve the programs.
arXiv Detail & Related papers (2024-05-26T06:33:48Z)
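The entry above names Scheduled Hill Climbing; the sketch below is only a generic hill-climbing loop whose neighbor budget grows on a schedule, an illustrative guess at the flavor rather than the paper's algorithm. `mutate` and `evaluate` are hypothetical.

```python
# Generic hill climbing over toy DSL programs, with the number of sampled
# neighbors growing on a schedule. Illustrative only.
import random

def mutate(program: str) -> str:
    """Toy neighbor function: resample one token (assumption)."""
    tokens = program.split()
    i = random.randrange(len(tokens))
    tokens[i] = random.choice(["move", "turnLeft", "turnRight", "putMarker"])
    return " ".join(tokens)

def evaluate(program: str) -> float:
    """Toy objective: count 'move' tokens (stands in for episodic return)."""
    return program.split().count("move")

def scheduled_hill_climb(seed: str, iterations: int = 20) -> str:
    best, best_score = seed, evaluate(seed)
    for t in range(1, iterations + 1):
        k = min(2 * t, 32)  # schedule: sample more neighbors as search matures
        top = max((mutate(best) for _ in range(k)), key=evaluate)
        if evaluate(top) > best_score:
            best, best_score = top, evaluate(top)
    return best

print(scheduled_hill_climb("turnLeft turnLeft turnLeft"))
```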
- $\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis [39.742755916373284]
Program synthesis aims to create accurate, executable programs from problem specifications.
Recent studies have leveraged the power of reinforcement learning (RL) in conjunction with large language models (LLMs).
Our work explores the feasibility of value-based approaches, leading to the development of our $\mathcal{B}$-Coder.
arXiv Detail & Related papers (2023-10-04T21:40:36Z)
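To make the value-based framing concrete, the fragment below applies a tabular Bellman backup to token-level program generation. This is a generic illustration of value-based RL, not $\mathcal{B}$-Coder's actual method; the state/token encoding and constants are assumptions.

```python
# Generic tabular Q-learning over program tokens: the state is the partial
# program, the action is the next token, and a Bellman backup propagates
# the final correctness reward back through generation.
from collections import defaultdict

GAMMA = 1.0   # no discounting within a single program (assumption)
ALPHA = 0.5   # learning rate (assumption)
Q = defaultdict(float)  # Q[(partial_program, token)] -> value estimate

def bellman_update(state, token, reward, next_state, next_tokens, done):
    # Target is the reward plus the best estimated value of the successor.
    target = reward if done else reward + GAMMA * max(
        Q[(next_state, t)] for t in next_tokens)
    Q[(state, token)] += ALPHA * (target - Q[(state, token)])

# Toy transition: appending "return x" completes a correct identity function.
bellman_update("def f(x):", "return x", reward=1.0,
               next_state="def f(x): return x", next_tokens=[], done=True)
print(Q[("def f(x):", "return x")])  # 0.5 after one update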
- ANPL: Towards Natural Programming with Interactive Decomposition [33.58825633046242]
We introduce an interactive ANPL system that ensures users can always refine the generated code.
An ANPL program consists of a set of input-outputs that it must satisfy, a sketch (control/data flow expressed in precise code), and holes (sub-modules specified in natural language for the LLM to implement).
The user revises an ANPL program by modifying the sketch, changing the natural language used to describe a hole, or providing additional input-outputs to a particular hole.
arXiv Detail & Related papers (2023-05-29T14:19:40Z)
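The structure described above (input-outputs, a sketch, and natural-language holes) can be pictured as a small data type. The sketch below is a hypothetical rendering in Python; the field names and layout are illustrative and are not ANPL's actual API.

```python
# Hypothetical data type mirroring the ANPL program structure described
# in the entry above. Names are illustrative, not ANPL's API.
from dataclasses import dataclass, field

@dataclass
class Hole:
    name: str
    description: str            # natural-language spec for the LLM to implement
    examples: list = field(default_factory=list)  # optional per-hole I/O pairs

@dataclass
class ANPLProgram:
    io_examples: list           # (input, output) pairs the program must satisfy
    sketch: str                 # control/data flow expressed in precise code
    holes: dict                 # hole name -> Hole

prog = ANPLProgram(
    io_examples=[([3, 1, 2], [1, 2, 3])],
    sketch="def solve(xs):\n    return sort_list(xs)",
    holes={"sort_list": Hole(
        name="sort_list",
        description="sort the list in ascending order")},
)
```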
- GPT is becoming a Turing machine: Here are some ways to program it [16.169056235216576]
We show that GPT-3 models can be triggered to execute programs that involve loops.
We show that prompts that may not even cover one full task example can trigger algorithmic behaviour.
arXiv Detail & Related papers (2023-03-25T00:43:41Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
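A minimal sketch of the self-sampling idea described above, under assumed interfaces: candidate programs sampled from the model are executed against the reference test cases, and fully correct as well as sufficiently partially-correct ones are kept as extra training targets. The 0.5 threshold is an arbitrary illustration.

```python
# Filter self-sampled programs by execution: keep fully-correct and
# partially-correct candidates as additional training targets.
def correctness(program_fn, tests):
    """Fraction of (input, expected) pairs the candidate satisfies."""
    passed = 0
    for x, expected in tests:
        try:
            if program_fn(x) == expected:
                passed += 1
        except Exception:
            pass  # runtime errors count as failures
    return passed / len(tests)

def filter_samples(candidates, tests, partial_threshold=0.5):
    """Keep candidates whose pass rate is 1.0 or above the threshold."""
    return [(fn, score) for fn in candidates
            if (score := correctness(fn, tests)) >= partial_threshold]

tests = [(2, 4), (3, 9), (4, 16)]
candidates = [lambda x: x * x,                     # fully correct
              lambda x: x + x,                     # mostly wrong, dropped
              lambda x: x ** 2 if x > 2 else 0]    # partially correct, kept
print([round(s, 2) for _, s in filter_samples(candidates, tests)])  # [1.0, 0.67]
```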
- Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language [81.73820295186727]
We propose a formalism of procedures as programs, a powerful yet intuitive method of representing hierarchical procedural knowledge for agent command and control.
We instantiate this framework on the IQA and ALFRED datasets for NL instruction following.
arXiv Detail & Related papers (2021-09-16T20:36:21Z)
- Learning to Synthesize Programs as Interpretable and Generalizable Policies [25.258598215642067]
We present a framework that learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner.
Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines.
arXiv Detail & Related papers (2021-08-31T07:03:06Z)
- Learning from Executions for Semantic Parsing [86.94309120789396]
We focus on the task of semi-supervised learning where a limited amount of annotated data is available.
We propose to encourage the model to produce executable programs for unlabeled utterances.
arXiv Detail & Related papers (2021-04-12T21:07:53Z)
- The ILASP system for Inductive Learning of Answer Set Programs [79.41112438865386]
Our system learns Answer Set Programs, including normal rules, choice rules and hard and weak constraints.
We first give a general overview of ILASP's learning framework and its capabilities.
This is followed by a comprehensive summary of the evolution of the ILASP system.
arXiv Detail & Related papers (2020-05-02T19:04:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.