Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs
- URL: http://arxiv.org/abs/2301.12950v2
- Date: Wed, 31 May 2023 09:08:07 GMT
- Title: Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs
- Authors: Guan-Ting Liu, En-Pei Hu, Pu-Jen Cheng, Hung-yi Lee, Shao-Hua Sun
- Abstract summary: We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
- Score: 58.94569213396991
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Aiming to produce reinforcement learning (RL) policies that are
human-interpretable and can generalize better to novel scenarios, Trivedi et
al. (2021) present a method (LEAPS) that first learns a program embedding space
to continuously parameterize diverse programs from a pre-generated program
dataset, and then searches for a task-solving program in the learned program
embedding space when given a task. Despite the encouraging results, the program
policies that LEAPS can produce are limited by the distribution of the program
dataset. Furthermore, during searching, LEAPS evaluates each candidate program
solely based on its return, failing to precisely reward correct parts of
programs and penalize incorrect parts. To address these issues, we propose to
learn a meta-policy that composes a series of programs sampled from the learned
program embedding space. By learning to compose programs, our proposed
hierarchical programmatic reinforcement learning (HPRL) framework can produce
program policies that describe out-of-distributionally complex behaviors and
directly assign credits to programs that induce desired behaviors. The
experimental results in the Karel domain show that our proposed framework
outperforms baselines. The ablation studies confirm the limitations of LEAPS
and justify our design choices.
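The composition loop is easy to picture in code. Below is a minimal, self-contained sketch of the idea, assuming a frozen pretrained program decoder and a Karel-style environment; `StubDecoder` and `StubEnv` are hypothetical stand-ins, not the authors' implementation. The point is that each decoded program is executed to completion and its reward is credited to the latent embedding that produced it, giving per-program rather than per-episode credit assignment.

```python
# Minimal sketch of the compose-programs idea (not the authors' code):
# a meta-policy picks latent program embeddings, a frozen decoder maps
# each embedding to a DSL program, and the reward from executing that
# program is credited to that embedding alone.
import random

class StubDecoder:
    """Stand-in for the pretrained program decoder (assumption)."""
    def decode(self, z):
        # Map a 1-D "embedding" to a toy DSL token.
        return "move" if z > 0 else "turnLeft"

class StubEnv:
    """Stand-in for a Karel-style task environment (assumption)."""
    def __init__(self):
        self.steps = 0
    def execute(self, program):
        # Executing one whole program counts as one meta-level step.
        self.steps += 1
        reward = 1.0 if program == "move" else 0.0
        return reward, self.steps >= 5  # (reward, done)

def rollout(decoder, env, horizon=5):
    total, credits = 0.0, []
    for _ in range(horizon):
        z = random.uniform(-1.0, 1.0)   # meta-policy action: a latent program
        program = decoder.decode(z)     # embedding -> executable program
        reward, done = env.execute(program)
        credits.append((z, reward))     # credit goes to this program alone
        total += reward
        if done:
            break
    return total, credits

print(rollout(StubDecoder(), StubEnv()))
```

Here a random meta-policy stands in for the learned one; in the framework described above, the `credits` pairs would drive a policy-gradient update of the meta-policy, which is exactly the finer-grained credit assignment that return-only search lacks.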
Related papers
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs.
Our results show that pure RL-based training on the two formal language tasks studied is challenging, with success limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
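As a rough illustration of the programmed-reward idea in the entry above, the toy function below scores arithmetic outputs automatically with ordinary code rather than a learned reward model. The prompt format and the evaluator are assumptions, not the paper's setup.

```python
# Hedged sketch of a "programmed reward" for a formal-language task:
# correctness can be checked by a function, so no reward model is needed.
def arithmetic_reward(prompt: str, generated: str) -> float:
    """Reward 1.0 if the generated text equals the correct answer."""
    expression = prompt.strip().rstrip("=").strip()   # e.g. "12 + 7 ="
    try:
        expected = eval(expression, {"__builtins__": {}})  # toy evaluator
        return 1.0 if generated.strip() == str(expected) else 0.0
    except Exception:
        return 0.0  # malformed prompt or output earns no reward

print(arithmetic_reward("12 + 7 =", "19"))  # -> 1.0
```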
- Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search [7.769411917500852]
We introduce a novel LLM-guided search framework (LLM-GS).
Our key insight is to leverage the programming expertise and common sense reasoning of LLMs to enhance the efficiency of assumption-free, random-guessing search methods.
We develop a search algorithm named Scheduled Hill Climbing, designed to efficiently explore the programmatic search space and consistently improve the programs.
arXiv Detail & Related papers (2024-05-26T06:33:48Z)
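The entry above names Scheduled Hill Climbing; the sketch below is only a generic hill-climbing loop whose neighbor budget grows on a schedule, an illustrative guess at the flavor rather than the paper's algorithm. `mutate` and `evaluate` are hypothetical.

```python
# Generic hill climbing over toy DSL programs, with the number of sampled
# neighbors growing on a schedule. Illustrative only.
import random

def mutate(program: str) -> str:
    """Toy neighbor function: resample one token (assumption)."""
    tokens = program.split()
    i = random.randrange(len(tokens))
    tokens[i] = random.choice(["move", "turnLeft", "turnRight", "putMarker"])
    return " ".join(tokens)

def evaluate(program: str) -> float:
    """Toy objective: count 'move' tokens (stands in for episodic return)."""
    return program.split().count("move")

def scheduled_hill_climb(seed: str, iterations: int = 20) -> str:
    best, best_score = seed, evaluate(seed)
    for t in range(1, iterations + 1):
        k = min(2 * t, 32)  # schedule: sample more neighbors as search matures
        top = max((mutate(best) for _ in range(k)), key=evaluate)
        if evaluate(top) > best_score:
            best, best_score = top, evaluate(top)
    return best

print(scheduled_hill_climb("turnLeft turnLeft turnLeft"))
```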
- $\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis [39.742755916373284]
Program synthesis aims to create accurate, executable programs from problem specifications.
Recent studies have leveraged the power of reinforcement learning (RL) in conjunction with large language models (LLMs).
Our work explores the feasibility of value-based approaches, leading to the development of our $\mathcal{B}$-Coder.
arXiv Detail & Related papers (2023-10-04T21:40:36Z)
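To make the value-based framing concrete, the fragment below applies a tabular Bellman backup to token-level program generation. This is a generic illustration of value-based RL, not $\mathcal{B}$-Coder's actual method; the state/token encoding and constants are assumptions.

```python
# Generic tabular Q-learning over program tokens: the state is the partial
# program, the action is the next token, and a Bellman backup propagates
# the final correctness reward back through generation.
from collections import defaultdict

GAMMA = 1.0   # no discounting within a single program (assumption)
ALPHA = 0.5   # learning rate (assumption)
Q = defaultdict(float)  # Q[(partial_program, token)] -> value estimate

def bellman_update(state, token, reward, next_state, next_tokens, done):
    # Target is the reward plus the best estimated value of the successor.
    target = reward if done else reward + GAMMA * max(
        Q[(next_state, t)] for t in next_tokens)
    Q[(state, token)] += ALPHA * (target - Q[(state, token)])

# Toy transition: appending "return x" completes a correct identity function.
bellman_update("def f(x):", "return x", reward=1.0,
               next_state="def f(x): return x", next_tokens=[], done=True)
print(Q[("def f(x):", "return x")])  # 0.5 after one update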
- ANPL: Towards Natural Programming with Interactive Decomposition [33.58825633046242]
We introduce an interactive ANPL system that ensures users can always refine the generated code.
An ANPL program consists of a set of input-outputs that it must satisfy, a sketch (control/data flow expressed in precise code), and holes (sub-modules specified in natural language for the LLM to implement).
The user revises an ANPL program by modifying the sketch, changing the natural language used to describe a hole, or providing additional input-outputs to a particular hole.
arXiv Detail & Related papers (2023-05-29T14:19:40Z)
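The structure described above (input-outputs, a sketch, and natural-language holes) can be pictured as a small data type. The sketch below is a hypothetical rendering in Python; the field names and layout are illustrative and are not ANPL's actual API.

```python
# Hypothetical data type mirroring the ANPL program structure described
# in the entry above. Names are illustrative, not ANPL's API.
from dataclasses import dataclass, field

@dataclass
class Hole:
    name: str
    description: str            # natural-language spec for the LLM to implement
    examples: list = field(default_factory=list)  # optional per-hole I/O pairs

@dataclass
class ANPLProgram:
    io_examples: list           # (input, output) pairs the program must satisfy
    sketch: str                 # control/data flow expressed in precise code
    holes: dict                 # hole name -> Hole

prog = ANPLProgram(
    io_examples=[([3, 1, 2], [1, 2, 3])],
    sketch="def solve(xs):\n    return sort_list(xs)",
    holes={"sort_list": Hole(
        name="sort_list",
        description="sort the list in ascending order")},
)
```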
- GPT is becoming a Turing machine: Here are some ways to program it [16.169056235216576]
We show that GPT-3 models can be triggered to execute programs that involve loops.
We show that prompts that may not even cover one full task example can trigger algorithmic behaviour.
arXiv Detail & Related papers (2023-03-25T00:43:41Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
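A minimal sketch of the self-sampling idea described above, under assumed interfaces: candidate programs sampled from the model are executed against the reference test cases, and fully correct as well as sufficiently partially-correct ones are kept as extra training targets. The 0.5 threshold is an arbitrary illustration.

```python
# Filter self-sampled programs by execution: keep fully-correct and
# partially-correct candidates as additional training targets.
def correctness(program_fn, tests):
    """Fraction of (input, expected) pairs the candidate satisfies."""
    passed = 0
    for x, expected in tests:
        try:
            if program_fn(x) == expected:
                passed += 1
        except Exception:
            pass  # runtime errors count as failures
    return passed / len(tests)

def filter_samples(candidates, tests, partial_threshold=0.5):
    """Keep candidates whose pass rate is 1.0 or above the threshold."""
    return [(fn, score) for fn in candidates
            if (score := correctness(fn, tests)) >= partial_threshold]

tests = [(2, 4), (3, 9), (4, 16)]
candidates = [lambda x: x * x,                     # fully correct
              lambda x: x + x,                     # mostly wrong, dropped
              lambda x: x ** 2 if x > 2 else 0]    # partially correct, kept
print([round(s, 2) for _, s in filter_samples(candidates, tests)])  # [1.0, 0.67]
```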
- Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language [81.73820295186727]
We propose a formalism of procedures as programs, a powerful yet intuitive method of representing hierarchical procedural knowledge for agent command and control.
We instantiate this framework on the IQA and ALFRED datasets for NL instruction following.
arXiv Detail & Related papers (2021-09-16T20:36:21Z)
- Learning to Synthesize Programs as Interpretable and Generalizable Policies [25.258598215642067]
We present a framework that learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner.
Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines.
arXiv Detail & Related papers (2021-08-31T07:03:06Z)
- Learning from Executions for Semantic Parsing [86.94309120789396]
We focus on the task of semi-supervised learning where a limited amount of annotated data is available.
We propose to encourage the model to produce executable programs for unlabeled utterances.
arXiv Detail & Related papers (2021-04-12T21:07:53Z)
- The ILASP system for Inductive Learning of Answer Set Programs [79.41112438865386]
Our system learns Answer Set Programs, including normal rules, choice rules and hard and weak constraints.
We first give a general overview of ILASP's learning framework and its capabilities.
This is followed by a comprehensive summary of the evolution of the ILASP system.
arXiv Detail & Related papers (2020-05-02T19:04:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.