Program Synthesis Guided Reinforcement Learning
- URL: http://arxiv.org/abs/2102.11137v1
- Date: Mon, 22 Feb 2021 16:05:32 GMT
- Title: Program Synthesis Guided Reinforcement Learning
- Authors: Yichen Yang, Jeevana Priya Inala, Osbert Bastani, Yewen Pu, Armando
Solar-Lezama, Martin Rinard
- Abstract summary: A key challenge for reinforcement learning is solving long-horizon planning and control problems.
Recent work has proposed leveraging programs to help guide the learning algorithm in these settings.
We propose an approach that leverages program synthesis to automatically generate the guiding program.
- Score: 34.342362868490525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge for reinforcement learning is solving long-horizon planning
and control problems. Recent work has proposed leveraging programs to help
guide the learning algorithm in these settings. However, these approaches
impose a high manual burden on the user since they must provide a guiding
program for every new task they seek to achieve. We propose an approach that
leverages program synthesis to automatically generate the guiding program. A
key challenge is how to handle partially observable environments. We propose
model predictive program synthesis, which trains a generative model to predict
the unobserved portions of the world, and then synthesizes a program based on
samples from this model in a way that is robust to its uncertainty. We evaluate
our approach on a set of challenging benchmarks, including a 2D
Minecraft-inspired "craft" environment where the agent must perform a complex
sequence of subtasks to achieve its goal, a box-world environment that requires
abstract reasoning, and a variant of the craft environment where the agent is a
MuJoCo Ant. Our approach significantly outperforms several baselines, and
performs essentially as well as an oracle that is given an effective program.
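To make the method concrete, here is a minimal sketch of the model predictive program synthesis loop the abstract describes. The names `sample_world`, `candidate_programs`, and `succeeds` are illustrative stand-ins, not the paper's API: a generative model fills in the unobserved parts of the environment, and the chosen program is the one that succeeds most often across the sampled completions.

```python
from typing import Any, Callable, List

def model_predictive_synthesis(
    observation: Any,
    sample_world: Callable[[Any], Any],    # generative model over unobserved state (assumed)
    candidate_programs: List[Any],         # space of guiding programs (assumed)
    succeeds: Callable[[Any, Any], bool],  # simulate a program on a sampled world (assumed)
    n_samples: int = 20,
) -> Any:
    """Pick the candidate program most likely to succeed across sampled worlds."""
    worlds = [sample_world(observation) for _ in range(n_samples)]

    def expected_success(program: Any) -> float:
        return sum(succeeds(program, w) for w in worlds) / n_samples

    # Robustness to the model's uncertainty: score each program against all
    # samples, rather than synthesizing for a single imagined completion.
    return max(candidate_programs, key=expected_success)
```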
Related papers
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
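As a concrete illustration of an explicitly programmed reward for a formal-language task, here is a minimal sketch for the arithmetic case; the prompt format and helper name are assumptions, not from the paper. The output is machine-checkable, so no learned reward model is needed.

```python
def arithmetic_reward(prompt: str, completion: str) -> float:
    """Return 1.0 iff the completion equals the value of the prompt expression."""
    try:
        # Prompts are trusted task data here, e.g. "12 * (3 + 4)"; empty
        # builtins keep eval restricted to plain arithmetic.
        target = eval(prompt.strip(), {"__builtins__": {}})
        return 1.0 if completion.strip() == str(target) else 0.0
    except Exception:
        return 0.0  # malformed output earns no reward

assert arithmetic_reward("12 * (3 + 4)", "84") == 1.0
assert arithmetic_reward("12 * (3 + 4)", "85") == 0.0
```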
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
- IPSynth: Interprocedural Program Synthesis for Software Security Implementation [3.1119394814248253]
We introduce IPSynth, a novel inter-procedural program synthesis approach that automatically learns the specification of the tactic.
Our results show that our approach can accurately locate the corresponding spots in the program, synthesize the needed code snippets, add them to the program, and outperform ChatGPT on inter-procedural tactic synthesis tasks.
arXiv Detail & Related papers (2024-03-16T07:12:24Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
A Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks, and we observe similar performance improvements on code generation tasks.
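A hedged sketch of the greedy, step-level search this entry describes: at each step a PRM scores candidate next reasoning steps, and the highest-scoring one is kept. `expand` and `prm_score` are hypothetical stand-ins for the paper's components.

```python
from typing import Callable, List

def greedy_prm_search(
    question: str,
    expand: Callable[[str, List[str]], List[str]],  # propose next steps (assumed)
    prm_score: Callable[[str, List[str]], float],   # score a partial path (assumed)
    max_steps: int = 8,
) -> List[str]:
    """Greedily extend the reasoning path with the step the PRM scores highest."""
    path: List[str] = []
    for _ in range(max_steps):
        candidates = expand(question, path)  # e.g., k sampled continuations
        if not candidates:
            break
        path.append(max(candidates, key=lambda s: prm_score(question, path + [s])))
    return path
```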
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Synthesizing a Progression of Subtasks for Block-Based Visual Programming Tasks [21.33708484899808]
We propose a novel synthesis algorithm that generates a progression of subtasks that are high-quality and well-spaced in complexity.
We show the utility of our synthesis algorithm in improving the efficacy of AI agents for solving tasks in the Karel programming environment.
arXiv Detail & Related papers (2023-05-27T16:24:36Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbones, we extend the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
- Learning to Find Proofs and Theorems by Learning to Refine Search Strategies [0.9137554315375919]
An AlphaZero-style agent self-trains to refine a high-level expert strategy expressed as a nondeterministic program.
An analogous teacher agent self-trains to generate tasks of suitable relevance and difficulty for the learner.
arXiv Detail & Related papers (2022-05-27T20:48:40Z)
- Learning to Synthesize Programs as Interpretable and Generalizable Policies [25.258598215642067]
We present a framework that learns to synthesize a program that details the procedure for solving a task in a flexible and expressive manner.
Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines.
arXiv Detail & Related papers (2021-08-31T07:03:06Z)
- Outcome-Driven Reinforcement Learning via Variational Inference [95.82770132618862]
We discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards.
To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function.
We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.
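For orientation, here is a standard control-as-inference identity that this outcome-driven view builds on (a sketch, not the paper's exact derivation): conditioning action selection on achieving a desired outcome g makes the outcome's log-likelihood play the role of a shaped reward.

```latex
% Bayes' rule over actions, conditioned on the desired outcome g:
\begin{align*}
  \pi^*(a \mid s, g) &\propto p(a \mid s)\, p(g \mid s, a)
\intertext{so acting under this posterior matches maximizing the derived reward}
  \tilde{r}(s, a, g) &:= \log p(g \mid s, a).
\end{align*}
```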
arXiv Detail & Related papers (2021-04-20T18:16:21Z)
- BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration [72.88493072196094]
We present a new synthesis approach that leverages learning to guide a bottom-up search over programs.
In particular, we train a model to prioritize compositions of intermediate values during search conditioned on a set of input-output examples.
We show that the combination of learning and bottom-up search is remarkably effective, even with simple supervised learning approaches.
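To show the shape of learning-guided bottom-up search, here is a minimal sketch over a toy integer DSL; the two operations and the `score` heuristic are illustrative assumptions (BUSTLE itself targets string-manipulation tasks and uses a trained model in place of `score`).

```python
import itertools
from typing import Callable, List, Tuple

OPS = [("add", lambda a, b: a + b), ("mul", lambda a, b: a * b)]

def bottom_up(
    inputs: List[Tuple[int, ...]],  # one tuple of input variables per example
    output: List[int],              # desired output per example
    score: Callable[[List[int], List[int]], float],  # learned ranking (assumed)
    max_rounds: int = 3,
    beam: int = 50,
):
    """Grow a pool of intermediate values, keeping those the model ranks highest."""
    pool = [(("input", i), [ex[i] for ex in inputs]) for i in range(len(inputs[0]))]
    for _ in range(max_rounds):
        new = []
        for (ea, va), (eb, vb) in itertools.product(pool, pool):
            for name, fn in OPS:
                vals = [fn(a, b) for a, b in zip(va, vb)]
                if vals == output:
                    return (name, ea, eb)  # program matches all examples
                new.append(((name, ea, eb), vals))
        # Learning-guided exploration: keep the intermediate values the model
        # judges most likely to lie on a path to the target outputs.
        pool += sorted(new, key=lambda ev: -score(ev[1], output))[:beam]
    return None

# Usage with a distance heuristic standing in for the learned model:
prog = bottom_up([(2, 3), (4, 5)], [7, 13],
                 score=lambda vals, out: -sum(abs(v - o) for v, o in zip(vals, out)))
# prog is a nested tuple such as ("add", ("add", ("input", 0), ("input", 1)), ("input", 0))
```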
arXiv Detail & Related papers (2020-07-28T17:46:18Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task-relevant information, enabling the model to be aware of the current task and encouraging it to model only the relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
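A rough sketch of the contrast with task-agnostic modeling: rather than reconstructing the whole next observation, the dynamics model is trained to predict a goal-relevant summary of the next state (here, distance to the goal). The loss shape is illustrative; the paper's formulation differs in detail.

```python
import numpy as np

def goal_aware_loss(model, s, a, s_next, goal) -> float:
    """Penalize error only on what the task cares about, not the whole state."""
    target = float(np.linalg.norm(s_next - goal))  # goal-relevant summary of s'
    pred = model(s, a, goal)                       # model predicts that summary directly
    return (pred - target) ** 2                    # vs. ||f(s, a) - s'||^2 over everything

# Usage with a trivial stand-in model:
rng = np.random.default_rng(0)
s, a, goal = rng.normal(size=3), rng.normal(size=2), np.zeros(3)
s_next = s + 0.1 * rng.normal(size=3)
loss = goal_aware_loss(lambda s, a, g: 0.0, s, a, s_next, goal)
```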
arXiv Detail & Related papers (2020-07-14T16:42:59Z)