Learning to Synthesize Programs as Interpretable and Generalizable Policies
- URL: http://arxiv.org/abs/2108.13643v1
- Date: Tue, 31 Aug 2021 07:03:06 GMT
- Title: Learning to Synthesize Programs as Interpretable and Generalizable Policies
- Authors: Dweep Trivedi, Jesse Zhang, Shao-Hua Sun, Joseph J. Lim
- Abstract summary: We present a framework that learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner.
Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines.
- Score: 25.258598215642067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, deep reinforcement learning (DRL) methods have achieved impressive
performance on tasks in a variety of domains. However, neural network policies
produced with DRL methods are not human-interpretable and often have difficulty
generalizing to novel scenarios. To address these issues, prior works explore
learning programmatic policies that are more interpretable and structured for
generalization. Yet, these works either employ limited policy representations
(e.g. decision trees, state machines, or predefined program templates) or
require stronger supervision (e.g. input/output state pairs or expert
demonstrations). We present a framework that instead learns to synthesize a
program, which details the procedure to solve a task in a flexible and
expressive manner, solely from reward signals. To alleviate the difficulty of
learning to compose programs to induce the desired agent behavior from scratch,
we propose to first learn a program embedding space that continuously
parameterizes diverse behaviors in an unsupervised manner and then search over
the learned program embedding space to yield a program that maximizes the
return for a given task. Experimental results demonstrate that the proposed
framework not only learns to reliably synthesize task-solving programs but also
outperforms DRL and program synthesis baselines while producing interpretable
and more generalizable policies. We also justify the necessity of the proposed
two-stage learning scheme as well as analyze various methods for learning the
program embedding.
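To make the two-stage scheme above concrete, here is a minimal sketch of one way it could be realized: a small token-level program VAE is trained on randomly generated programs to obtain a continuous program embedding space, and a cross-entropy-method (CEM) search over that space then looks for a latent vector whose decoded program maximizes task return. The abstract does not commit to these specifics, so treat the DSL size, architecture, losses, and the `sample_random_programs` / `execute_program` helpers as illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch (not the authors' code) of the two-stage scheme:
# Stage 1 learns a continuous program embedding space without supervision by
# training a small token-level program VAE on randomly generated programs;
# Stage 2 searches that latent space, here with the cross-entropy method (CEM),
# for a latent vector whose decoded program maximizes task return.
# `sample_random_programs` and `execute_program` are hypothetical helpers.
import torch
import torch.nn as nn

VOCAB, MAX_LEN, LATENT = 32, 40, 64  # toy DSL vocabulary, program length, latent dim


class ProgramVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 64)
        self.enc = nn.GRU(64, 128, batch_first=True)
        self.to_mu = nn.Linear(128, LATENT)
        self.to_logvar = nn.Linear(128, LATENT)
        self.dec = nn.GRU(LATENT, 128, batch_first=True)
        self.out = nn.Linear(128, VOCAB)

    def encode(self, tokens):                    # tokens: (B, MAX_LEN) int64
        _, h = self.enc(self.embed(tokens))
        return self.to_mu(h[-1]), self.to_logvar(h[-1])

    def decode_logits(self, z):                  # z: (B, LATENT)
        h, _ = self.dec(z.unsqueeze(1).repeat(1, MAX_LEN, 1))
        return self.out(h)                       # (B, MAX_LEN, VOCAB)


def train_program_embedding(vae, sample_random_programs, steps=10_000):
    """Stage 1: unsupervised learning of the program embedding space."""
    opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
    for _ in range(steps):
        tokens = sample_random_programs(batch=64)              # (64, MAX_LEN) token ids
        mu, logvar = vae.encode(tokens)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        logits = vae.decode_logits(z)
        recon = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tokens.reshape(-1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        loss = recon + 0.1 * kl
        opt.zero_grad(); loss.backward(); opt.step()


def search_latent_space(vae, execute_program, iters=100, pop=64, elite=8, sigma=0.5):
    """Stage 2: CEM search over the latent space for a return-maximizing program."""
    mean = torch.zeros(LATENT)
    for _ in range(iters):
        z = mean + sigma * torch.randn(pop, LATENT)            # sample candidate latents
        with torch.no_grad():
            programs = vae.decode_logits(z).argmax(-1)         # greedy token decoding
        returns = torch.tensor([float(execute_program(p)) for p in programs])
        elites = z[returns.topk(elite).indices]                # keep highest-return latents
        mean, sigma = elites.mean(0), elites.std(0).mean().item() + 1e-3
    with torch.no_grad():
        return vae.decode_logits(mean.unsqueeze(0)).argmax(-1)[0]  # best program tokens
```

One appeal of decoupling the two stages, as the abstract argues, is that the reward-driven search happens in a smooth, low-dimensional embedding space rather than directly over discrete program tokens composed from scratch.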
Related papers
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
- Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search [7.769411917500852]
We introduce a novel LLM-guided search framework (LLM-GS).
Our key insight is to leverage the programming expertise and common sense reasoning of LLMs to enhance the efficiency of assumption-free, random-guessing search methods.
We develop a search algorithm named Scheduled Hill Climbing, designed to efficiently explore the programmatic search space and consistently improve the programs.
arXiv Detail & Related papers (2024-05-26T06:33:48Z)
- Program Machine Policy: Addressing Long-Horizon Tasks by Integrating Program Synthesis and State Machines [7.159109885159399]
Program Machine Policy (POMP) bridges the advantages of programmatic RL and state machine policies.
We introduce a method that can retrieve a set of effective, diverse, and compatible programs.
Our proposed framework outperforms programmatic RL and deep RL baselines on various tasks.
arXiv Detail & Related papers (2023-11-27T16:06:39Z)
- $\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis [39.742755916373284]
Program synthesis aims to create accurate, executable programs from problem specifications.
Recent studies have leveraged the power of reinforcement learning (RL) in conjunction with large language models (LLMs).
Our work explores the feasibility of value-based approaches, leading to the development of our $\mathcal{B}$-Coder.
arXiv Detail & Related papers (2023-10-04T21:40:36Z)
- GPT is becoming a Turing machine: Here are some ways to program it [16.169056235216576]
We show that GPT-3 models can be triggered to execute programs that involve loops.
We show that prompts that may not even cover one full task example can trigger algorithmic behaviour.
arXiv Detail & Related papers (2023-03-25T00:43:41Z)
- Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z)
- Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language [81.73820295186727]
We propose a formalism of procedures as programs, a powerful yet intuitive method of representing hierarchical procedural knowledge for agent command and control.
We instantiate this framework on the IQA and ALFRED datasets for NL instruction following.
arXiv Detail & Related papers (2021-09-16T20:36:21Z)
- How could Neural Networks understand Programs? [67.4217527949013]
It is difficult to build a model that better understands programs by either directly applying off-the-shelf NLP pre-training techniques to the source code or heuristically adding features to the model.
We propose a novel program semantics learning paradigm in which the model learns from information composed of (1) representations that align well with the fundamental operations in operational semantics, and (2) information about environment transitions.
arXiv Detail & Related papers (2021-05-10T12:21:42Z)
- Can We Learn Heuristics For Graphical Model Inference Using Reinforcement Learning? [114.24881214319048]
We show that we can learn programs, i.e., policies, for solving inference in higher order Conditional Random Fields (CRFs) using reinforcement learning.
Our method solves inference tasks efficiently without imposing any constraints on the form of the potentials.
arXiv Detail & Related papers (2020-04-27T19:24:04Z)
- Hierarchical Variational Imitation Learning of Control Programs [131.7671843857375]
We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP).
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
arXiv Detail & Related papers (2019-12-29T08:57:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.