Learning to Synthesize Programs as Interpretable and Generalizable Policies
- URL: http://arxiv.org/abs/2108.13643v1
- Date: Tue, 31 Aug 2021 07:03:06 GMT
- Title: Learning to Synthesize Programs as Interpretable and Generalizable Policies
- Authors: Dweep Trivedi, Jesse Zhang, Shao-Hua Sun, Joseph J. Lim
- Abstract summary: We present a framework that learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner.
Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines.
- Score: 25.258598215642067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, deep reinforcement learning (DRL) methods have achieved impressive
performance on tasks in a variety of domains. However, neural network policies
produced with DRL methods are not human-interpretable and often have difficulty
generalizing to novel scenarios. To address these issues, prior works explore
learning programmatic policies that are more interpretable and structured for
generalization. Yet, these works either employ limited policy representations
(e.g. decision trees, state machines, or predefined program templates) or
require stronger supervision (e.g. input/output state pairs or expert
demonstrations). We present a framework that instead learns to synthesize a
program, which details the procedure to solve a task in a flexible and
expressive manner, solely from reward signals. To alleviate the difficulty of
learning to compose programs to induce the desired agent behavior from scratch,
we propose to first learn a program embedding space that continuously
parameterizes diverse behaviors in an unsupervised manner and then search over
the learned program embedding space to yield a program that maximizes the
return for a given task. Experimental results demonstrate that the proposed
framework not only learns to reliably synthesize task-solving programs but also
outperforms DRL and program synthesis baselines while producing interpretable
and more generalizable policies. We also justify the necessity of the proposed
two-stage learning scheme as well as analyze various methods for learning the
program embedding.
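To make the two-stage scheme above concrete, here is a minimal sketch of one way it could be realized: a small token-level program VAE is trained on randomly generated programs to obtain a continuous program embedding space, and a cross-entropy-method (CEM) search over that space then looks for a latent vector whose decoded program maximizes task return. The abstract does not commit to these specifics, so treat the DSL size, architecture, losses, and the `sample_random_programs` / `execute_program` helpers as illustrative assumptions rather than the paper's implementation.

```python
# Minimal sketch (not the authors' code) of the two-stage scheme:
# Stage 1 learns a continuous program embedding space without supervision by
# training a small token-level program VAE on randomly generated programs;
# Stage 2 searches that latent space, here with the cross-entropy method (CEM),
# for a latent vector whose decoded program maximizes task return.
# `sample_random_programs` and `execute_program` are hypothetical helpers.
import torch
import torch.nn as nn

VOCAB, MAX_LEN, LATENT = 32, 40, 64  # toy DSL vocabulary, program length, latent dim


class ProgramVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 64)
        self.enc = nn.GRU(64, 128, batch_first=True)
        self.to_mu = nn.Linear(128, LATENT)
        self.to_logvar = nn.Linear(128, LATENT)
        self.dec = nn.GRU(LATENT, 128, batch_first=True)
        self.out = nn.Linear(128, VOCAB)

    def encode(self, tokens):                    # tokens: (B, MAX_LEN) int64
        _, h = self.enc(self.embed(tokens))
        return self.to_mu(h[-1]), self.to_logvar(h[-1])

    def decode_logits(self, z):                  # z: (B, LATENT)
        h, _ = self.dec(z.unsqueeze(1).repeat(1, MAX_LEN, 1))
        return self.out(h)                       # (B, MAX_LEN, VOCAB)


def train_program_embedding(vae, sample_random_programs, steps=10_000):
    """Stage 1: unsupervised learning of the program embedding space."""
    opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
    for _ in range(steps):
        tokens = sample_random_programs(batch=64)              # (64, MAX_LEN) token ids
        mu, logvar = vae.encode(tokens)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        logits = vae.decode_logits(z)
        recon = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), tokens.reshape(-1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        loss = recon + 0.1 * kl
        opt.zero_grad(); loss.backward(); opt.step()


def search_latent_space(vae, execute_program, iters=100, pop=64, elite=8, sigma=0.5):
    """Stage 2: CEM search over the latent space for a return-maximizing program."""
    mean = torch.zeros(LATENT)
    for _ in range(iters):
        z = mean + sigma * torch.randn(pop, LATENT)            # sample candidate latents
        with torch.no_grad():
            programs = vae.decode_logits(z).argmax(-1)         # greedy token decoding
        returns = torch.tensor([float(execute_program(p)) for p in programs])
        elites = z[returns.topk(elite).indices]                # keep highest-return latents
        mean, sigma = elites.mean(0), elites.std(0).mean().item() + 1e-3
    with torch.no_grad():
        return vae.decode_logits(mean.unsqueeze(0)).argmax(-1)[0]  # best program tokens
```

One appeal of decoupling the two stages, as the abstract argues, is that the reward-driven search happens in a smooth, low-dimensional embedding space rather than directly over discrete program tokens composed from scratch.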
Related papers
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
- Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search [7.769411917500852]
We introduce a novel LLM-guided search framework (LLM-GS).
Our key insight is to leverage the programming expertise and common sense reasoning of LLMs to enhance the efficiency of assumption-free, random-guessing search methods.
We develop a search algorithm named Scheduled Hill Climbing, designed to efficiently explore the programmatic search space and consistently improve the programs.
arXiv Detail & Related papers (2024-05-26T06:33:48Z)
- Program Machine Policy: Addressing Long-Horizon Tasks by Integrating Program Synthesis and State Machines [7.159109885159399]
Program Machine Policy (POMP) bridges the advantages of programmatic RL and state machine policies.
We introduce a method that can retrieve a set of effective, diverse, and compatible programs.
Our proposed framework outperforms programmatic RL and deep RL baselines on various tasks.
arXiv Detail & Related papers (2023-11-27T16:06:39Z)
- $\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis [39.742755916373284]
Program synthesis aims to create accurate, executable programs from problem specifications.
Recent studies have leveraged the power of reinforcement learning (RL) in conjunction with large language models (LLMs).
Our work explores the feasibility of value-based approaches, leading to the development of our $\mathcal{B}$-Coder.
arXiv Detail & Related papers (2023-10-04T21:40:36Z)
- GPT is becoming a Turing machine: Here are some ways to program it [16.169056235216576]
We show that GPT-3 models can be triggered to execute programs that involve loops.
We show that prompts that may not even cover one full task example can trigger algorithmic behaviour.
arXiv Detail & Related papers (2023-03-25T00:43:41Z)
- Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z)
- Procedures as Programs: Hierarchical Control of Situated Agents through Natural Language [81.73820295186727]
We propose a formalism of procedures as programs, a powerful yet intuitive method of representing hierarchical procedural knowledge for agent command and control.
We instantiate this framework on the IQA and ALFRED datasets for NL instruction following.
arXiv Detail & Related papers (2021-09-16T20:36:21Z)
- How could Neural Networks understand Programs? [67.4217527949013]
It is difficult to build a model that better understands programs by either directly applying off-the-shelf NLP pre-training techniques to the source code or heuristically adding features to the model.
We propose a novel program semantics learning paradigm in which the model learns from information composed of (1) representations that align well with the fundamental operations in operational semantics, and (2) information about environment transitions.
arXiv Detail & Related papers (2021-05-10T12:21:42Z)
- Can We Learn Heuristics For Graphical Model Inference Using Reinforcement Learning? [114.24881214319048]
We show that we can learn programs, i.e., policies, for solving inference in higher order Conditional Random Fields (CRFs) using reinforcement learning.
Our method solves inference tasks efficiently without imposing any constraints on the form of the potentials.
arXiv Detail & Related papers (2020-04-27T19:24:04Z)
- Hierarchical Variational Imitation Learning of Control Programs [131.7671843857375]
We propose a variational inference method for imitation learning of a control policy represented by parametrized hierarchical procedures (PHP).
Our method discovers the hierarchical structure in a dataset of observation-action traces of teacher demonstrations, by learning an approximate posterior distribution over the latent sequence of procedure calls and terminations.
We demonstrate a novel benefit of variational inference in the context of hierarchical imitation learning: in decomposing the policy into simpler procedures, inference can leverage acausal information that is unused by other methods.
arXiv Detail & Related papers (2019-12-29T08:57:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.