Program Machine Policy: Addressing Long-Horizon Tasks by Integrating
Program Synthesis and State Machines
- URL: http://arxiv.org/abs/2311.15960v2
- Date: Fri, 9 Feb 2024 02:58:37 GMT
- Title: Program Machine Policy: Addressing Long-Horizon Tasks by Integrating
Program Synthesis and State Machines
- Authors: Yu-An Lin, Chen-Tao Lee, Guan-Ting Liu, Pu-Jen Cheng, Shao-Hua Sun
- Abstract summary: Program Machine Policy (POMP) bridges the advantages of programmatic RL and state machine policies.
We introduce a method that can retrieve a set of effective, diverse, and compatible programs.
Our proposed framework outperforms programmatic RL and deep RL baselines on various tasks.
- Score: 7.159109885159399
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning (deep RL) excels in various domains but lacks
generalizability and interpretability. On the other hand, programmatic RL
methods (Trivedi et al., 2021; Liu et al., 2023) reformulate RL tasks as
synthesizing interpretable programs that can be executed in the environments.
Despite encouraging results, these methods are limited to short-horizon tasks.
On the other hand, representing RL policies using state machines (Inala et al.,
2020) can inductively generalize to long-horizon tasks; however, it struggles
to scale up to acquire diverse and complex behaviors. This work proposes the
Program Machine Policy (POMP), which bridges the advantages of programmatic RL
and state machine policies, allowing for the representation of complex
behaviors and the address of long-term tasks. Specifically, we introduce a
method that can retrieve a set of effective, diverse, and compatible programs.
Then, we use these programs as modes of a state machine and learn a transition function that switches among mode programs, capturing repetitive behaviors. Our proposed framework outperforms programmatic RL and deep RL
baselines on various tasks and demonstrates the ability to inductively
generalize to even longer horizons without any fine-tuning. Ablation studies
justify the effectiveness of our proposed search algorithm for retrieving a set
of programs as modes.
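To make the described structure concrete, the minimal Python sketch below renders a program machine policy as a state machine whose modes are small executable programs and whose transition function decides which mode runs next. Everything in it is an illustrative assumption, not the paper's implementation: the toy dictionary state, the mode programs (pick_up, move_right, put_down), and the hand-written demo_transition standing in for the learned transition function.

```python
# Minimal sketch of a program machine policy: a state machine whose
# modes are executable programs and whose transitions are learned.
# All mode programs and the transition function here are illustrative
# stand-ins, not the authors' retrieved programs or trained network.

from typing import Callable, List

Program = Callable[[dict], dict]  # a mode: maps env state -> env state

def pick_up(state: dict) -> dict:      # hypothetical mode program
    state["holding"] = True
    return state

def move_right(state: dict) -> dict:   # hypothetical mode program
    state["x"] = state.get("x", 0) + 1
    return state

def put_down(state: dict) -> dict:     # hypothetical mode program
    state["holding"] = False
    return state

class ProgramMachinePolicy:
    def __init__(self, modes: List[Program],
                 transition: Callable[[int, dict], int]):
        self.modes = modes            # retrieved programs, one per mode
        self.transition = transition  # learned: (mode, state) -> next mode

    def run(self, state: dict, max_steps: int = 10) -> dict:
        mode = 0
        for _ in range(max_steps):
            state = self.modes[mode](state)      # execute current mode
            mode = self.transition(mode, state)  # pick the next mode
            if mode < 0:                         # negative index = halt
                break
        return state

# A hand-written transition standing in for the learned function:
def demo_transition(mode: int, state: dict) -> int:
    if mode == 0:
        return 1
    if mode == 1 and state.get("x", 0) < 3:
        return 1   # loop on move_right until the goal column is reached
    if mode == 1:
        return 2
    return -1      # halt after put_down

policy = ProgramMachinePolicy([pick_up, move_right, put_down],
                              demo_transition)
print(policy.run({"x": 0}))
```

The self-loop on move_right in demo_transition is the kind of repetitive structure the state-machine representation captures, which is what allows inductive generalization to longer horizons without retraining.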
Related papers
- Exploring RL-based LLM Training for Formal Language Tasks with Programmed Rewards [49.7719149179179]
This paper investigates the feasibility of using PPO for reinforcement learning (RL) from explicitly programmed reward signals.
We focus on tasks expressed through formal languages, such as programming, where explicit reward functions can be programmed to automatically assess the quality of generated outputs; one such reward is sketched after this entry.
Our results show that pure RL-based training for the two formal language tasks is challenging, with success being limited even for the simple arithmetic task.
arXiv Detail & Related papers (2024-10-22T15:59:58Z)
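As a concrete example of the kind of explicitly programmed reward this entry describes, the sketch below scores answers to a toy arithmetic prompt. The prompt format and the partial-credit values are assumptions made for illustration, not the paper's exact reward function.

```python
# Sketch of an explicitly programmed reward for a formal-language task:
# the reward is computed by checking the model's output, not learned.
# The task format and partial-credit scheme are illustrative assumptions.

import re

def arithmetic_reward(prompt: str, output: str) -> float:
    """Reward for prompts like '12 + 7 =' answered with a number."""
    lhs = prompt.rstrip("= ").strip()
    try:
        target = eval(lhs, {"__builtins__": {}})  # trusted, fixed prompts only
    except Exception:
        return 0.0
    match = re.search(r"-?\d+", output)
    if match is None:
        return 0.0               # no parseable answer at all
    if int(match.group()) == target:
        return 1.0               # exact correctness
    return 0.1                   # well-formed but wrong: partial credit

print(arithmetic_reward("12 + 7 =", "19"))       # 1.0
print(arithmetic_reward("12 + 7 =", "twenty"))   # 0.0
```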
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, at lower computational cost; a rule-style sketch follows this entry.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
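The flavor of such belief-based specifications can be shown with a few hand-written rules over a belief state, as below. The predicates, thresholds, and actions are invented for illustration; the paper itself induces comparable rules with ILP and expresses them in ASP rather than Python.

```python
# Sketch of an interpretable belief-based policy specification: a few
# human-readable rules over the belief state decide the next action.
# Predicates, thresholds, and actions are illustrative assumptions.

def rule_policy(belief: dict) -> str:
    # belief maps propositions to probabilities, e.g. from a POMDP filter
    if belief.get("target_localized", 0.0) > 0.8:
        return "grasp"          # confident enough to act directly
    if belief.get("obstacle_ahead", 0.0) > 0.5:
        return "turn"           # avoid the likely obstacle
    return "explore"            # default: gather more information

print(rule_policy({"target_localized": 0.9}))   # grasp
print(rule_policy({"obstacle_ahead": 0.7}))     # turn
print(rule_policy({}))                          # explore
```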
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization; a simple stand-in for such a scheme is sketched after this entry.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
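One simple way to realize data-adaptive action quantization is to cluster the continuous actions found in the offline dataset and use the cluster centers as the discrete action set, so bins concentrate where the behavior data lies. The k-means sketch below is a stand-in under that assumption; it is not the paper's scheme, and the downstream IQL/CQL/BRAC training is omitted.

```python
# Sketch of data-adaptive action quantization for offline RL: discretize
# continuous actions into the k centers that best cover the dataset's
# action distribution, then train any discrete-action method on top.
# k-means here is a simple stand-in for the paper's adaptive scheme.

import numpy as np

def fit_action_bins(actions: np.ndarray, k: int = 8,
                    iters: int = 50, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    centers = actions[rng.choice(len(actions), size=k, replace=False)]
    for _ in range(iters):
        # assign each dataset action to its nearest center
        dists = np.linalg.norm(actions[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned actions
        for j in range(k):
            if (labels == j).any():
                centers[j] = actions[labels == j].mean(axis=0)
    return centers

def quantize(action: np.ndarray, centers: np.ndarray) -> int:
    return int(np.linalg.norm(centers - action, axis=-1).argmin())

# Toy 2-D action dataset concentrated in two regions:
data = np.concatenate([np.random.randn(200, 2) * 0.1 + [1, 1],
                       np.random.randn(200, 2) * 0.1 - [1, 1]])
centers = fit_action_bins(data, k=4)
print(quantize(np.array([0.9, 1.1]), centers))  # index of a nearby bin
```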
- Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions [2.4541568670428915]
Machine scheduling aims to optimize job assignments to machines while adhering to manufacturing rules and job specifications.
Deep Reinforcement Learning (DRL), a key component of artificial general intelligence, has shown promise in various domains like gaming and robotics.
This paper offers a comprehensive review and comparison of DRL-based approaches, highlighting their methodology, applications, advantages, and limitations.
arXiv Detail & Related papers (2023-10-04T22:45:09Z)
- $\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis [39.742755916373284]
Program synthesis aims to create accurate, executable programs from problem specifications.
Recent studies have leveraged the power of reinforcement learning (RL) in conjunction with large language models (LLMs).
Our work explores the feasibility of value-based approaches, leading to the development of our $\mathcal{B}$-Coder; a toy value-based sketch follows this entry.
arXiv Detail & Related papers (2023-10-04T21:40:36Z)
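Value-based program synthesis frames generation as choosing tokens by their estimated Q-values rather than by a policy gradient. The toy sketch below does tabular Q-learning over a tiny token vocabulary to emit a three-token arithmetic expression; the vocabulary, task, and tabular scale are drastic simplifications, since the paper instead learns Q-values on top of an LLM backbone.

```python
# Toy sketch of value-based RL for program synthesis: tabular Q-learning
# where a state is a partial token sequence, an action is the next token,
# and the reward checks whether the finished program evaluates to a target.

import random
from collections import defaultdict

VOCAB = ["1", "2", "3", "+", "*"]
TARGET = 5                      # goal: a 3-token program evaluating to 5

def reward(tokens):
    try:
        return 1.0 if eval("".join(tokens)) == TARGET else 0.0
    except Exception:           # ill-formed programs earn nothing
        return 0.0

Q = defaultdict(float)          # Q[(partial_program, next_token)]
alpha = 0.5

for _ in range(5000):           # off-policy: explore uniformly at random
    state = []
    for step in range(3):
        a = random.choice(VOCAB)
        nxt = state + [a]
        if step == 2:           # terminal step: reward is the only signal
            target_q = reward(nxt)
        else:                   # bootstrap from the best next token
            target_q = max(Q[(tuple(nxt), t)] for t in VOCAB)
        key = (tuple(state), a)
        Q[key] += alpha * (target_q - Q[key])
        state = nxt

state = []                      # greedy decoding with the learned values
for _ in range(3):
    state.append(max(VOCAB, key=lambda t: Q[(tuple(state), t)]))
print("".join(state), "->", eval("".join(state)))   # e.g. 2+3 -> 5
```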
- Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs [58.94569213396991]
We propose a hierarchical programmatic reinforcement learning framework to produce program policies.
By learning to compose programs, our proposed framework can produce program policies that describe out-of-distributionally complex behaviors.
The experimental results in the Karel domain show that our proposed framework outperforms baselines.
arXiv Detail & Related papers (2023-01-30T14:50:46Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for topological MDPs (TMDPs), obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks: a guide policy and an exploration policy; a rollout sketch follows this entry.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
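The two-policy scheme can be summarized in a few lines: the guide policy drives the first part of each episode, the exploration policy finishes it, and the guide's share shrinks as the learner improves. The rollout sketch below is an assumed minimal rendering of that curriculum; the policy callables and the Gym-style environment are placeholders, not JSRL's actual training loop.

```python
# Sketch of a Jump-Start rollout: a pre-existing guide policy controls
# the first h steps of each episode, then the learning exploration
# policy takes over; h is annealed toward 0 as the learner improves.
# The policy callables and Gym-style env are illustrative placeholders.

def jump_start_rollout(env, guide_policy, explore_policy, h, max_steps=200):
    """Collect one episode in which the guide acts for the first h steps."""
    obs = env.reset()
    trajectory = []
    for t in range(max_steps):
        # jump-start: hand control to the learner after h guide steps
        acting = guide_policy if t < h else explore_policy
        action = acting(obs)
        obs_next, reward, done, _info = env.step(action)
        trajectory.append((obs, action, reward, obs_next, done))
        obs = obs_next
        if done:
            break
    return trajectory

def anneal_guide_steps(h, eval_return, threshold):
    """Shrink the guide's share once the combined policy clears a bar."""
    return max(0, h - 1) if eval_return >= threshold else h
```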
- Deep Reinforcement Learning with Adjustments [10.244120641608447]
We propose a new Q-learning algorithm for continuous action spaces that bridges control and RL algorithms.
Our method can learn complex policies to achieve long-term goals and at the same time it can be easily adjusted to address short-term requirements.
arXiv Detail & Related papers (2021-09-28T03:35:09Z)
- Learning to Synthesize Programs as Interpretable and Generalizable Policies [25.258598215642067]
We present a framework that learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner.
Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines.
arXiv Detail & Related papers (2021-08-31T07:03:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.