ProSpec RL: Plan Ahead, then Execute
- URL: http://arxiv.org/abs/2407.21359v1
- Date: Wed, 31 Jul 2024 06:04:55 GMT
- Title: ProSpec RL: Plan Ahead, then Execute
- Authors: Liangliang Liu, Yi Guan, BoRan Wang, Rujia Shen, Yi Lin, Chaoran Kong, Lian Yan, Jingchi Jiang
- Abstract summary: We propose the Prospective (ProSpec) RL method, which makes higher-value, lower-risk optimal decisions by imagining future n-stream trajectories.
ProSpec employs a dynamic model to predict future states based on the current state and a series of sampled actions.
We validate the effectiveness of our method on the DMControl benchmarks, where our approach achieved significant performance improvements.
- Score: 7.028937493640123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imagining potential outcomes of actions before execution helps agents make more informed decisions, a prospective thinking ability fundamental to human cognition. However, mainstream model-free Reinforcement Learning (RL) methods lack the ability to proactively envision future scenarios, plan, and guide strategies. These methods typically rely on trial and error to adjust policy functions, aiming to maximize cumulative rewards or long-term value, even if such high-reward decisions place the environment in extremely dangerous states. To address this, we propose the Prospective (ProSpec) RL method, which makes higher-value, lower-risk optimal decisions by imagining future n-stream trajectories. Specifically, ProSpec employs a dynamic model to predict future states (termed "imagined states") based on the current state and a series of sampled actions. Furthermore, we integrate the concept of Model Predictive Control and introduce a cycle consistency constraint that allows the agent to evaluate and select the optimal actions from these trajectories. Moreover, ProSpec employs cycle consistency to mitigate two fundamental issues in RL: augmenting state reversibility to avoid irreversible events (low risk) and augmenting actions to generate numerous virtual trajectories, thereby improving data efficiency. We validated the effectiveness of our method on the DMControl benchmarks, where our approach achieved significant performance improvements. Code will be open-sourced upon acceptance.
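The abstract sketches the core loop at a high level: roll several candidate action sequences forward through a learned dynamics model, score the imagined trajectories, and execute only the first action of the best one, MPC-style. Since the code is not yet released, the following is only a minimal sketch of that loop under assumed interfaces; `dynamics_model`, `reward_model`, `n_streams`, and the random action sampler are placeholders rather than the authors' API, and the cycle-consistency and risk components described in the abstract are omitted.

```python
import numpy as np

def prospective_action(state, dynamics_model, reward_model, action_dim,
                       n_streams=8, horizon=5, action_low=-1.0, action_high=1.0,
                       rng=None):
    """Sample n candidate action sequences, imagine their outcomes with a
    learned dynamics model, and return the first action of the best sequence
    (a plain MPC-style selection; ProSpec additionally uses cycle consistency)."""
    rng = np.random.default_rng() if rng is None else rng
    # Candidate action sequences: (n_streams, horizon, action_dim)
    candidates = rng.uniform(action_low, action_high,
                             size=(n_streams, horizon, action_dim))
    returns = np.zeros(n_streams)
    for i in range(n_streams):
        s = state
        for t in range(horizon):
            a = candidates[i, t]
            returns[i] += reward_model(s, a)   # imagined reward
            s = dynamics_model(s, a)           # imagined next state
    best = int(np.argmax(returns))
    return candidates[best, 0]                 # execute only the first action

# Toy usage with stand-in models (a linear "dynamics" and a quadratic cost):
if __name__ == "__main__":
    dyn = lambda s, a: 0.9 * s + 0.1 * a
    rew = lambda s, a: -float(np.sum(s ** 2) + 0.01 * np.sum(a ** 2))
    print(prospective_action(np.ones(3), dyn, rew, action_dim=3))
```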
Related papers
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional DP and CSM methods.
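The one-line summary is enough to sketch the shape of such an objective: a conditional-sequence-modeling (behavior-cloning) term plus a term that rewards actions scoring highly under a learned Q-function. The sketch below is an assumption-laden illustration of that combination, not QT's actual loss or code; `q_net`, `policy_out`, and the weight `alpha` are hypothetical.

```python
import torch
import torch.nn.functional as F

def qt_style_loss(policy_out, expert_actions, q_net, states, alpha=0.5):
    """Conditional sequence modeling (action reconstruction) loss combined with
    a term that maximizes the learned action-value of the predicted actions."""
    # CSM term: imitate the dataset actions (mean-squared error, for simplicity).
    csm_loss = F.mse_loss(policy_out, expert_actions)
    # Q-value term: higher Q for the predicted action should lower the loss.
    q_values = q_net(torch.cat([states, policy_out], dim=-1))
    return csm_loss - alpha * q_values.mean()
```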
arXiv Detail & Related papers (2024-05-27T12:12:39Z)
- When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive settings such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z)
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
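As described, the action space is made tractable by mapping whole slates into a continuous latent space with a variational auto-encoder, so the agent can act in that latent space and decode its action back into a slate. A minimal sketch of that encode/decode step follows, with assumed dimensions and module names (`SlateVAE`, `latent_dim`) that are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class SlateVAE(nn.Module):
    """Toy VAE that maps a slate of item embeddings to a low-dimensional
    latent action and back, so an RL policy can act in the latent space."""
    def __init__(self, slate_size=5, item_dim=16, latent_dim=8):
        super().__init__()
        flat = slate_size * item_dim
        self.encoder = nn.Linear(flat, 2 * latent_dim)  # outputs mean and log-variance
        self.decoder = nn.Linear(latent_dim, flat)
        self.slate_size, self.item_dim = slate_size, item_dim

    def forward(self, slate):
        x = slate.flatten(start_dim=1)
        mu, log_var = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization trick
        recon = self.decoder(z).view(-1, self.slate_size, self.item_dim)
        return recon, mu, log_var
```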
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
- Knowing the Past to Predict the Future: Reinforcement Virtual Learning [29.47688292868217]
Reinforcement Learning (RL)-based control system has received considerable attention in recent decades.
In this paper, we present a cost-efficient framework in which the RL model can evolve by itself in a virtual space.
The proposed framework enables a step-by-step RL model to predict future states and select optimal actions for far-sighted decisions.
arXiv Detail & Related papers (2022-11-02T16:48:14Z)
- Variational Inference for Model-Free and Model-Based Reinforcement Learning [4.416484585765028]
Variational inference (VI) is a type of approximate Bayesian inference that approximates an intractable posterior distribution with a tractable one.
Reinforcement learning (RL), on the other hand, deals with autonomous agents and how to make them act optimally.
This manuscript shows how the apparently different subjects of VI and RL are linked in two fundamental ways.
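The VI half of that link rests on a standard identity: approximating the intractable posterior p(z | x) with a tractable q(z) amounts to maximizing the evidence lower bound. The bound below is the textbook form, added here only for context rather than quoted from this manuscript:

$$ \log p(x) \;\geq\; \mathbb{E}_{q(z)}\big[\log p(x, z) - \log q(z)\big] \;=\; \log p(x) - \mathrm{KL}\big(q(z) \,\|\, p(z \mid x)\big) $$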
arXiv Detail & Related papers (2022-09-04T21:03:14Z)
- TAE: A Semi-supervised Controllable Behavior-aware Trajectory Generator and Predictor [3.6955256596550137]
Trajectory generation and prediction play important roles in planner evaluation and decision making for intelligent vehicles.
We propose a behavior-aware Trajectory Autoencoder (TAE) that explicitly models drivers' behavior.
Our model addresses trajectory generation and prediction in a unified architecture and benefits both tasks.
arXiv Detail & Related papers (2022-03-02T17:37:44Z)
- Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
- Acting upon Imagination: when to trust imagined trajectories in model based reinforcement learning [1.26990070983988]
Model-based reinforcement learning (MBRL) aims to learn model(s) of the environment dynamics that can predict the outcomes of the agent's actions.
We propose uncertainty estimation methods for online evaluation of imagined trajectories.
Results highlight significant reduction on computational costs without sacrificing performance.
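Deciding online whether to keep acting on an imagined rollout pairs naturally with ensemble disagreement as an uncertainty signal. The sketch below is a generic illustration of that idea, not this paper's specific estimator; the ensemble interface and the disagreement threshold are assumptions.

```python
import numpy as np

def imagine_until_uncertain(state, actions, ensemble, disagreement_threshold=0.1):
    """Roll an imagined trajectory forward with an ensemble of dynamics models,
    stopping at the first step where the models disagree too strongly.
    Returns the imagined states and the number of steps that were trusted."""
    trusted_states = []
    s = state
    for a in actions:
        predictions = np.stack([model(s, a) for model in ensemble])  # (n_models, state_dim)
        disagreement = predictions.std(axis=0).mean()                # simple uncertainty proxy
        if disagreement > disagreement_threshold:
            break                                                    # stop trusting imagination
        s = predictions.mean(axis=0)
        trusted_states.append(s)
    return trusted_states, len(trusted_states)
```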
arXiv Detail & Related papers (2021-05-12T15:04:07Z)
- Provably Good Batch Reinforcement Learning Without Great Exploration [51.51462608429621]
Batch reinforcement learning (RL) is important for applying RL algorithms to many high-stakes tasks.
Recent algorithms have shown promise but can still be overly optimistic in their expected outcomes.
We show that a small modification to the Bellman optimality and evaluation back-ups, making the update more conservative, can provide much stronger guarantees.
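In batch RL, a "more conservative update" typically means only backing up through state-action pairs the dataset supports well, and assigning a pessimistic value otherwise. The sketch below illustrates that flavor of modification on a tabular Bellman optimality backup; the count threshold and pessimistic constant are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def conservative_backup(Q, counts, rewards, transitions, gamma=0.99,
                        min_count=5, pessimistic_value=0.0):
    """One tabular Bellman optimality backup that ignores poorly supported
    (state, action) pairs: if the batch contains fewer than `min_count`
    samples of (s, a), its backed-up value is clamped to a pessimistic constant.

    Q, counts, rewards: arrays of shape (n_states, n_actions)
    transitions: array of shape (n_states, n_actions, n_states) with P(s' | s, a)
    """
    n_states, n_actions = Q.shape
    new_Q = np.full_like(Q, pessimistic_value)
    for s in range(n_states):
        for a in range(n_actions):
            if counts[s, a] >= min_count:                        # enough data to trust the backup
                next_value = transitions[s, a] @ Q.max(axis=1)   # E[max_a' Q(s', a')]
                new_Q[s, a] = rewards[s, a] + gamma * next_value
    return new_Q
```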
arXiv Detail & Related papers (2020-07-16T09:25:54Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum, resulting in optimal yet physically feasible robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)