Towards Deployment-Efficient Reinforcement Learning: Lower Bound and
Optimality
- URL: http://arxiv.org/abs/2202.06450v1
- Date: Mon, 14 Feb 2022 01:31:46 GMT
- Title: Towards Deployment-Efficient Reinforcement Learning: Lower Bound and
Optimality
- Authors: Jiawei Huang, Jinglin Chen, Li Zhao, Tao Qin, Nan Jiang, Tie-Yan Liu
- Abstract summary: Deployment efficiency is an important criterion for many real-world applications of reinforcement learning (RL)
We propose a theoretical formulation for deployment-efficient RL (DE-RL) from an "optimization with constraints" perspective.
- Score: 141.89413461337324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deployment efficiency is an important criterion for many real-world
applications of reinforcement learning (RL). Despite the community's increasing
interest, there lacks a formal theoretical formulation for the problem. In this
paper, we propose such a formulation for deployment-efficient RL (DE-RL) from
an "optimization with constraints" perspective: we are interested in exploring
an MDP and obtaining a near-optimal policy within minimal \emph{deployment
complexity}, whereas in each deployment the policy can sample a large batch of
data. Using finite-horizon linear MDPs as a concrete structural model, we
reveal the fundamental limit in achieving deployment efficiency by establishing
information-theoretic lower bounds, and provide algorithms that achieve the
optimal deployment efficiency. Moreover, our formulation for DE-RL is flexible
and can serve as a building block for other practically relevant settings; we
give "Safe DE-RL" and "Sample-Efficient DE-RL" as two examples, which may be
worth future investigation.
Related papers
- OffRIPP: Offline RL-based Informative Path Planning [12.705099730591671]
IPP is a crucial task in robotics, where agents must design paths to gather valuable information about a target environment.
We propose an offline RL-based IPP framework that optimized information gain without requiring real-time interaction during training.
We validate the framework through extensive simulations and real-world experiments.
arXiv Detail & Related papers (2024-09-25T11:30:59Z) - REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z) - PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback [106.63518036538163]
We present a novel unified bilevel optimization-based framework, textsfPARL, formulated to address the recently highlighted critical issue of policy alignment in reinforcement learning.
Our framework addressed these concerns by explicitly parameterizing the distribution of the upper alignment objective (reward design) by the lower optimal variable.
Our empirical results substantiate that the proposed textsfPARL can address the alignment concerns in RL by showing significant improvements.
arXiv Detail & Related papers (2023-08-03T18:03:44Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning,
and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called textttMEX.
textttMEX integrates estimation and planning components while balancing exploration exploitation automatically.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework where exploratory trajectories that enable accurate learning of hidden reward functions are acquired.
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Leveraging Factored Action Spaces for Efficient Offline Reinforcement
Learning in Healthcare [38.42691031505782]
We propose a form of linear Q-function decomposition induced by factored action spaces.
Our approach can help an agent make more accurate inferences within underexplored regions of the state-action space.
arXiv Detail & Related papers (2023-05-02T19:13:10Z) - Revisiting the Linear-Programming Framework for Offline RL with General
Function Approximation [24.577243536475233]
offline reinforcement learning (RL) concerns pursuing an optimal policy for sequential decision-making from a pre-collected dataset.
Recent theoretical progress has focused on developing sample-efficient offline RL algorithms with various relaxed assumptions on data coverage and function approximators.
We revisit the linear-programming framework for offline RL, and advance the existing results in several aspects.
arXiv Detail & Related papers (2022-12-28T15:28:12Z) - Multi-fidelity reinforcement learning framework for shape optimization [0.8258451067861933]
We introduce a controlled transfer learning framework that leverages a multi-fidelity simulation setting.
Our strategy is deployed for an airfoil shape optimization problem at high Reynolds numbers.
Our results demonstrate this framework's applicability to other scientific DRL scenarios.
arXiv Detail & Related papers (2022-02-22T20:44:04Z) - MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch
Optimization for Deployment Constrained Reinforcement Learning [108.79676336281211]
Continuous deployment of new policies for data collection and online learning is either cost ineffective or impractical.
We propose a new algorithmic learning framework called Model-based Uncertainty regularized and Sample Efficient Batch Optimization.
Our framework discovers novel and high quality samples for each deployment to enable efficient data collection.
arXiv Detail & Related papers (2021-02-23T01:30:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.