Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning
- URL: http://arxiv.org/abs/2505.18447v1
- Date: Sat, 24 May 2025 01:09:10 GMT
- Title: Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning
- Authors: Chi Zhang, Ziying Jia, George K. Atia, Sihong He, Yue Wang,
- Abstract summary: Transfer reinforcement learning aims to derive a near-optimal policy for a target environment with limited data.<n>It faces two key challenges: the lack of performance guarantees for the transferred policy, and the risk of negative transfer when multiple source domains are involved.<n>We propose a novel framework based on the pessimism principle, which constructs and optimize a conservative estimation of the target domain's performance.
- Score: 12.927397451723719
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer reinforcement learning aims to derive a near-optimal policy for a target environment with limited data by leveraging abundant data from related source domains. However, it faces two key challenges: the lack of performance guarantees for the transferred policy, which can lead to undesired actions, and the risk of negative transfer when multiple source domains are involved. We propose a novel framework based on the pessimism principle, which constructs and optimizes a conservative estimation of the target domain's performance. Our framework effectively addresses the two challenges by providing an optimized lower bound on target performance, ensuring safe and reliable decisions, and by exhibiting monotonic improvement with respect to the quality of the source domains, thereby avoiding negative transfer. We construct two types of conservative estimations, rigorously characterize their effectiveness, and develop efficient distributed algorithms with convergence guarantees. Our framework provides a theoretically sound and practically robust solution for transfer learning in reinforcement learning.
Related papers
- Transfer Learning Through Conditional Quantile Matching [3.86972243789112]
We introduce a transfer learning framework for regression that leverages heterogeneous source domains to improve predictive performance in a data-scarce target domain.<n>Our approach learns a conditional generative model separately for each source domain and calibrates the generated responses to the target domain via conditional quantile matching.
arXiv Detail & Related papers (2026-02-02T17:19:55Z) - Safe, Efficient, and Robust Reinforcement Learning for Ranking and Diffusion Models [2.231476498067998]
dissertation investigates how reinforcement learning methods can be designed to be safe, sample-efficient, and robust.<n> Framed through the unifying perspective of contextual-bandit RL, the work addresses two major application domains - ranking and recommendation, and text-to-image diffusion models.
arXiv Detail & Related papers (2025-10-17T08:37:38Z) - Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality [53.525547349715595]
We propose a novel primal-only algorithm called Rectified Robust Policy Optimization (RRPO)<n>RRPO operates directly on the primal problem without relying on dual formulations.<n>We show convergence to an approximately optimal feasible policy with complexity matching the best-known lower bound.
arXiv Detail & Related papers (2025-08-24T16:59:38Z) - Risk-Averse Best Arm Set Identification with Fixed Budget and Fixed Confidence [0.562479170374811]
We introduce a novel problem setting in bandit optimization that addresses maximizing expected reward and minimizing associated uncertainty.<n>We propose a unified meta-budgetalgorithmic framework capable of operating under both fixed-confidence and fixed-optimal regimes.<n>Our approach outperforms existing methods in terms of both accuracy and sample efficiency.
arXiv Detail & Related papers (2025-06-27T14:21:03Z) - Bounded Rationality for LLMs: Satisficing Alignment at Inference-Time [52.230936493691985]
We propose SITAlign, an inference framework that addresses the multifaceted nature of alignment by maximizing a primary objective while satisfying threshold-based constraints on secondary criteria.<n>We provide theoretical insights by deriving sub-optimality bounds of our satisficing based inference alignment approach.
arXiv Detail & Related papers (2025-05-29T17:56:05Z) - Residual Feature Integration is Sufficient to Prevent Negative Transfer [16.047084318753377]
We propose Residual Feature Integration (REFINE), a simple yet effective method designed to mitigate negative transfer.<n>Our approach combines a fixed source-side representation with a trainable target-side encoder and fits a shallow neural network on the resulting joint representation.<n> Empirically, we show that REFINE consistently enhances performance across diverse application and data modalities.
arXiv Detail & Related papers (2025-05-17T00:36:59Z) - Optimal Policy Adaptation under Covariate Shift [15.703626346971182]
We propose principled approaches for learning the optimal policy in the target domain by leveraging two datasets.<n>We derive the identifiability assumptions for the reward induced by a given policy.<n>We then learn the optimal policy by optimizing the estimated reward.
arXiv Detail & Related papers (2025-01-14T12:33:02Z) - Provable Offline Preference-Based Reinforcement Learning [95.00042541409901]
We investigate the problem of offline Preference-based Reinforcement Learning (PbRL) with human feedback.
We consider the general reward setting where the reward can be defined over the whole trajectory.
We introduce a new single-policy concentrability coefficient, which can be upper bounded by the per-trajectory concentrability.
arXiv Detail & Related papers (2023-05-24T07:11:26Z) - Trust-Region-Free Policy Optimization for Stochastic Policies [60.52463923712565]
We show that the trust region constraint over policies can be safely substituted by a trust-region-free constraint without compromising the underlying monotonic improvement guarantee.
We call the resulting algorithm Trust-REgion-Free Policy Optimization (TREFree) explicit as it is free of any trust region constraints.
arXiv Detail & Related papers (2023-02-15T23:10:06Z) - Constrained Variational Policy Optimization for Safe Reinforcement
Learning [40.38842532850959]
Safe reinforcement learning aims to learn policies that satisfy certain constraints before deploying to safety-critical applications.
primal-dual as a prevalent constrained optimization framework suffers from instability issues and lacks optimality guarantees.
This paper overcomes the issues from a novel probabilistic inference perspective and proposes an Expectation-Maximization style approach to learn safe policy.
arXiv Detail & Related papers (2022-01-28T04:24:09Z) - Latent-Optimized Adversarial Neural Transfer for Sarcasm Detection [50.29565896287595]
We apply transfer learning to exploit common datasets for sarcasm detection.
We propose a generalized latent optimization strategy that allows different losses to accommodate each other.
In particular, we achieve 10.02% absolute performance gain over the previous state of the art on the iSarcasm dataset.
arXiv Detail & Related papers (2021-04-19T13:07:52Z) - Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds
Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria.
We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z) - Contradictory Structure Learning for Semi-supervised Domain Adaptation [67.89665267469053]
Current adversarial adaptation methods attempt to align the cross-domain features.
Two challenges remain unsolved: 1) the conditional distribution mismatch and 2) the bias of the decision boundary towards the source domain.
We propose a novel framework for semi-supervised domain adaptation by unifying the learning of opposite structures.
arXiv Detail & Related papers (2020-02-06T22:58:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.