On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage
- URL: http://arxiv.org/abs/2602.12107v1
- Date: Thu, 12 Feb 2026 15:59:42 GMT
- Title: On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage
- Authors: Haolin Liu, Braham Snyder, Chen-Yu Wei
- Abstract summary: We study offline reinforcement learning under $Q^\star$-approximation and partial coverage. We introduce a general framework that characterizes the intrinsic complexity of a given $Q^\star$ function class. We also give the first characterization of offline learnability for general low-Bellman-rank MDPs without Bellman completeness.
- Score: 26.28492097543273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are $Q^\star$-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?" We answer in the negative by establishing an information-theoretic lower bound. Going substantially beyond this, we introduce a general framework that characterizes the intrinsic complexity of a given $Q^\star$ function class, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). This complexity recovers and improves the quantities underlying the guarantees of Chen and Jiang (2022) and Uehara et al. (2023), and extends to broader settings. Our decision-estimation decomposition can be combined with a wide range of $Q^\star$ estimation procedures, modularizing and generalizing existing approaches. Beyond the general framework, we make further contributions: By developing a novel second-order performance difference lemma, we obtain the first $\epsilon^{-2}$ sample complexity under partial coverage for soft $Q$-learning, improving the $\epsilon^{-4}$ bound of Uehara et al. (2023). We remove Chen and Jiang's (2022) need for additional online interaction when the value gap of $Q^\star$ is unknown. We also give the first characterization of offline learnability for general low-Bellman-rank MDPs without Bellman completeness (Jiang et al., 2017; Du et al., 2021; Jin et al., 2021), a canonical setting in online RL that remains unexplored in offline RL except for special cases. Finally, we provide the first analysis for CQL under $Q^\star$-realizability and Bellman completeness beyond the tabular case.
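The conservative penalty that motivates this setting can be made concrete in a few lines. Below is a minimal tabular sketch of a CQL-style update, not the algorithm analyzed in the paper: one TD regression step plus a gradient step on the penalty $\alpha\,(\log\sum_{a}\exp Q(s,a) - Q(s,a_{\text{data}}))$, which depresses $Q$ on actions the dataset never takes. The toy sizes, step size, and single logged transition are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def cql_style_update(Q, s, a, r, s_next, alpha=1.0, gamma=0.99, lr=0.1):
    """One tabular step on a CQL-style objective: TD error on the
    logged transition plus a conservative penalty on row Q[s]."""
    # TD part: pull Q(s, a) toward r + gamma * max_a' Q(s', a').
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] -= lr * (Q[s, a] - td_target)
    # Conservative part: the gradient of logsumexp(Q(s, .)) - Q(s, a)
    # is softmax(Q(s, .)) minus an indicator on the logged action.
    grad = softmax(Q[s])
    grad[a] -= 1.0
    Q[s] -= lr * alpha * grad
    return Q

# Toy run: 2 states, 2 actions, one logged transition (s=0, a=0, r=1, s'=1).
Q = np.zeros((2, 2))
for _ in range(200):
    Q = cql_style_update(Q, s=0, a=0, r=1.0, s_next=1)
print(Q)  # Q[0, 1] is pushed below Q[0, 0]: pessimism off the data
```

Under partial coverage, this penalty is what keeps the learned values pessimistic on state-action pairs the dataset does not cover.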
Related papers
- Linear $Q$-Learning Does Not Diverge in $L^2$: Convergence Rates to a Bounded Set [34.129520133741124]
This paper further establishes the first $L^2$ convergence rate of linear $Q$-learning iterates (to a bounded set). All that is needed is an $\epsilon$-softmax behavior policy with an adaptive temperature.
arXiv Detail & Related papers (2025-01-31T16:10:50Z)
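For concreteness, here is one plausible shape for an $\epsilon$-softmax behavior policy: a uniform-exploration mixture over a temperature-scaled softmax of the current $Q$ estimates. The adaptive temperature schedule below is an illustrative assumption, not the schedule from the paper.

```python
import numpy as np

def eps_softmax(q_values, epsilon, temperature):
    """Mixture policy: uniform with prob. epsilon, softmax otherwise."""
    n = len(q_values)
    z = (q_values - q_values.max()) / max(temperature, 1e-8)
    soft = np.exp(z) / np.exp(z).sum()
    return epsilon / n + (1.0 - epsilon) * soft

# Assumed schedule: scale the temperature with the magnitude of the
# current Q estimates so the softmax never collapses or saturates.
q = np.array([1.0, 0.5, -0.2])
for t in range(1, 4):
    tau = max(1.0, np.abs(q).max()) / np.log(t + 1.0)
    probs = eps_softmax(q, epsilon=0.1, temperature=tau)
    print(t, probs.round(3), probs.sum())  # a valid distribution each step
```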
- A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation [67.66904892192794]
We propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB), for reinforcement learning. MQL-UCB achieves minimax-optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large, and near-optimal policy switching cost. Our work sheds light on designing provably sample-efficient and deployment-efficient $Q$-learning with nonlinear function approximation.
arXiv Detail & Related papers (2023-11-26T08:31:57Z)
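MQL-UCB's precise switching rule is not reproduced here. The sketch below shows a standard low-switching device from RL with linear function approximation, assumed as a stand-in: recompute the policy lazily, only when the feature-covariance determinant has grown by a constant factor, which caps the number of switches at $O(d \log K)$ rather than $\Omega(K)$.

```python
import numpy as np

def count_lazy_switches(features, factor=2.0, lam=1.0):
    """Recompute the policy only when det(covariance) grows by
    `factor`; the determinant can double at most O(d log K) times."""
    d = features.shape[1]
    cov = lam * np.eye(d)
    last_logdet = np.linalg.slogdet(cov)[1]
    switches = 0
    for phi in features:
        cov += np.outer(phi, phi)
        logdet = np.linalg.slogdet(cov)[1]
        if logdet - last_logdet >= np.log(factor):
            switches += 1        # policy recomputation would happen here
            last_logdet = logdet
    return switches

rng = np.random.default_rng(0)
K, d = 10_000, 4
print(count_lazy_switches(rng.normal(size=(K, d))), "switches for", K, "episodes")
```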
- Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost. We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - A General Framework for Sequential Decision-Making under Adaptivity Constraints [112.79808969568253]
We study general sequential decision-making under two adaptivity constraints: rare policy switch and batch learning.
For the rare policy switch constraint, we provide an algorithm achieving $\widetilde{\mathcal{O}}(\sqrt{K}+K/B)$ regret. For the batch learning constraint, we provide an algorithm achieving the same $\widetilde{\mathcal{O}}(\sqrt{K}+K/B)$ regret, where $B$ is the number of batches (a minimal sketch of this constraint follows the entry).
arXiv Detail & Related papers (2023-06-26T07:20:25Z)
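The batch-learning constraint itself is simple to state in code: with $K$ episodes split into $B$ batches, the acting policy may change only at batch boundaries. The learner and environment below are stand-ins assumed purely for illustration; only the control flow matters.

```python
import numpy as np

def run_batched(K, B, update_policy, act):
    """Batch-learning constraint: the acting policy is frozen inside
    each of the B batches and may switch only between batches."""
    policy, history = update_policy([]), []
    for _ in range(B):
        for _ in range(K // B):
            history.append(act(policy))   # collected under a frozen policy
        policy = update_policy(history)   # the only legal switch point
    return policy

rng = np.random.default_rng(1)
update_policy = lambda data: float(np.mean(data)) if data else 0.0  # toy learner
act = lambda policy: float(rng.normal(loc=1.0))                     # toy episode
print(run_batched(1000, 10, update_policy, act))
```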
- A Finite Sample Complexity Bound for Distributionally Robust Q-learning [17.96094201655567]
We consider a reinforcement learning setting in which the deployment environment is different from the training environment.
Applying a robust Markov decision process formulation, we extend the distributionally robust $Q$-learning framework studied in Liu et al.
This is the first sample complexity result for the model-free robust RL problem.
arXiv Detail & Related papers (2023-02-26T01:15:32Z)
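The entry above extends a distributionally robust $Q$-learning framework. As one concrete instance of a robust backup, assumed here rather than taken from the paper, the sketch below runs value iteration where each backup takes the worst-case expectation over a total-variation ball around an empirical transition model.

```python
import numpy as np

def worst_case_expectation(p, v, delta):
    """inf of E_{p'}[v] over p' within total-variation delta of p:
    shift up to delta mass from the highest-value outcomes onto the
    lowest-value outcome (the standard closed form for TV balls)."""
    p, budget = p.copy(), delta
    lo = int(np.argmin(v))
    for i in np.argsort(v)[::-1]:   # take mass from high-value states first
        if i == lo or budget <= 0:
            continue
        move = min(p[i], budget)
        p[i] -= move
        p[lo] += move
        budget -= move
    return float(p @ v)

def robust_q_iteration(P_hat, R, delta, gamma=0.95, iters=200):
    """Robust value iteration: every backup uses the worst kernel
    in a TV ball around the empirical transition model."""
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)
        Q = np.array([[R[s, a] + gamma * worst_case_expectation(P_hat[s, a], V, delta)
                       for a in range(A)] for s in range(S)])
    return Q

rng = np.random.default_rng(0)
S, A = 3, 2
P_hat = rng.dirichlet(np.ones(S), size=(S, A))   # toy empirical model, (S, A, S)
R = rng.uniform(size=(S, A))
print(robust_q_iteration(P_hat, R, delta=0.1))   # pessimistic vs. delta = 0
```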
- Settling the Sample Complexity of Model-Based Offline Reinforcement Learning [50.5790774201146]
Offline reinforcement learning (RL) learns using pre-collected data without further exploration.
Prior algorithms or analyses either suffer from suboptimal sample complexities or incur high burn-in cost to reach sample optimality.
We demonstrate that the model-based (or "plug-in") approach achieves minimax-optimal sample complexity without burn-in cost.
arXiv Detail & Related papers (2022-04-11T17:26:19Z)
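The plug-in approach from the entry above has a simple skeleton: estimate transitions and rewards from the logged tuples, then plan in the estimated MDP. The sketch below omits the perturbation/pessimism machinery behind the minimax analysis; the uniform fallback for unseen pairs and the toy dataset are assumptions.

```python
import numpy as np

def plug_in_policy(transitions, S, A, gamma=0.95, iters=200):
    """'Plug-in' approach: estimate the MDP from logged (s, a, r, s')
    tuples, then plan (value iteration) in the estimated model."""
    counts = np.zeros((S, A, S))
    rew_sum = np.zeros((S, A))
    for s, a, r, s2 in transitions:
        counts[s, a, s2] += 1
        rew_sum[s, a] += r
    n = np.maximum(counts.sum(axis=2), 1)           # visit counts, clipped
    P_hat = counts / n[..., None]                   # empirical transitions
    P_hat[counts.sum(axis=2) == 0] = 1.0 / S        # unseen (s, a): uniform guess
    R_hat = rew_sum / n
    Q = np.zeros((S, A))
    for _ in range(iters):
        Q = R_hat + gamma * P_hat @ Q.max(axis=1)   # Bellman backup in the model
    return Q.argmax(axis=1)

# Toy logged dataset on a 2-state, 2-action MDP (illustrative numbers).
data = [(0, 0, 1.0, 1), (1, 0, 0.0, 0), (0, 1, 0.2, 0), (1, 1, 0.5, 1)]
print(plug_in_policy(data, S=2, A=2))
```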
- Online Sub-Sampling for Reinforcement Learning with General Function Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z)
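A common proxy for the information gain of a data point in the linear setting is the leverage score $\phi^\top \Lambda^{-1} \phi$ against the covariance of the points kept so far; the sketch below assumes that proxy as a stand-in for the paper's measure. A point is retained only when its score is large, so the kept set stays small and policy recomputations can be tied to it.

```python
import numpy as np

def subsample_stream(features, lam=1.0, threshold=0.5):
    """Keep a data point only if its leverage score against the
    covariance of already-kept points exceeds the threshold."""
    d = features.shape[1]
    cov = lam * np.eye(d)
    kept = []
    for i, phi in enumerate(features):
        score = phi @ np.linalg.solve(cov, phi)
        if score >= threshold:            # informative point: keep it
            kept.append(i)
            cov += np.outer(phi, phi)     # only kept points grow the covariance
    return kept

rng = np.random.default_rng(2)
K, d = 5_000, 3
print(len(subsample_stream(rng.normal(size=(K, d)))), "of", K, "points kept")
```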
- Private Learning of Halfspaces: Simplifying the Construction and Reducing the Sample Complexity [63.29100726064574]
We present a differentially private learner for halfspaces over a finite grid $G$ in $\mathbb{R}^d$ with sample complexity $\approx d^{2.5}\cdot 2^{\log^*|G|}$.
The building block for our learner is a new differentially private algorithm for approximately solving the linear feasibility problem.
arXiv Detail & Related papers (2020-04-16T16:12:10Z)
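The paper's construction rests on a new private linear-feasibility solver that is not reproduced here. To make the finite-grid setting concrete, the sketch below shows only the generic exponential mechanism, selecting a one-dimensional threshold (a degenerate halfspace) from a finite grid by privately noisy accuracy; all names and numbers are illustrative.

```python
import numpy as np

def exponential_mechanism(candidates, scores, eps, sensitivity=1.0, rng=None):
    """Sample a candidate with probability proportional to
    exp(eps * score / (2 * sensitivity)) -- the exponential mechanism."""
    rng = rng or np.random.default_rng()
    z = np.asarray(scores, dtype=float)
    p = np.exp(eps * (z - z.max()) / (2.0 * sensitivity))
    p /= p.sum()
    return candidates[rng.choice(len(candidates), p=p)]

# Toy: pick a threshold from a finite grid, scored by the number of
# correctly classified points (sensitivity 1 per changed example).
rng = np.random.default_rng(3)
X = rng.uniform(-1.0, 1.0, size=200)
y = (X > 0.2).astype(int)
grid = np.linspace(-1.0, 1.0, 41)
scores = [np.sum((X > t).astype(int) == y) for t in grid]
print(exponential_mechanism(grid, scores, eps=1.0))  # near 0.2 with high prob.
```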
This list is automatically generated from the titles and abstracts of the papers in this site.