Batch Value-function Approximation with Only Realizability
- URL: http://arxiv.org/abs/2008.04990v3
- Date: Thu, 17 Jun 2021 04:41:20 GMT
- Title: Batch Value-function Approximation with Only Realizability
- Authors: Tengyang Xie, Nan Jiang
- Abstract summary: We make progress in a long-standing problem of batch reinforcement learning (RL): learning $Q^\star$ from an exploratory dataset.
Our algorithm, BVFT, breaks the hardness conjecture (albeit under a stronger notion of exploratory data) via a tournament procedure.
We also discuss how BVFT can be applied to model selection among other extensions and open problems.
- Score: 17.692408242465763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We make progress in a long-standing problem of batch reinforcement learning
(RL): learning $Q^\star$ from an exploratory and polynomial-sized dataset,
using a realizable and otherwise arbitrary function class. In fact, all
existing algorithms demand function-approximation assumptions stronger than
realizability, and the mounting negative evidence has led to a conjecture that
sample-efficient learning is impossible in this setting (Chen and Jiang, 2019).
Our algorithm, BVFT, breaks the hardness conjecture (albeit under a stronger
notion of exploratory data) via a tournament procedure that reduces the
learning problem to pairwise comparison, and solves the latter with the help of
a state-action partition constructed from the compared functions. We also
discuss how BVFT can be applied to model selection among other extensions and
open problems.
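The tournament idea in the abstract can be illustrated with a minimal sketch. The abstract does not spell out how the partition or the pairwise comparison is computed, so everything concrete here is an assumption: the partition groups state-action pairs whose two candidate values fall in the same `eps`-wide cell, the comparison scores a candidate by its distance to a Bellman backup averaged within each group, and the names `pairwise_loss`, `tournament`, `eps`, and the toy MDP are hypothetical.

```python
import numpy as np

def pairwise_loss(f, g, data, gamma, eps):
    """Score candidate f when compared against g (hypothetical helper).

    Assumption: the partition puts two state-action pairs in the same group
    when both f and g assign them values in the same eps-wide cell.
    """
    s, a, r, s2 = data
    cells = np.stack([np.floor(f[s, a] / eps), np.floor(g[s, a] / eps)], axis=1)
    _, group = np.unique(cells, axis=0, return_inverse=True)
    group = group.ravel()
    # Empirical Bellman targets under f, averaged within each group.
    target = r + gamma * f[s2].max(axis=1)
    backup = (np.bincount(group, weights=target) / np.bincount(group))[group]
    # Root-mean-square gap between f and its group-averaged backup.
    return np.sqrt(np.mean((f[s, a] - backup) ** 2))

def tournament(candidates, data, gamma, eps):
    """Pick the candidate whose worst pairwise loss is smallest."""
    losses = [
        max(pairwise_loss(f, g, data, gamma, eps) for g in candidates)
        for f in candidates
    ]
    return int(np.argmin(losses)), losses

# Toy deterministic MDP (illustrative only): 3 states, 2 actions.
gamma = 0.9
nxt = np.array([[1, 2], [2, 0], [0, 1]])               # next state for (s, a)
rew = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 0.2]])   # reward for (s, a)

# Q* by value iteration, so that one candidate is realizable.
q_star = np.zeros((3, 2))
for _ in range(500):
    q_star = rew + gamma * q_star[nxt].max(axis=2)

# Batch dataset covering every state-action pair once.
s = np.repeat(np.arange(3), 2)
a = np.tile(np.arange(2), 3)
data = (s, a, rew[s, a], nxt[s, a])

candidates = [np.zeros((3, 2)), q_star, q_star + 1.0]
best, losses = tournament(candidates, data, gamma=gamma, eps=0.05)
print(best, losses)
```

In this toy run the dataset covers every state-action pair and transitions are deterministic, so the candidate equal to $Q^\star$ has a loss below `eps` against every opponent, while the other two candidates do not. The sketch says nothing about the sample-complexity guarantees or the exact data assumptions discussed in the paper.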
Related papers
- BatchGFN: Generative Flow Networks for Batch Active Learning [80.73649229919454]
BatchGFN is a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward.
We show our approach enables principled sampling of near-optimal utility batches at inference time with a single forward pass per point in the batch in toy regression problems.
arXiv Detail & Related papers (2023-06-26T20:41:36Z)
- Agnostic Multi-Robust Learning Using ERM [19.313739782029185]
A fundamental problem in robust learning is asymmetry: a learner needs to correctly classify every one of exponentially-many perturbations that an adversary might make to a test-time natural example.
In contrast, the attacker only needs to find one successful perturbation.
We introduce a novel multi-group setting and a novel multi-robust learning problem.
arXiv Detail & Related papers (2023-03-15T21:30:14Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications.
We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
- Recursive Causal Structure Learning in the Presence of Latent Variables and Selection Bias [27.06618125828978]
We consider the problem of learning the causal MAG of a system from observational data in the presence of latent variables and selection bias.
We propose a novel computationally efficient constraint-based method that is sound and complete.
We provide experimental results to compare the proposed approach with the state of the art on both synthetic and real-world structures.
arXiv Detail & Related papers (2021-10-22T19:49:59Z)
- A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z)
- Neural Active Learning with Performance Guarantees [37.16062387461106]
We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are generated from a class of functions on which we make no assumptions whatsoever.
We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines the feature space the algorithm operates on and the learned model computed atop.
arXiv Detail & Related papers (2021-06-06T20:44:23Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the function.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
- Multi-task Supervised Learning via Cross-learning [102.64082402388192]
We consider a problem known as multi-task learning, consisting of fitting a set of regression functions intended for solving different tasks.
In our novel formulation, we couple the parameters of these functions, so that they learn in their task specific domains while staying close to each other.
This facilitates cross-fertilization, in which data collected across different domains help improve the learning performance on each of the other tasks.
arXiv Detail & Related papers (2020-10-24T21:35:57Z)