Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity
- URL: http://arxiv.org/abs/2310.01616v2
- Date: Tue, 28 May 2024 09:15:49 GMT
- Title: Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity
- Authors: Emmeran Johnson, Ciara Pike-Burke, Patrick Rebeschini
- Abstract summary: We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning.
We employ a learning framework that allows sending queries in $K$ batches, with feedback being processed and queries updated after each batch.
- Score: 16.331196225467707
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $n$ to the environment that is polynomial in the dimension $d$ of the problem. Adaptivity refers to the frequency at which queries are sent and feedback is processed to update the querying strategy. To investigate this interplay, we employ a learning framework that allows sending queries in $K$ batches, with feedback being processed and queries updated after each batch. This model encompasses the whole adaptivity spectrum, ranging from non-adaptive 'offline' ($K=1$) to fully adaptive ($K=n$) scenarios, and regimes in between. For the problems of policy evaluation and best-policy identification under $d$-dimensional linear function approximation, we establish $\Omega(\log \log d)$ lower bounds on the number of batches $K$ required for sample-efficient algorithms with $n = O(\mathrm{poly}(d))$ queries. Our results show that merely having adaptivity ($K>1$) does not guarantee sample-efficiency. Notably, the adaptivity boundary for sample-efficiency does not lie between offline reinforcement learning ($K=1$), where sample-efficiency was already known to be impossible, and adaptive settings. Instead, the boundary lies between different regimes of adaptivity and depends on the problem dimension.
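To make the batched interaction model concrete, below is a minimal Python sketch of the $K$-batch query protocol described in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the names `run_k_batch_protocol`, `env_query`, and `choose_batch` are hypothetical, and the toy linear environment merely stands in for the $d$-dimensional linear function approximation setting.

```python
# Hypothetical sketch of the K-batch query protocol from the abstract
# (not the authors' code): n queries are split into K batches, and the
# querying strategy may only be updated after each batch's feedback.
import numpy as np

def run_k_batch_protocol(env_query, choose_batch, n, K, d):
    """Run n queries in K batches of size n // K.

    env_query    : maps a query batch of shape (m, d) to feedback of shape (m,)
    choose_batch : maps the interaction history to the next batch of queries
    """
    history = []
    batch_size = n // K
    for _ in range(K):
        queries = choose_batch(history, batch_size, d)  # adaptivity happens only here
        feedback = env_query(queries)                   # feedback processed once per batch
        history.append((queries, feedback))
    return history

# Toy usage: K=1 recovers the non-adaptive 'offline' setting and K=n the fully
# adaptive one; the paper's lower bound concerns how K must scale with d in between.
rng = np.random.default_rng(0)
d = 8
theta = rng.normal(size=d)                                     # unknown linear parameter
env = lambda q: q @ theta + rng.normal(scale=0.1, size=len(q))
non_adaptive = lambda hist, m, dim: rng.normal(size=(m, dim))  # ignores history
history = run_k_batch_protocol(env, non_adaptive, n=64, K=4, d=d)
```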
Related papers
- Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data [17.657917523817243]
We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem.
In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches.
We derive rates of convergence in expectation that are of order $\mathcal{O}(\log T/T)$ and $\mathcal{O}(1/T^{1-\iota})$ for any $\iota>0$.
arXiv Detail & Related papers (2024-05-29T19:21:55Z)
- Adaptivity Complexity for Causal Graph Discovery [7.424262881242935]
We study the problem of $r$-adaptivity, where the algorithm designer recovers the causal graph under a total of $r$ sequential rounds.
We provide an $r$-adaptive algorithm that achieves an $O(\min\{r,\log n\} \cdot n^{1/\min\{r,\log n\}})$ approximation with respect to the verification number.
arXiv Detail & Related papers (2023-06-09T09:49:16Z)
- Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning [33.590006101071765]
We consider the setting where a master wants to run a distributed stochastic gradient descent (SGD) algorithm on $n$ workers.
We show that the adaptive version of distributed SGD can reach lower error values in less time compared to non-adaptive implementations.
arXiv Detail & Related papers (2022-08-04T10:57:25Z)
- An Experimental Design Perspective on Model-Based Reinforcement Learning [73.37942845983417]
In practical applications of RL, it is expensive to observe state transitions from the environment.
We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process.
arXiv Detail & Related papers (2021-12-09T23:13:57Z)
- How to Query An Oracle? Efficient Strategies to Label Data [59.89900843097016]
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning.
We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(\frac{N}{k^2})$.
In addition, we present an adaptive greedy query scheme, which achieves an average rate of $\approx 0.2N$ queries per sample with triplet queries.
arXiv Detail & Related papers (2021-10-05T20:15:35Z)
- AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization [104.96004056928474]
We propose a class of faster adaptive gradient descent ascent methods for nonconvex-strongly-concave minimax problems.
We show that our method reaches a lower sample complexity of $O(\kappa^{2.5}\epsilon^{-3})$ with a mini-batch size of $O(\kappa)$.
arXiv Detail & Related papers (2021-06-30T14:47:09Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- A No-Free-Lunch Theorem for MultiTask Learning [19.645741778058227]
We consider a seemingly favorable classification scenario where all tasks $P_t$ share a common optimal classifier $h^*$.
We show that, even though such regimes admit minimax rates accounting for both $n$ and $N$, no adaptive algorithm exists.
arXiv Detail & Related papers (2020-06-29T03:03:29Z)
- FANOK: Knockoffs in Linear Time [73.5154025911318]
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems.
We test our methods on problems with $p$ as large as $500,000$.
arXiv Detail & Related papers (2020-06-15T21:55:34Z)
- Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z)