Tight Bounds for Answering Adaptively Chosen Concentrated Queries
- URL: http://arxiv.org/abs/2507.13700v1
- Date: Fri, 18 Jul 2025 07:08:42 GMT
- Title: Tight Bounds for Answering Adaptively Chosen Concentrated Queries
- Authors: Emma Rapoport, Edith Cohen, Uri Stemmer
- Abstract summary: In this work, we prove that this utility gap is inherent under the current formulation of the concentrated queries framework. We present a simplified version of the best-known algorithms that matches our impossibility result.
- Score: 24.243931410934323
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most work on adaptive data analysis assumes that samples in the dataset are independent. When correlations are allowed, even the non-adaptive setting can become intractable, unless some structural constraints are imposed. To address this, Bassily and Freund [2016] introduced the elegant framework of concentrated queries, which requires the analyst to restrict itself to queries that are concentrated around their expected value. While this assumption makes the problem trivial in the non-adaptive setting, in the adaptive setting it remains quite challenging. In fact, all known algorithms in this framework support significantly fewer queries than in the independent case: At most $O(n)$ queries for a sample of size $n$, compared to $O(n^2)$ in the independent setting. In this work, we prove that this utility gap is inherent under the current formulation of the concentrated queries framework, assuming some natural conditions on the algorithm. Additionally, we present a simplified version of the best-known algorithms that match our impossibility result.
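To make the setting concrete, here is a minimal, self-contained sketch of the kind of adaptive query-answering loop the abstract describes: an analyst issues queries that are empirical means of bounded functions (and hence concentrate around their expectations), each query chosen as a function of previous answers, and a mechanism answers them from the sample. The dataset, the clipped linear query class, the subsampling-based answering rule, and all parameter choices below are illustrative assumptions, not the paper's construction, and the sketch makes no claim about the $O(n)$ versus $O(n^2)$ bounds discussed above.

```python
# A minimal sketch of the adaptive "concentrated queries" setting.
# Everything here (dataset, query class, answering rule) is a hypothetical
# stand-in chosen for clarity, not the mechanism analyzed in the paper.
import numpy as np

rng = np.random.default_rng(0)

n = 1000                              # sample size
sample = rng.normal(size=(n, 20))     # hypothetical dataset: n feature vectors


def answer_with_subsample(query, data, subsample_frac=0.1):
    """Answer a query using a uniformly random subsample of the data.

    Subsampling-based answering (in the spirit of the related
    "Subsampling Suffices" line of work) is one simple way to limit how
    much each answer reveals about the full sample.
    """
    m = max(1, int(subsample_frac * len(data)))
    idx = rng.choice(len(data), size=m, replace=False)
    return float(np.mean(query(data[idx])))


def make_linear_query(direction):
    """A bounded query: the clipped projection of each row onto a direction.

    Averages of bounded functions of i.i.d. rows concentrate around their
    expectation, which is the kind of concentration the framework assumes.
    """
    direction = direction / np.linalg.norm(direction)
    return lambda rows: np.clip(rows @ direction, -1.0, 1.0)


# Adaptive loop: each query direction depends on all previous answers.
answers = []
state = rng.normal(size=20)
for t in range(50):
    q = make_linear_query(state)
    a = answer_with_subsample(q, sample)
    answers.append(a)
    # The analyst adapts: the next query is steered by the answers so far.
    state = state + a * rng.normal(size=20)

print(f"answered {len(answers)} adaptively chosen queries; "
      f"last answer = {answers[-1]:+.4f}")
```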
Related papers
- T$^2$: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering [49.5489716597489]
We present T$^2$: Think-to-Think, a novel framework that dynamically adapts reasoning depth based on question complexity. T$^2$ works through four key steps: decomposing questions into structural elements, generating similar examples with candidate reasoning strategies, evaluating these strategies against multiple criteria, and applying the most appropriate strategy to the original question.
arXiv Detail & Related papers (2025-05-23T03:18:02Z)
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
- Optimal Multi-Distribution Learning [88.3008613028333]
Multi-distribution learning seeks to learn a shared model that minimizes the worst-case risk across $k$ distinct data distributions.
We propose a novel algorithm that yields an $\varepsilon$-optimal randomized hypothesis with a sample complexity on the order of $(d+k)/\varepsilon^2$.
arXiv Detail & Related papers (2023-12-08T16:06:29Z)
- Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity [16.331196225467707]
We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning.
We employ a learning framework that allows sending queries in $K$ batches, with feedback being processed and queries updated after each batch.
arXiv Detail & Related papers (2023-10-02T20:14:01Z)
- Subsampling Suffices for Adaptive Data Analysis [8.231050911072755]
Most classical techniques assume that the dataset is independent of the analyst's query and break down in the common setting where a dataset is reused for multiple, adaptively chosen, queries.
We identify a remarkably simple set of assumptions under which the queries will continue to be representative even when chosen adaptively.
The simplicity of this subsampling-based framework allows it to model a variety of real-world scenarios not covered by prior work.
arXiv Detail & Related papers (2023-02-17T02:47:54Z)
- How to Query An Oracle? Efficient Strategies to Label Data [59.89900843097016]
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning.
We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(\frac{N}{k^2})$.
In addition, we present an adaptive greedy query scheme, which achieves an average rate of $\approx 0.2N$ queries per sample with triplet queries.
arXiv Detail & Related papers (2021-10-05T20:15:35Z)
- Generalization in the Face of Adaptivity: A Bayesian Perspective [3.0202264016476623]
Repeated use of a data sample via adaptively chosen queries can rapidly lead to overfitting.
It turns out that simple noise-addition algorithms suffice to prevent this issue.
We show that the harm of adaptivity results from the covariance between the new query and a Bayes factor-based measure of how much information about the data sample was encoded in the responses given to past queries.
arXiv Detail & Related papers (2021-06-20T22:06:44Z)
- Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments [55.24895403089543]
Domain generalization aims at performing well on unseen test environments with data from a limited number of training environments.
We present a new algorithm based on performing iterative feature matching that is guaranteed with high probability to yield a predictor that generalizes after seeing only $O(\log d_s)$ environments.
arXiv Detail & Related papers (2021-06-18T04:39:19Z)
- Fuzzy Clustering with Similarity Queries [56.96625809888241]
The fuzzy or soft objective is a popular generalization of the well-known $k$-means problem.
We show that by making few queries, the problem becomes easier to solve.
arXiv Detail & Related papers (2021-06-04T02:32:26Z)
- Distributionally Robust Weighted $k$-Nearest Neighbors [21.537952410507483]
Learning a robust classifier from a few samples remains a key challenge in machine learning.
In this paper, we study a minimax distributionally robust formulation of weighted $k$-nearest neighbors.
We develop an algorithm, Dr.k-NN, that efficiently solves this functional optimization problem.
arXiv Detail & Related papers (2020-06-07T00:34:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.