Minimum mean-squared error estimation with bandit feedback
- URL: http://arxiv.org/abs/2203.16810v4
- Date: Fri, 02 May 2025 12:23:05 GMT
- Title: Minimum mean-squared error estimation with bandit feedback
- Authors: Ayon Ghosh, L. A. Prashanth, Dipayan Sen, Aditya Gopalan
- Abstract summary: We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round. We propose two MSE estimators and analyze their concentration properties.
- Score: 10.660855209170586
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round. We propose two MSE estimators, and analyze their concentration properties. The first estimator is non-adaptive, as it is tied to a predetermined $m$-subset and lacks the flexibility to transition to alternative subsets. The second estimator, which is derived using a regression framework, is adaptive and exhibits better concentration bounds in comparison to the first estimator. We frame the MSE estimation problem with bandit feedback, where the objective is to find the MSE-optimal subset with high confidence. We propose a variant of the successive elimination algorithm to solve this problem. We also derive a minimax lower bound to understand the fundamental limit on the sample complexity of this problem.
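As a rough illustration of the bandit formulation above, here is a minimal successive-elimination sketch in Python. The MSE proxy `mse_hat`, the confidence radius, and all numerical values are simplifying assumptions; the paper's two estimators and concentration bounds are not reproduced here.

```python
# Hypothetical successive-elimination sketch for finding the MSE-optimal
# m-subset of a Gaussian K-vector. `mse_hat` is a simplified stand-in for
# the paper's estimators, and the confidence radius is a generic heuristic.
import itertools
import numpy as np

rng = np.random.default_rng(0)
K, m = 5, 2
mu = np.zeros(K)
Sigma = np.diag([1.0, 2.0, 0.5, 3.0, 1.5])      # unknown to the learner

def sample_subset(S):
    """Observe only the entries in S of one fresh Gaussian draw."""
    return rng.multivariate_normal(mu, Sigma)[list(S)]

def mse_hat(obs):
    """Simplified MSE proxy: total empirical variance of observed entries."""
    return float(np.var(np.asarray(obs), axis=0, ddof=1).sum())

candidates = list(itertools.combinations(range(K), m))
history = {S: [] for S in candidates}
for t in range(2000):
    for S in candidates:                        # sample every surviving subset
        history[S].append(sample_subset(S))
    n = len(history[candidates[0]])
    if n < 2:
        continue
    rad = np.sqrt(2.0 * np.log(4 * len(candidates) * n * n) / n)
    est = {S: mse_hat(history[S]) for S in candidates}
    best_ucb = min(est[S] + rad for S in candidates)
    candidates = [S for S in candidates if est[S] - rad <= best_ucb]
    if len(candidates) == 1:
        break
print("surviving subset(s):", candidates)
```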
Related papers
- Minimax Rate-Optimal Algorithms for High-Dimensional Stochastic Linear Bandits [1.2010968598596632]
We study the linear bandit problem with multiple arms over $T$ rounds. We show that Lasso estimators are provably suboptimal in the sequential setting. We propose a three-stage arm selection algorithm that uses thresholded Lasso as the main estimation method.
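A minimal sketch of the thresholded-Lasso estimation primitive named above; the three-stage arm-selection logic is not reproduced, and `alpha` and `tau` below are placeholder values rather than the paper's tuning.

```python
# Thresholded Lasso: fit a Lasso, keep only large coefficients, refit by
# least squares on the selected support. All tuning values are placeholders.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, d, s = 200, 50, 5
theta = np.zeros(d)
theta[:s] = 1.0                               # s-sparse true parameter
X = rng.standard_normal((n, d))               # observed contexts
y = X @ theta + 0.1 * rng.standard_normal(n)

lasso = Lasso(alpha=0.05).fit(X, y)           # stage 1: Lasso fit
tau = 0.2                                     # placeholder threshold
support = np.flatnonzero(np.abs(lasso.coef_) > tau)   # stage 2: threshold
sol, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
theta_hat = np.zeros(d)
theta_hat[support] = sol                      # stage 3: refit on the support
print("recovered support:", support)
```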
arXiv Detail & Related papers (2025-05-23T02:20:00Z) - Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
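A toy sketch of the mechanism, under simplifying assumptions (a single worker, top-k sparsification, a quadratic objective); the paper's exact algorithm, momentum terms, and step sizes are not reproduced.

```python
# Toy error-feedback loop with a normalized update: compress the gradient
# plus carried-over error, remember what the compressor dropped, and take a
# normalized (decaying) step. Objective, compressor, and step sizes are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
d, k, eta = 20, 3, 0.5
M = rng.standard_normal((d, d))
A = M @ M.T / d + np.eye(d)                 # PSD Hessian of f(x) = x'Ax/2
x = rng.standard_normal(d)
e = np.zeros(d)                             # error memory

def top_k(v, k):
    """Keep the k largest-magnitude coordinates, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

for t in range(200):
    g = A @ x                               # exact gradient
    c = top_k(g + e, k)                     # compressed (sparsified) message
    e = g + e - c                           # error feedback: keep the residual
    if np.linalg.norm(c) > 0:
        x -= (eta / np.sqrt(t + 1)) * c / np.linalg.norm(c)   # normalized step
print("final gradient norm:", np.linalg.norm(A @ x))
```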
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - GROS: A General Robust Aggregation Strategy [49.1574468325115]
A new, very general, robust procedure for combining estimators in metric spaces is introduced.
We show that the same (up to a constant) sub-Gaussianity is obtained if the minimization is taken over the sample.
The performance of GROS is evaluated through five simulation studies.
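An illustrative guess at the style of procedure described: split the data into groups, estimate on each, and select the sample point with the smallest median distance to the group estimates, i.e. minimization taken over the sample. The details below are assumptions, not the paper's exact method.

```python
# Median-of-means-style robust aggregation in a metric space, with the
# minimizer restricted to the sample for computability. Illustrative only.
import numpy as np

rng = np.random.default_rng(3)
data = rng.standard_normal((300, 2)) + np.array([1.0, -2.0])
data[:10] += 50.0                                   # gross outliers

K = 9
groups = np.array_split(rng.permutation(data), K)
ests = np.stack([g.mean(axis=0) for g in groups])   # one estimator per group

# For each sample point, the median distance to the K group estimates:
dists = np.linalg.norm(data[:, None, :] - ests[None, :, :], axis=2)
agg = data[np.argmin(np.median(dists, axis=1))]
print("robust aggregate:", agg)                     # near (1, -2) despite outliers
```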
arXiv Detail & Related papers (2024-02-23T17:00:32Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
In particular, we present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the arms.
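For context, a sketch of the classic (non-adaptive) successive-rejects baseline that sred refines; arm means, noise, and budget are toy values.

```python
# Classic successive rejects for fixed-budget best-arm identification:
# split the budget into K-1 phases and drop the empirically worst arm
# at the end of each phase.
import numpy as np

rng = np.random.default_rng(4)
means = np.array([0.6, 0.45, 0.4, 0.3])
K, budget = len(means), 4000
log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))

active = list(range(K))
counts, sums = np.zeros(K), np.zeros(K)
n_prev = 0
for phase in range(1, K):
    n_phase = int(np.ceil((budget - K) / (log_bar * (K + 1 - phase))))
    for arm in active:
        for _ in range(n_phase - n_prev):            # pull each active arm
            sums[arm] += rng.normal(means[arm], 1.0)
            counts[arm] += 1
    n_prev = n_phase
    emp = sums[active] / counts[active]
    active.remove(active[int(np.argmin(emp))])       # reject the worst arm
print("identified best arm:", active[0])
```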
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian,
and Beyond $1+\alpha$ Moments [10.889739958035536]
We introduce a new definitional framework to analyze the fine-grained optimality of algorithms.
We show that median-of-means is neighborhood optimal, up to constant factors.
It is open to find a neighborhood-separated estimator without constant factor slackness.
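The median-of-means estimator discussed above, in a few lines; the group count is a heuristic choice rather than the paper's tuning.

```python
# Median of means: split the sample into blocks, average each block,
# return the median of the block averages. Robust to heavy tails.
import numpy as np

def median_of_means(x, n_groups):
    blocks = np.array_split(np.asarray(x), n_groups)
    return float(np.median([b.mean() for b in blocks]))

rng = np.random.default_rng(5)
sample = rng.standard_t(df=2.5, size=10_000) + 3.0   # heavy-tailed, mean 3
print("empirical mean: ", sample.mean())
print("median of means:", median_of_means(sample, n_groups=50))
```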
arXiv Detail & Related papers (2023-11-21T18:50:38Z) - Minimum-Risk Recalibration of Classifiers [9.31067660373791]
We introduce the concept of minimum-risk recalibration within the framework of mean-squared-error decomposition.
We show that transferring a calibrated classifier requires significantly fewer target samples compared to recalibrating from scratch.
arXiv Detail & Related papers (2023-05-18T11:27:02Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of the noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both
Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs).
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z) - Distributed Sparse Regression via Penalization [5.990069843501885]
We study linear regression over a network of agents, modeled as an undirected graph (with no centralized node).
The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint.
We show that the proximal-gradient algorithm applied to the penalized problem converges linearly up to a tolerance of the order of the centralized statistical error.
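A toy instance of the penalized formulation: local least-squares-plus-$\ell_1$ losses, a quadratic Laplacian consensus penalty, and proximal-gradient (soft-thresholding) iterations. The graph, penalty weight, and step size below are assumptions.

```python
# Penalized consensus LASSO: each agent i keeps a local copy theta[i];
# objective = sum_i local least squares + lam * ||theta_i||_1
#             + (rho/2) * Laplacian consensus penalty.
# Proximal gradient = gradient step + soft-thresholding. All values are toy.
import numpy as np

rng = np.random.default_rng(6)
n_agents, n_i, d = 4, 30, 10
theta_true = np.zeros(d)
theta_true[:3] = 1.0
X = [rng.standard_normal((n_i, d)) for _ in range(n_agents)]
y = [Xi @ theta_true + 0.1 * rng.standard_normal(n_i) for Xi in X]

A = np.zeros((n_agents, n_agents))          # path graph 0-1-2-3
for i in range(n_agents - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A              # graph Laplacian

lam, rho, step = 0.1, 1.0, 0.005
theta = np.zeros((n_agents, d))

def soft(v, t):
    """Soft-thresholding: the proximal map of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for _ in range(2000):
    grad = np.stack([X[i].T @ (X[i] @ theta[i] - y[i]) / n_i
                     for i in range(n_agents)])
    grad += rho * (L @ theta)               # gradient of the consensus penalty
    theta = soft(theta - step * grad, step * lam)

print("max disagreement across agents:", np.abs(theta - theta.mean(axis=0)).max())
```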
arXiv Detail & Related papers (2021-11-12T01:51:50Z) - Direct Measure Matching for Crowd Counting [59.66286603624411]
We propose a new measure-based counting approach to regress the predicted density maps to the scattered point-annotated ground truth directly.
In this paper, we derive a semi-balanced form of Sinkhorn divergence, based on which a Sinkhorn counting loss is designed for measure matching.
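To ground the terminology, vanilla Sinkhorn iterations for entropic optimal transport between two discrete measures; the paper's semi-balanced divergence and counting loss are not reproduced.

```python
# Sinkhorn iterations for entropic OT between discrete measures a and b
# with cost matrix C; returns the (approximate) optimal transport plan.
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=500):
    Kmat = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (Kmat.T @ u)
        u = a / (Kmat @ v)
    return u[:, None] * Kmat * v[None, :]

x = np.linspace(0, 1, 5)[:, None]            # predicted point locations
y = np.array([[0.1], [0.5], [0.9]])          # annotated ground-truth points
C = (x - y.T) ** 2                           # squared-distance cost
P = sinkhorn(np.full(5, 1 / 5), np.full(3, 1 / 3), C)
print("transport cost:", float((P * C).sum()))
```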
arXiv Detail & Related papers (2021-07-04T06:37:33Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims to identify the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - SNIPS: Solving Noisy Inverse Problems Stochastically [25.567566997688044]
We introduce a novel algorithm dubbed SNIPS, which draws samples from the posterior distribution of any linear inverse problem.
Our solution incorporates ideas from Langevin dynamics and Newton's method, and exploits a pre-trained minimum mean squared error (MMSE) Gaussian denoiser.
We show that the samples produced are sharp, detailed and consistent with the given measurements, and their diversity exposes the inherent uncertainty in the inverse problem being solved.
arXiv Detail & Related papers (2021-05-31T13:33:21Z) - An Empirical Process Approach to the Union Bound: Practical Algorithms
for Combinatorial and Linear Bandits [34.06611065493047]
This paper proposes near-optimal algorithms for the pure-exploration linear bandit problem in the fixed confidence and fixed budget settings.
We provide an algorithm whose sample complexity scales with the geometry of the instance and avoids an explicit union bound over the number of arms.
We also propose the first algorithm for linear bandits in the fixed budget setting.
arXiv Detail & Related papers (2020-06-21T00:56:33Z) - Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
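The online-learning route to equilibria, in its textbook form: two multiplicative-weights players in a toy zero-sum matrix game, whose time-averaged strategies approximate a mixed-strategy Nash equilibrium. The paper's estimation game is much richer; this shows only the mechanism.

```python
# Multiplicative weights in a zero-sum matrix game: averaged play converges
# to an approximate mixed-strategy Nash equilibrium.
import numpy as np

G = np.array([[0.0, 1.0], [1.0, 0.0]])   # matching pennies (row minimizes)
eta, T = 0.1, 5000
p = np.ones(2) / 2
q = np.ones(2) / 2
p_avg, q_avg = np.zeros(2), np.zeros(2)
for _ in range(T):
    p = p * np.exp(-eta * (G @ q))       # row player minimizes its loss
    p /= p.sum()
    q = q * np.exp(eta * (G.T @ p))      # column player maximizes its gain
    q /= q.sum()
    p_avg += p / T
    q_avg += q / T
print("approx. equilibrium:", p_avg, q_avg)   # both near (1/2, 1/2)
```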
arXiv Detail & Related papers (2020-06-19T22:49:42Z) - Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic
Approximation [4.817429789586127]
It is not possible to obtain a Hoeffding bound on the error sequence, even when the underlying Markov chain is reversible and geometrically ergodic.
It is shown that the mean-square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence.
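A quick numerical check of the $O(1/n)$ MSE rate for a plain i.i.d. Monte-Carlo average (the paper treats the harder Markovian setting, where Hoeffding-style bounds can fail but the $O(1/n)$ MSE rate survives).

```python
# The sample mean of n i.i.d. standard normals has MSE = 1/n, so n * MSE
# should hover near 1 across sample sizes.
import numpy as np

rng = np.random.default_rng(7)
for n in [100, 1_000, 10_000]:
    errs = rng.standard_normal((500, n)).mean(axis=1)   # 500 replications
    mse = np.mean(errs ** 2)
    print(f"n={n:6d}  MSE={mse:.2e}  n*MSE={n * mse:.2f}")
```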
arXiv Detail & Related papers (2020-02-07T01:52:21Z) - Thompson Sampling Algorithms for Mean-Variance Bandits [97.43678751629189]
We develop Thompson Sampling-style algorithms for mean-variance MAB.
We also provide comprehensive regret analyses for Gaussian and Bernoulli bandits.
Our algorithms significantly outperform existing LCB-based algorithms for all risk tolerances.
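A hedged sketch of a Thompson-sampling-style rule for a mean-variance objective with Gaussian arms; the priors, the risk weight `rho`, and the simplified Normal-Gamma posterior below are illustrative assumptions, not the paper's algorithm.

```python
# Thompson-sampling-style mean-variance bandit: sample (mean, precision)
# from a simplified per-arm posterior and pull the arm maximizing
# sampled_mean - rho * sampled_variance.
import numpy as np

rng = np.random.default_rng(8)
true_mu = np.array([1.0, 1.2])
true_sd = np.array([0.3, 1.5])       # arm 1: higher mean but much more risk
rho = 1.0
data = [[0.0], [0.0]]                # one dummy pull per arm seeds posteriors

for t in range(2000):
    scores = []
    for arm in range(2):
        x = np.asarray(data[arm])
        n = len(x)
        a = 1.0 + n / 2
        b = 1.0 + 0.5 * np.sum((x - x.mean()) ** 2)
        tau = rng.gamma(a, 1.0 / b)                  # sampled precision
        mu = rng.normal(x.mean(), 1.0 / np.sqrt(n * tau))
        scores.append(mu - rho / tau)                # mean-variance score
    arm = int(np.argmax(scores))
    data[arm].append(rng.normal(true_mu[arm], true_sd[arm]))
print("pull counts:", [len(d) - 1 for d in data])    # favors the low-risk arm
```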
arXiv Detail & Related papers (2020-02-01T15:33:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.