Sparse Optimistic Information Directed Sampling
- URL: http://arxiv.org/abs/2510.24234v1
- Date: Tue, 28 Oct 2025 09:42:15 GMT
- Title: Sparse Optimistic Information Directed Sampling
- Authors: Ludovic Schwartz, Hamish Flynn, Gergely Neu
- Abstract summary: We show that SOIDS can optimally balance information and regret, and that it simultaneously achieves optimal worst-case regret in both the data-rich and data-poor regimes.
- Score: 11.986224119327387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many high-dimensional online decision-making problems can be modeled as stochastic sparse linear bandits. Most existing algorithms are designed to achieve optimal worst-case regret in either the data-rich regime, where polynomial dependence on the ambient dimension is unavoidable, or the data-poor regime, where dimension-independence is possible at the cost of worse dependence on the number of rounds. In contrast, the sparse Information Directed Sampling (IDS) algorithm satisfies a Bayesian regret bound that has the optimal rate in both regimes simultaneously. In this work, we explore the use of Sparse Optimistic Information Directed Sampling (SOIDS) to achieve the same adaptivity in the worst-case setting, without Bayesian assumptions. Through a novel analysis that enables the use of a time-dependent learning rate, we show that SOIDS can optimally balance information and regret. Our results extend the theoretical guarantees of IDS, providing the first algorithm that simultaneously achieves optimal worst-case regret in both the data-rich and data-poor regimes. We empirically demonstrate the good performance of SOIDS.
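The core idea behind IDS-style methods, which the abstract builds on, is to trade off immediate regret against information gained about the unknown parameter. A minimal illustrative sketch of the deterministic information-ratio rule is given below; the per-action estimates `regret_est` and `info_gain` are hypothetical inputs (computing them is algorithm-specific), and this is not the paper's SOIDS algorithm itself.

```python
import numpy as np

def ids_action(regret_est, info_gain, eps=1e-12):
    """Select the action minimizing the (deterministic) information
    ratio: squared expected regret divided by information gain.

    regret_est : per-action estimates of expected instantaneous regret
    info_gain  : per-action estimates of information gained by playing
    Both are supplied by the caller; IDS variants differ in how they
    are computed (e.g. via posterior samples or confidence sets).
    """
    ratio = regret_est ** 2 / (info_gain + eps)  # eps guards division by zero
    return int(np.argmin(ratio))

# Toy usage with three actions and made-up estimates: action 2 has a
# moderate regret estimate but a large information gain, so its ratio
# (0.16 / 0.4 = 0.4) is the smallest and it gets selected.
regret = np.array([0.5, 0.2, 0.4])
gain = np.array([0.1, 0.01, 0.4])
print(ids_action(regret, gain))  # -> 2
```

Note that full IDS minimizes the ratio over probability distributions on actions rather than single actions; the deterministic argmin above is a common simplification used for illustration.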
Related papers
- Direct Preference Optimization with Rating Information: Practical Algorithms and Provable Gains [67.71020482405343]
We study how to design algorithms that can leverage additional information in the form of a rating gap. We present new algorithms that achieve faster statistical rates than DPO in the presence of accurate rating-gap information.
arXiv Detail & Related papers (2026-01-31T08:38:21Z) - Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update [60.414548453838506]
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function. GLBs are widely applicable to real-world scenarios, but their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. We propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexities per round.
arXiv Detail & Related papers (2025-07-16T02:24:21Z) - Optimizing Input Data Collection for Ranking and Selection [2.3708672042234213]
We design a sequential sampling algorithm that collects input and simulation data given a budget. We show that MPB's posterior probability of optimality converges to one at an exponential rate as the sampling budget increases. We extend OSAR by adopting kernel ridge regression to improve the simulation output mean prediction.
arXiv Detail & Related papers (2025-02-23T17:33:43Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Align Your Steps: Optimizing Sampling Schedules in Diffusion Models [63.927438959502226]
Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond.
A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks.
We propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs.
arXiv Detail & Related papers (2024-04-22T18:18:41Z) - Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Mean Estimation with User-Level Privacy for Spatio-Temporal IoT Datasets [5.34194012533815]
We develop user-level differentially private algorithms to ensure low estimation errors on real-world datasets.
We test our algorithms on ITMS (Intelligent Traffic Management System) data from an Indian city.
We characterize the best performance of pseudo-user creation-based algorithms on worst-case datasets via a minimax approach.
arXiv Detail & Related papers (2024-01-29T06:21:29Z) - Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization [4.554894288663752]
We propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization.
Unlike cross-validation, our approach avoids sacrificing data for a test set.
We prove our estimator performs well in the small-data, large-scale regime.
arXiv Detail & Related papers (2021-07-26T19:00:51Z) - Information Directed Sampling for Sparse Linear Bandits [42.232086950768476]
We develop a class of information-theoretic Bayesian regret bounds that nearly match existing lower bounds on a variety of problem instances.
Numerical results demonstrate significant regret reductions by sparse IDS relative to several baselines.
arXiv Detail & Related papers (2021-05-29T10:26:23Z) - Asymptotically Optimal Information-Directed Sampling [37.816004557470556]
We introduce a simple and efficient algorithm for linear bandits with finitely many actions that is asymptotically optimal and nearly worst-case optimal in finite time.
The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the lower bound.
arXiv Detail & Related papers (2020-11-11T18:01:59Z) - Bilevel Optimization for Differentially Private Optimization in Energy Systems [53.806512366696275]
This paper studies how to apply differential privacy to constrained optimization problems whose inputs are sensitive.
The paper shows that, under a natural assumption, a bilevel model can be solved efficiently for large-scale nonlinear optimization problems.
arXiv Detail & Related papers (2020-01-26T20:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.