Sparse Optimistic Information Directed Sampling
- URL: http://arxiv.org/abs/2510.24234v1
- Date: Tue, 28 Oct 2025 09:42:15 GMT
- Title: Sparse Optimistic Information Directed Sampling
- Authors: Ludovic Schwartz, Hamish Flynn, Gergely Neu
- Abstract summary: We show that SOIDS can optimally balance information and regret, and that it simultaneously achieves optimal worst-case regret in both the data-rich and data-poor regimes.
- Score: 11.986224119327387
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many high-dimensional online decision-making problems can be modeled as stochastic sparse linear bandits. Most existing algorithms are designed to achieve optimal worst-case regret in either the data-rich regime, where polynomial dependence on the ambient dimension is unavoidable, or the data-poor regime, where dimension-independence is possible at the cost of worse dependence on the number of rounds. In contrast, the sparse Information Directed Sampling (IDS) algorithm satisfies a Bayesian regret bound that has the optimal rate in both regimes simultaneously. In this work, we explore the use of Sparse Optimistic Information Directed Sampling (SOIDS) to achieve the same adaptivity in the worst-case setting, without Bayesian assumptions. Through a novel analysis that enables the use of a time-dependent learning rate, we show that SOIDS can optimally balance information and regret. Our results extend the theoretical guarantees of IDS, providing the first algorithm that simultaneously achieves optimal worst-case regret in both the data-rich and data-poor regimes. We empirically demonstrate the good performance of SOIDS.
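The core idea behind IDS-style methods, which the abstract builds on, is to trade off immediate regret against information gained about the unknown parameter. A minimal illustrative sketch of the deterministic information-ratio rule is given below; the per-action estimates `regret_est` and `info_gain` are hypothetical inputs (computing them is algorithm-specific), and this is not the paper's SOIDS algorithm itself.

```python
import numpy as np

def ids_action(regret_est, info_gain, eps=1e-12):
    """Select the action minimizing the (deterministic) information
    ratio: squared expected regret divided by information gain.

    regret_est : per-action estimates of expected instantaneous regret
    info_gain  : per-action estimates of information gained by playing
    Both are supplied by the caller; IDS variants differ in how they
    are computed (e.g. via posterior samples or confidence sets).
    """
    ratio = regret_est ** 2 / (info_gain + eps)  # eps guards division by zero
    return int(np.argmin(ratio))

# Toy usage with three actions and made-up estimates: action 2 has a
# moderate regret estimate but a large information gain, so its ratio
# (0.16 / 0.4 = 0.4) is the smallest and it gets selected.
regret = np.array([0.5, 0.2, 0.4])
gain = np.array([0.1, 0.01, 0.4])
print(ids_action(regret, gain))  # -> 2
```

Note that full IDS minimizes the ratio over probability distributions on actions rather than single actions; the deterministic argmin above is a common simplification used for illustration.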
Related papers
- Direct Preference Optimization with Rating Information: Practical Algorithms and Provable Gains [67.71020482405343]
We study how to design algorithms that can leverage additional information in the form of a rating gap. We present new algorithms that achieve faster statistical rates than DPO in the presence of accurate rating-gap information.
arXiv Detail & Related papers (2026-01-31T08:38:21Z) - Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update [60.414548453838506]
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function. GLBs are widely applicable to real-world scenarios, but their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. We propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexities per round.
arXiv Detail & Related papers (2025-07-16T02:24:21Z) - Optimizing Input Data Collection for Ranking and Selection [2.3708672042234213]
We design a sequential sampling algorithm that collects input and simulation data given a budget. We show that MPB's posterior probability of optimality converges to one at an exponential rate as the sampling budget increases. We extend OSAR by adopting kernel ridge regression to improve the simulation output mean prediction.
arXiv Detail & Related papers (2025-02-23T17:33:43Z) - Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences. To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model. Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z) - Align Your Steps: Optimizing Sampling Schedules in Diffusion Models [63.927438959502226]
Diffusion models (DMs) have established themselves as the state-of-the-art generative modeling approach in the visual domain and beyond.
A crucial drawback of DMs is their slow sampling speed, relying on many sequential function evaluations through large neural networks.
We propose a general and principled approach to optimizing the sampling schedules of DMs for high-quality outputs.
arXiv Detail & Related papers (2024-04-22T18:18:41Z) - Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Mean Estimation with User-Level Privacy for Spatio-Temporal IoT Datasets [5.34194012533815]
We develop user-level differentially private algorithms to ensure low estimation errors on real-world datasets.
We test our algorithms on ITMS (Intelligent Traffic Management System) data from an Indian city.
We characterize the best performance of pseudo-user creation-based algorithms on worst-case datasets via a minimax approach.
arXiv Detail & Related papers (2024-01-29T06:21:29Z) - Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization [4.554894288663752]
We propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization.
Unlike cross-validation, our approach avoids sacrificing data for a test set.
We prove our estimator performs well in the small-data, large-scale regime.
arXiv Detail & Related papers (2021-07-26T19:00:51Z) - Information Directed Sampling for Sparse Linear Bandits [42.232086950768476]
We develop a class of information-theoretic Bayesian regret bounds that nearly match existing lower bounds on a variety of problem instances.
Numerical results demonstrate significant regret reductions by sparse IDS relative to several baselines.
arXiv Detail & Related papers (2021-05-29T10:26:23Z) - Asymptotically Optimal Information-Directed Sampling [37.816004557470556]
We introduce a simple and efficient algorithm for linear bandits with finitely many actions that is asymptotically optimal and nearly worst-case optimal in finite time.
The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the lower bound.
arXiv Detail & Related papers (2020-11-11T18:01:59Z) - Bilevel Optimization for Differentially Private Optimization in Energy Systems [53.806512366696275]
This paper studies how to apply differential privacy to constrained optimization problems whose inputs are sensitive.
The paper shows that, under a natural assumption, a bilevel model can be solved efficiently for large-scale nonlinear optimization problems.
arXiv Detail & Related papers (2020-01-26T20:15:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.