Online Linear Regression with Paid Stochastic Features
- URL: http://arxiv.org/abs/2511.08073v1
- Date: Wed, 12 Nov 2025 01:38:19 GMT
- Title: Online Linear Regression with Paid Stochastic Features
- Authors: Nadav Merlis, Kyoungseok Jang, Nicolò Cesa-Bianchi
- Abstract summary: We study an online linear regression setting in which the observed feature vectors are corrupted by noise and the learner can pay to reduce the noise level. We measure the learner's regret against the linear predictor minimizing a notion of loss that combines the prediction error and payment.
- Score: 30.17994138589359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study an online linear regression setting in which the observed feature vectors are corrupted by noise and the learner can pay to reduce the noise level. In practice, this may happen for several reasons: for example, because features can be measured more accurately using more expensive equipment, or because data providers can be incentivized to release less private features. Assuming feature vectors are drawn i.i.d. from a fixed but unknown distribution, we measure the learner's regret against the linear predictor minimizing a notion of loss that combines the prediction error and payment. When the mapping between payments and noise covariance is known, we prove that the rate $\sqrt{T}$ is optimal for regret if logarithmic factors are ignored. When the noise covariance is unknown, we show that the optimal regret rate becomes of order $T^{2/3}$ (ignoring log factors). Our analysis leverages matrix martingale concentration, showing that the empirical loss uniformly converges to the expected one for all payments and linear predictors.
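The protocol described in the abstract can be illustrated with a minimal simulation sketch. Everything below is an illustrative assumption rather than the paper's algorithm: the payment-to-covariance map `noise_cov`, the fixed payment level, and the online ridge update are generic stand-ins for the setting (pay, observe a noisy feature vector, predict, incur prediction error plus payment).

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 3, 300
theta_true = np.array([1.0, -0.5, 2.0])

def noise_cov(payment):
    # Hypothetical payment-to-covariance map: paying more shrinks the
    # feature-noise covariance (an assumption made for illustration).
    return np.eye(d) / (1.0 + payment)

payment = 0.5                   # fixed payment level for this sketch
A = np.eye(d)                   # ridge-regularized Gram matrix
b = np.zeros(d)
theta_hat = np.zeros(d)
total_loss = 0.0

for t in range(T):
    x = rng.normal(size=d)                      # clean feature, drawn i.i.d.
    noise = rng.multivariate_normal(np.zeros(d), noise_cov(payment))
    z = x + noise                               # the learner only observes z
    y = theta_true @ x + 0.1 * rng.normal()     # label depends on clean x
    y_pred = theta_hat @ z                      # predict from noisy features
    total_loss += (y_pred - y) ** 2 + payment   # loss = squared error + payment
    A += np.outer(z, z)                         # online ridge update on z
    b += y * z
    theta_hat = np.linalg.solve(A, b)
```

The regret question the paper studies is how the cumulative value of `total_loss` compares, over all payment levels and linear predictors, to the best fixed payment-predictor pair in hindsight.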
Related papers
- Optimal Unconstrained Self-Distillation in Ridge Regression: Strict Improvements, Precise Asymptotics, and One-Shot Tuning [61.07540493350384]
Self-distillation (SD) is the process of retraining a student on a mixture of ground-truth labels and the teacher's own predictions. We show that for any prediction risk, the optimally mixed student improves upon the ridge teacher for every regularization level. We propose a consistent one-shot tuning method to estimate $\star$ without grid search, sample splitting, or refitting.
arXiv Detail & Related papers (2026-02-19T17:21:15Z) - Model-free Online Learning for the Kalman Filter: Forgetting Factor and Logarithmic Regret [2.313314525234138]
We consider the problem of online prediction for an unknown, non-explosive linear system. With a known system model, the optimal predictor is the celebrated Kalman filter. We tackle this problem by injecting an inductive bias into the regression model via exponential forgetting.
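The exponential-forgetting idea in this blurb can be sketched as forgetting-factor recursive least squares, a standard textbook construction (not necessarily the paper's exact algorithm; `lam` and `delta` are illustrative parameters):

```python
import numpy as np

def ff_rls_predict(xs, ys, lam=0.95, delta=1.0):
    """Recursive least squares with exponential forgetting factor `lam`.

    Past observations are discounted by `lam` each step, so the predictor
    gradually forgets old data -- the inductive bias described above.
    """
    d = xs.shape[1]
    P = np.eye(d) / delta        # inverse-covariance estimate
    w = np.zeros(d)
    preds = []
    for x, y in zip(xs, ys):
        preds.append(w @ x)               # predict before seeing the label
        Px = P @ x
        k = Px / (lam + x @ Px)           # gain vector
        w = w + k * (y - w @ x)           # correct weights by prediction error
        P = (P - np.outer(k, Px)) / lam   # discounted covariance update
    return np.array(preds), w
```

On stationary data the forgetting factor trades some asymptotic accuracy for the ability to track a drifting system, which is the tension the paper's regret analysis addresses.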
arXiv Detail & Related papers (2025-05-13T21:49:56Z) - Refined Risk Bounds for Unbounded Losses via Transductive Priors [67.12679195076387]
We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression. Our key tools are based on the exponential weights algorithm with carefully chosen transductive priors.
arXiv Detail & Related papers (2024-10-29T00:01:04Z) - Minimax Optimality in Contextual Dynamic Pricing with General Valuation Models [8.981637739384674]
We study contextual dynamic pricing, where a decision maker posts personalized prices based on observable contexts. Each valuation is modeled as an unknown latent function of the context, corrupted by independent and identically distributed market noise.
arXiv Detail & Related papers (2024-06-24T23:43:56Z) - Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K} \sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - A Regret-Variance Trade-Off in Online Learning [14.41667013062914]
We show how variance of predictions can be exploited in learning.
In online prediction with corrupted losses, we show that the effect of corruption on the regret can be compensated by a large variance.
We extend our results to the setting of online linear regression.
arXiv Detail & Related papers (2022-06-06T14:50:19Z) - No-Regret Learning with Unbounded Losses: The Case of Logarithmic
Pooling [12.933990572597583]
We focus on the fundamental and practical aggregation method known as logarithmic pooling.
We consider the problem of learning the best set of parameters in an online adversarial setting.
We present an algorithm that learns expert weights in a way that attains $O(\sqrt{T} \log T)$ expected regret.
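Logarithmic pooling itself is simple to state: the pooled forecast is the renormalized weighted geometric mean of the experts' forecasts. A minimal sketch of the pooling rule (the online learning of the weights, which is the paper's actual focus, is not shown):

```python
import numpy as np

def log_pool(probs, weights):
    """Logarithmic pooling of expert probability forecasts.

    probs:   (n_experts, n_outcomes) array of strictly positive forecasts.
    weights: (n_experts,) array of pooling weights summing to 1.
    Returns the forecast proportional to prod_i p_i^{w_i}, renormalized.
    """
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    log_pooled = weights @ np.log(probs)   # weighted sum of log-probabilities
    pooled = np.exp(log_pooled)
    return pooled / pooled.sum()
```

Note that a single expert assigning probability near zero to an outcome drags the pooled probability toward zero, which is why the losses here are unbounded and the regret analysis is delicate.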
arXiv Detail & Related papers (2022-02-22T22:27:25Z) - Towards Agnostic Feature-based Dynamic Pricing: Linear Policies vs
Linear Valuation with Unknown Noise [16.871660060209674]
We show an algorithm that achieves an $\tilde{O}(T^{3/4})$ regret, and improve the best-known lower bound from $\Omega(T^{3/5})$ to $\tilde{\Omega}(T^{2/3})$.
Results demonstrate that no-regret learning is possible for feature-based dynamic pricing under weak assumptions.
arXiv Detail & Related papers (2022-01-27T06:40:03Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Online nonparametric regression with Sobolev kernels [99.12817345416846]
We derive the regret upper bounds on the classes of Sobolev spaces $W_p^{\beta}(\mathcal{X})$, $p \geq 2$, $\beta > \frac{d}{p}$.
The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $\beta > \frac{d}{2}$ or $p = \infty$ these rates are (essentially) optimal.
arXiv Detail & Related papers (2021-02-06T15:05:14Z) - Fast Rates for the Regret of Offline Reinforcement Learning [69.23654172273085]
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP).
We show that given any estimate for the optimal quality function $Q*$, the regret of the policy it defines converges at a rate given by the exponentiation of the $Q*$-estimate's pointwise convergence rate.
arXiv Detail & Related papers (2021-01-31T16:17:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.