On the Value of Stochastic Side Information in Online Learning
- URL: http://arxiv.org/abs/2303.05914v1
- Date: Thu, 9 Mar 2023 15:06:07 GMT
- Title: On the Value of Stochastic Side Information in Online Learning
- Authors: Junzhang Jia, Xuetong Wu, Jingge Zhu, and Jamie Evans
- Abstract summary: We study the effectiveness of stochastic side information in deterministic online learning scenarios.
We assume that certain stochastic side information is available to the forecaster but not to the experts.
- Score: 3.4788711710826083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the effectiveness of stochastic side information in deterministic
online learning scenarios. We propose a forecaster to predict a deterministic
sequence, where its performance is evaluated against an expert class. We assume
that certain stochastic side information is available to the forecaster but not
to the experts. We define the minimax expected regret for evaluating the
forecaster's performance, for which we obtain both upper and lower bounds.
Consequently, our results characterize the improvement in the regret due to the
stochastic side information. Compared with the classical online learning
problem, where the regret scales as O(\sqrt{n}), the regret can even be negative
when the stochastic side information is more powerful than the experts. To
illustrate, we apply the proposed bounds to two concrete examples of different
types of side information.
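For context, the classical baseline the abstract compares against is prediction with expert advice, where an exponentially weighted average forecaster achieves O(\sqrt{n}) regret against the best expert. The sketch below is an illustrative implementation of that baseline (without side information); the function name, squared loss, and fixed learning rate are assumptions for the example, not the paper's construction.

```python
import numpy as np

def hedge_forecaster(expert_preds, outcomes, eta):
    """Exponentially weighted average forecaster (classical baseline).

    expert_preds: (n_rounds, n_experts) expert predictions in [0, 1]
    outcomes:     (n_rounds,) deterministic sequence in [0, 1]
    eta:          learning rate
    Returns (forecaster's cumulative loss, best expert's cumulative loss);
    their difference is the regret.
    """
    n, N = expert_preds.shape
    log_w = np.zeros(N)                       # log-weights, for numerical stability
    fc_loss = 0.0
    for t in range(n):
        w = np.exp(log_w - log_w.max())
        p = w @ expert_preds[t] / w.sum()     # weighted average prediction
        fc_loss += (p - outcomes[t]) ** 2     # squared loss this round
        log_w -= eta * (expert_preds[t] - outcomes[t]) ** 2
    best = ((expert_preds - outcomes[:, None]) ** 2).sum(axis=0).min()
    return fc_loss, best
```

With a learning rate of order \sqrt{(ln N)/n}, the regret fc_loss - best is O(\sqrt{n ln N}); the paper's point is that stochastic side information can push this difference below the O(\sqrt{n}) rate, and even below zero.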
Related papers
- Asymptotically Optimal Regret for Black-Box Predict-then-Optimize [7.412445894287709]
We study new black-box predict-then-optimize problems that lack special structure and where we only observe the reward from the action taken.
We present a novel loss function, which we call Empirical Soft Regret (ESR), designed to significantly improve reward when used in training.
We also show our approach significantly outperforms state-of-the-art algorithms on real-world decision-making problems in news recommendation and personalized healthcare.
arXiv Detail & Related papers (2024-06-12T04:46:23Z)
- Adversarial Resilience in Sequential Prediction via Abstention [46.80218090768711]
We study the problem of sequential prediction in the setting with an adversary that is allowed to inject clean-label adversarial examples.
We propose a new model of sequential prediction that sits between the purely and fully adversarial settings.
arXiv Detail & Related papers (2023-06-22T17:44:22Z)
- The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation [53.53493178394081]
We analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD).
Even if a practitioner has no interest in the return distribution beyond the mean, QTD may offer performance superior to approaches such as classical TD learning.
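As an illustration of the method this entry analyses, the sketch below shows one tabular QTD update: each quantile estimate moves along the expected quantile-regression gradient against distributional Bellman targets. The function name and the use of quantile midpoints are assumptions for the example; the paper's precise update and analysis may differ.

```python
import numpy as np

def qtd_update(theta, r, theta_next, gamma, alpha):
    """One tabular QTD update for m quantile estimates of the return.

    theta:      (m,) current quantile estimates at the current state
    r:          observed reward
    theta_next: (m,) quantile estimates at the next state
    gamma:      discount factor
    alpha:      step size
    """
    m = len(theta)
    tau = (np.arange(m) + 0.5) / m        # quantile midpoint levels
    targets = r + gamma * theta_next      # distributional Bellman targets
    for i in range(m):
        # expected quantile-regression gradient over the m targets
        grad = np.mean(tau[i] - (targets < theta[i]).astype(float))
        theta[i] += alpha * grad
    return theta
```

Even when only the mean return matters, averaging the converged quantile estimates can behave better than a single classical TD value estimate, which is the statistical benefit the paper quantifies.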
arXiv Detail & Related papers (2023-05-28T10:52:46Z)
- Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z)
- A Regret-Variance Trade-Off in Online Learning [14.41667013062914]
We show how variance of predictions can be exploited in learning.
In online prediction with corrupted losses, we show that the effect of corruption on the regret can be compensated by a large variance.
We extend our results to the setting of online linear regression.
arXiv Detail & Related papers (2022-06-06T14:50:19Z)
- Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove theoretically that this approach offsets the influence of user/item propensity on learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z)
- The Impact of Batch Learning in Stochastic Bandits [5.008064542274928]
We consider a special case of bandit problems, namely batched bandits.
Motivated by natural restrictions of recommender systems and e-commerce platforms, we assume that a learning agent observes responses batched in groups over a certain time period.
We provide a policy-agnostic regret analysis and demonstrate upper and lower bounds for the regret of a candidate policy.
arXiv Detail & Related papers (2021-11-03T08:38:10Z)
- Balanced Q-learning: Combining the Influence of Optimistic and Pessimistic Targets [74.04426767769785]
We show that specific types of biases may be preferable, depending on the scenario.
We design a novel reinforcement learning algorithm, Balanced Q-learning, in which the target is modified to be a convex combination of a pessimistic and an optimistic term.
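The convex-combination target described above can be sketched as follows. This is an illustrative simplification: the mixing weight beta is fixed here, whereas the paper adapts the balance online, and min over next-state actions stands in for the paper's pessimistic term.

```python
import numpy as np

def balanced_target(r, q_next, beta, gamma):
    """Balanced Q-learning style target: a convex combination of a
    pessimistic and an optimistic bootstrap term.

    r:      observed reward
    q_next: (n_actions,) Q-values at the next state
    beta:   mixing weight in [0, 1] (assumed fixed in this sketch)
    gamma:  discount factor
    """
    optimistic = np.max(q_next)     # standard Q-learning bootstrap
    pessimistic = np.min(q_next)    # worst-case bootstrap (illustrative choice)
    return r + gamma * (beta * pessimistic + (1 - beta) * optimistic)
```

Setting beta = 0 recovers the usual optimistic Q-learning target, so the combination interpolates between the two biases the paper studies.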
arXiv Detail & Related papers (2021-11-03T07:30:19Z)
- Low-Regret Active Learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.