No-Regret Learning with Unbounded Losses: The Case of Logarithmic
Pooling
- URL: http://arxiv.org/abs/2202.11219v2
- Date: Tue, 10 Oct 2023 01:26:08 GMT
- Title: No-Regret Learning with Unbounded Losses: The Case of Logarithmic
Pooling
- Authors: Eric Neyman and Tim Roughgarden
- Abstract summary: We focus on the fundamental and practical aggregation method known as logarithmic pooling.
We consider the problem of learning the best set of parameters in an online adversarial setting.
We present an algorithm that learns expert weights in a way that attains $O(\sqrt{T} \log T)$ expected regret.
- Score: 12.933990572597583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For each of $T$ time steps, $m$ experts report probability distributions over
$n$ outcomes; we wish to learn to aggregate these forecasts in a way that
attains a no-regret guarantee. We focus on the fundamental and practical
aggregation method known as logarithmic pooling -- a weighted average of log
odds -- which is in a certain sense the optimal choice of pooling method if one
is interested in minimizing log loss (as we take to be our loss function). We
consider the problem of learning the best set of parameters (i.e. expert
weights) in an online adversarial setting. We assume (by necessity) that the
adversarial choices of outcomes and forecasts are consistent, in the sense that
experts report calibrated forecasts. Imposing this constraint creates a (to our
knowledge) novel semi-adversarial setting in which the adversary retains a
large amount of flexibility. In this setting, we present an algorithm based on
online mirror descent that learns expert weights in a way that attains
$O(\sqrt{T} \log T)$ expected regret as compared with the best weights in
hindsight.
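To make the aggregation rule concrete, here is a minimal sketch of logarithmic pooling as described in the abstract: a weighted average of log probabilities followed by renormalization. The NumPy implementation and the function name are illustrative assumptions, not the authors' code.

```python
import numpy as np

def logarithmic_pool(forecasts, weights):
    """Aggregate expert forecasts by logarithmic pooling.

    forecasts: (m, n) array; row i is expert i's distribution over n outcomes.
    weights:   (m,) array of nonnegative expert weights (assumed to sum to 1).

    The pooled forecast q satisfies q(x) proportional to prod_i p_i(x)**w_i,
    i.e. a weighted average of log probabilities followed by renormalization.
    """
    log_p = np.log(forecasts)        # (m, n) log-probabilities
    pooled_log = weights @ log_p     # weighted average of the logs, shape (n,)
    pooled_log -= pooled_log.max()   # stabilize before exponentiating
    pooled = np.exp(pooled_log)
    return pooled / pooled.sum()
```

For example, with two equally weighted experts reporting (0.9, 0.1) and (0.6, 0.4), the pool is proportional to (sqrt(0.9 * 0.6), sqrt(0.1 * 0.4)), which normalizes to roughly (0.79, 0.21).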
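The learning algorithm in the paper is based on online mirror descent over the expert weights under log loss. The sketch below computes the log loss of the pooled forecast and its gradient with respect to the weights, and takes one entropic (exponentiated-gradient) mirror descent step on the probability simplex; the choice of regularizer and step size is an illustrative assumption, not the paper's specific algorithm or constraint set.

```python
import numpy as np

def log_loss_and_gradient(forecasts, weights, outcome):
    """Log loss of the logarithmically pooled forecast on the realized outcome,
    and its gradient with respect to the expert weights."""
    log_p = np.log(forecasts)          # (m, n) log-probabilities
    pooled_log = weights @ log_p       # weighted average of the logs, shape (n,)
    pooled_log -= pooled_log.max()     # numerical stabilization
    pooled = np.exp(pooled_log)
    pooled /= pooled.sum()             # pooled forecast q, q(x) proportional to prod_i p_i(x)**w_i
    loss = -np.log(pooled[outcome])
    # Chain rule: d loss / d w_i = -log p_i(outcome) + sum_x q(x) log p_i(x)
    grad = -log_p[:, outcome] + log_p @ pooled
    return loss, grad

def exponentiated_gradient_step(weights, grad, eta):
    """One entropic mirror descent (exponentiated gradient) step on the simplex."""
    new = weights * np.exp(-eta * grad)
    return new / new.sum()
```

Running this once per time step with a step size on the order of $1/\sqrt{T}$ gives the generic mirror-descent template; the paper's contribution lies in the analysis under the calibration constraint, which this sketch does not capture.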
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Streaming Algorithms for Learning with Experts: Deterministic Versus
Robust [62.98860182111096]
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times).
The goal is to make predictions with the minimum cost, specifically compared to the best expert in the set (a minimal sketch of this classical setting appears after this list).
We show a space lower bound of $\widetilde{\Omega}\left(\frac{nM}{RT}\right)$ for any deterministic algorithm that achieves regret $R$ when the best expert makes $M$ mistakes.
arXiv Detail & Related papers (2023-03-03T04:39:53Z) - Private Online Prediction from Experts: Separations and Faster Rates [74.52487417350221]
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints.
We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries.
arXiv Detail & Related papers (2022-10-24T18:40:19Z) - A Regret-Variance Trade-Off in Online Learning [14.41667013062914]
We show how variance of predictions can be exploited in learning.
In online prediction with corrupted losses, we show that the effect of corruption on the regret can be compensated by a large variance.
We extend our results to the setting of online linear regression.
arXiv Detail & Related papers (2022-06-06T14:50:19Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - Tight Bounds on Minimax Regret under Logarithmic Loss via
Self-Concordance [37.0414602993676]
We show that for any expert class with (sequential) metric entropy $\mathcal{O}(\gamma^{-p})$ at scale $\gamma$, the minimax regret is $\mathcal{O}(n^{p/(p+1)})$.
As an application of our techniques, we resolve the minimax regret for nonparametric Lipschitz classes of experts.
arXiv Detail & Related papers (2020-07-02T14:47:33Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z) - Debiased Off-Policy Evaluation for Recommendation Systems [8.63711086812655]
A/B tests are reliable, but they are time-consuming and costly, and they entail a risk of failure.
We develop an alternative method, which predicts the performance of algorithms given historical data.
Our method produces smaller mean squared errors than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-20T02:30:02Z)
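For reference, the prediction-with-expert-advice setting mentioned in the streaming-experts entry above can be illustrated with the classical multiplicative-weights (Hedge) algorithm. The sketch below is a textbook baseline, not the streaming or deterministic algorithms analyzed in that paper; the step size eta is an illustrative assumption.

```python
import numpy as np

def hedge(expert_losses, eta=0.1):
    """Multiplicative-weights (Hedge) for prediction with expert advice.

    expert_losses: (T, n) array; entry [t, i] is expert i's loss in [0, 1] on day t.
    Returns the algorithm's cumulative loss and the best expert's cumulative loss.
    """
    T, n = expert_losses.shape
    w = np.ones(n) / n                          # uniform initial weights
    alg_loss = 0.0
    for t in range(T):
        alg_loss += w @ expert_losses[t]        # loss of the weighted prediction
        w *= np.exp(-eta * expert_losses[t])    # downweight poorly performing experts
        w /= w.sum()
    best_expert_loss = expert_losses.sum(axis=0).min()
    return alg_loss, best_expert_loss
```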
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.