No-Regret Learning with Unbounded Losses: The Case of Logarithmic
Pooling
- URL: http://arxiv.org/abs/2202.11219v2
- Date: Tue, 10 Oct 2023 01:26:08 GMT
- Title: No-Regret Learning with Unbounded Losses: The Case of Logarithmic
Pooling
- Authors: Eric Neyman and Tim Roughgarden
- Abstract summary: We focus on the fundamental and practical aggregation method known as logarithmic pooling.
We consider the problem of learning the best set of parameters in an online adversarial setting.
We present an algorithm that learns expert weights in a way that attains $O(\sqrt{T} \log T)$ expected regret.
- Score: 12.933990572597583
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For each of $T$ time steps, $m$ experts report probability distributions over
$n$ outcomes; we wish to learn to aggregate these forecasts in a way that
attains a no-regret guarantee. We focus on the fundamental and practical
aggregation method known as logarithmic pooling -- a weighted average of log
odds -- which is in a certain sense the optimal choice of pooling method if one
is interested in minimizing log loss (as we take to be our loss function). We
consider the problem of learning the best set of parameters (i.e. expert
weights) in an online adversarial setting. We assume (by necessity) that the
adversarial choices of outcomes and forecasts are consistent, in the sense that
experts report calibrated forecasts. Imposing this constraint creates a (to our
knowledge) novel semi-adversarial setting in which the adversary retains a
large amount of flexibility. In this setting, we present an algorithm based on
online mirror descent that learns expert weights in a way that attains
$O(\sqrt{T} \log T)$ expected regret as compared with the best weights in
hindsight.
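To make the aggregation rule concrete, here is a minimal sketch of logarithmic pooling as described in the abstract: a weighted average of log probabilities followed by renormalization. The NumPy implementation and the function name are illustrative assumptions, not the authors' code.

```python
import numpy as np

def logarithmic_pool(forecasts, weights):
    """Aggregate expert forecasts by logarithmic pooling.

    forecasts: (m, n) array; row i is expert i's distribution over n outcomes.
    weights:   (m,) array of nonnegative expert weights (assumed to sum to 1).

    The pooled forecast q satisfies q(x) proportional to prod_i p_i(x)**w_i,
    i.e. a weighted average of log probabilities followed by renormalization.
    """
    log_p = np.log(forecasts)        # (m, n) log-probabilities
    pooled_log = weights @ log_p     # weighted average of the logs, shape (n,)
    pooled_log -= pooled_log.max()   # stabilize before exponentiating
    pooled = np.exp(pooled_log)
    return pooled / pooled.sum()
```

For example, with two equally weighted experts reporting (0.9, 0.1) and (0.6, 0.4), the pool is proportional to (sqrt(0.9 * 0.6), sqrt(0.1 * 0.4)), which normalizes to roughly (0.79, 0.21).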
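The learning algorithm in the paper is based on online mirror descent over the expert weights under log loss. The sketch below computes the log loss of the pooled forecast and its gradient with respect to the weights, and takes one entropic (exponentiated-gradient) mirror descent step on the probability simplex; the choice of regularizer and step size is an illustrative assumption, not the paper's specific algorithm or constraint set.

```python
import numpy as np

def log_loss_and_gradient(forecasts, weights, outcome):
    """Log loss of the logarithmically pooled forecast on the realized outcome,
    and its gradient with respect to the expert weights."""
    log_p = np.log(forecasts)          # (m, n) log-probabilities
    pooled_log = weights @ log_p       # weighted average of the logs, shape (n,)
    pooled_log -= pooled_log.max()     # numerical stabilization
    pooled = np.exp(pooled_log)
    pooled /= pooled.sum()             # pooled forecast q, q(x) proportional to prod_i p_i(x)**w_i
    loss = -np.log(pooled[outcome])
    # Chain rule: d loss / d w_i = -log p_i(outcome) + sum_x q(x) log p_i(x)
    grad = -log_p[:, outcome] + log_p @ pooled
    return loss, grad

def exponentiated_gradient_step(weights, grad, eta):
    """One entropic mirror descent (exponentiated gradient) step on the simplex."""
    new = weights * np.exp(-eta * grad)
    return new / new.sum()
```

Running this once per time step with a step size on the order of $1/\sqrt{T}$ gives the generic mirror-descent template; the paper's contribution lies in the analysis under the calibration constraint, which this sketch does not capture.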
Related papers
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions.
We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance.
Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z) - Streaming Algorithms for Learning with Experts: Deterministic Versus
Robust [62.98860182111096]
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times).
The goal is to make predictions with the minimum cost, specifically compared to the best expert in the set (a minimal sketch of this classical setting appears after this list).
We show a space lower bound of $\widetilde{\Omega}\left(\frac{nM}{RT}\right)$ for any deterministic algorithm that achieves regret $R$ when the best expert makes $M$ mistakes.
arXiv Detail & Related papers (2023-03-03T04:39:53Z) - Private Online Prediction from Experts: Separations and Faster Rates [74.52487417350221]
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints.
We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries.
arXiv Detail & Related papers (2022-10-24T18:40:19Z) - A Regret-Variance Trade-Off in Online Learning [14.41667013062914]
We show how variance of predictions can be exploited in learning.
In online prediction with corrupted losses, we show that the effect of corruption on the regret can be compensated by a large variance.
We extend our results to the setting of online linear regression.
arXiv Detail & Related papers (2022-06-06T14:50:19Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - Tight Bounds on Minimax Regret under Logarithmic Loss via
Self-Concordance [37.0414602993676]
We show that for any expert class with (sequential) metric entropy $\mathcal{O}(\gamma^{-p})$ at scale $\gamma$, the minimax regret is $\mathcal{O}(n^{p/(p+1)})$.
As an application of our techniques, we resolve the minimax regret for nonparametric Lipschitz classes of experts.
arXiv Detail & Related papers (2020-07-02T14:47:33Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z) - Debiased Off-Policy Evaluation for Recommendation Systems [8.63711086812655]
A/B tests are reliable, but they are time-consuming and costly, and they entail a risk of failure.
We develop an alternative method, which predicts the performance of algorithms given historical data.
Our method produces smaller mean squared errors than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-20T02:30:02Z)
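For reference, the prediction-with-expert-advice setting mentioned in the streaming-experts entry above can be illustrated with the classical multiplicative-weights (Hedge) algorithm. The sketch below is a textbook baseline, not the streaming or deterministic algorithms analyzed in that paper; the step size eta is an illustrative assumption.

```python
import numpy as np

def hedge(expert_losses, eta=0.1):
    """Multiplicative-weights (Hedge) for prediction with expert advice.

    expert_losses: (T, n) array; entry [t, i] is expert i's loss in [0, 1] on day t.
    Returns the algorithm's cumulative loss and the best expert's cumulative loss.
    """
    T, n = expert_losses.shape
    w = np.ones(n) / n                          # uniform initial weights
    alg_loss = 0.0
    for t in range(T):
        alg_loss += w @ expert_losses[t]        # loss of the weighted prediction
        w *= np.exp(-eta * expert_losses[t])    # downweight poorly performing experts
        w /= w.sum()
    best_expert_loss = expert_losses.sum(axis=0).min()
    return alg_loss, best_expert_loss
```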
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.