Related papers: Optimistically Tempered Online Learning

Optimistically Tempered Online Learning

URL: http://arxiv.org/abs/2301.07530v2
Date: Wed, 14 Feb 2024 15:16:47 GMT
Title: Optimistically Tempered Online Learning
Authors: Maxime Haddouche and Olivier Wintenberger and Benjamin Guedj
Abstract summary: Optimistic Online Learning algorithms exploit expert advices assumed optimistically to be always useful. We develop the emphoptimistically tempered (OT) online learning framework as well as OT adaptations of online algorithms. Our algorithms come with sound theoretical guarantees in the form of dynamic regret bounds.
Score: 19.12634663761194
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Optimistic Online Learning algorithms have been developed to exploit expert advices, assumed optimistically to be always useful. However, it is legitimate to question the relevance of such advices \emph{w.r.t.} the learning information provided by gradient-based online algorithms. In this work, we challenge the confidence assumption on the expert and develop the \emph{optimistically tempered} (OT) online learning framework as well as OT adaptations of online algorithms. Our algorithms come with sound theoretical guarantees in the form of dynamic regret bounds, and we eventually provide experimental validation of the usefulness of the OT approach.

Related papers

Online inductive learning from answer sets for efficient reinforcement learning exploration [52.03682298194168]
We exploit inductive learning of answer set programs to learn a set of logical rules representing an explainable approximation of the agent policy. We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent at the next batch. Our methodology produces a significant boost in the discounted return achieved by the agent, even in the first batches of training.
arXiv Detail & Related papers (2025-01-13T16:13:22Z)
Learning-Augmented Algorithms for Online Concave Packing and Convex Covering Problems [4.9826534303287335]
We present learning-augmented algorithmic frameworks for two fundamental optimizations settings. For online packing with concave objectives, we present a simple but overarching strategy that switches between the advice and the state-of-the-art online algorithm. We show that our algorithms break impossibility results when the advice is accurate, while maintaining comparable performance with state-of-the-art classical online algorithms even when the advice is erroneous.
arXiv Detail & Related papers (2024-11-13T04:27:25Z)
Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
gradient-variation online learning aims to achieve regret guarantees that scale with variations in gradients of online functions. Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms. We provide the applications for fast-rate convergence in games and extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z)
A Simple Learning-Augmented Algorithm for Online Packing with Concave Objectives [4.9826534303287335]
We introduce and analyze a simple learning-augmented algorithm for online packing problems with linear constraints and concave objectives. We further raise the problem of understanding necessary and sufficient conditions for when such simple black-box solutions may be optimal.
arXiv Detail & Related papers (2024-06-05T18:39:28Z)
RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning. Our proposed method uses reinforcement learning with user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $mathcalO(log T)$ to $1$. Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z)
Online Attentive Kernel-Based Temporal Difference Learning [13.94346725929798]
Online Reinforcement Learning (RL) has been receiving increasing attention due to its fast learning capability and improving data efficiency. Online RL often suffers from complex Value Function Approximation (VFA) and catastrophic interference. We propose an Online Attentive Kernel-Based Temporal Difference (OAKTD) algorithm using two-timescale optimization.
arXiv Detail & Related papers (2022-01-22T14:47:10Z)
Boosting for Online Convex Optimization [64.15578413206715]
We consider the decision-making framework of online convex optimization with a large number of experts. We define a weak learning algorithm as a mechanism that guarantees approximate regret against a base class of experts. We give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class.
arXiv Detail & Related papers (2021-02-18T12:30:49Z)
Optimal Robustness-Consistency Trade-offs for Learning-Augmented Online Algorithms [85.97516436641533]
We study the problem of improving the performance of online algorithms by incorporating machine-learned predictions. The goal is to design algorithms that are both consistent and robust. We provide the first set of non-trivial lower bounds for competitive analysis using machine-learned predictions.
arXiv Detail & Related papers (2020-10-22T04:51:01Z)
Meta-Gradient Reinforcement Learning with an Objective Discovered Online [54.15180335046361]
We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network. Because the objective is discovered online, it can adapt to changes over time. On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
arXiv Detail & Related papers (2020-07-16T16:17:09Z)
A Modern Introduction to Online Learning [15.974402990630402]
Online learning refers to the framework of minimization of regret under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses.
arXiv Detail & Related papers (2019-12-31T08:16:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.