Optimistically Tempered Online Learning
- URL: http://arxiv.org/abs/2301.07530v2
- Date: Wed, 14 Feb 2024 15:16:47 GMT
- Title: Optimistically Tempered Online Learning
- Authors: Maxime Haddouche and Olivier Wintenberger and Benjamin Guedj
- Abstract summary: Optimistic Online Learning algorithms exploit expert advices assumed optimistically to be always useful.
We develop the emphoptimistically tempered (OT) online learning framework as well as OT adaptations of online algorithms.
Our algorithms come with sound theoretical guarantees in the form of dynamic regret bounds.
- Score: 19.12634663761194
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Optimistic Online Learning algorithms have been developed to exploit expert
advices, assumed optimistically to be always useful. However, it is legitimate
to question the relevance of such advices \emph{w.r.t.} the learning
information provided by gradient-based online algorithms. In this work, we
challenge the confidence assumption on the expert and develop the
\emph{optimistically tempered} (OT) online learning framework as well as OT
adaptations of online algorithms. Our algorithms come with sound theoretical
guarantees in the form of dynamic regret bounds, and we eventually provide
experimental validation of the usefulness of the OT approach.
Related papers
- Optimized projection-free algorithms for online learning: construction and worst-case analysis [16.086904272719593]
This work studies and develop projection-free algorithms for online learning with linear optimization oracles (a.k.a. Frank-Wolfe)<n>We show how to leverage semidefinite programming to jointly design and analyze online Frank-Wolfe-type algorithms numerically.
arXiv Detail & Related papers (2025-06-06T08:22:20Z) - Online Decision-Focused Learning [74.3205104323777]
Decision-focused learning (DFL) is an increasingly popular paradigm for training models whose predictive outputs are used in decision-making tasks.<n>In this paper, we regularize the objective function to make it different and investigate how to overcome nonoptimality function.<n>We also showcase the effectiveness of our algorithms on a knapsack experiment, where they outperform two standard benchmarks.
arXiv Detail & Related papers (2025-05-19T10:40:30Z) - Online inductive learning from answer sets for efficient reinforcement learning exploration [52.03682298194168]
We exploit inductive learning of answer set programs to learn a set of logical rules representing an explainable approximation of the agent policy.
We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent at the next batch.
Our methodology produces a significant boost in the discounted return achieved by the agent, even in the first batches of training.
arXiv Detail & Related papers (2025-01-13T16:13:22Z) - Learning-Augmented Algorithms for Online Concave Packing and Convex Covering Problems [4.9826534303287335]
We present learning-augmented algorithmic frameworks for two fundamental optimizations settings.
For online packing with concave objectives, we present a simple but overarching strategy that switches between the advice and the state-of-the-art online algorithm.
We show that our algorithms break impossibility results when the advice is accurate, while maintaining comparable performance with state-of-the-art classical online algorithms even when the advice is erroneous.
arXiv Detail & Related papers (2024-11-13T04:27:25Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
gradient-variation online learning aims to achieve regret guarantees that scale with variations in gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide the applications for fast-rate convergence in games and extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - A Simple Learning-Augmented Algorithm for Online Packing with Concave Objectives [4.9826534303287335]
We introduce and analyze a simple learning-augmented algorithm for online packing problems with linear constraints and concave objectives.
We further raise the problem of understanding necessary and sufficient conditions for when such simple black-box solutions may be optimal.
arXiv Detail & Related papers (2024-06-05T18:39:28Z) - Bayesian Design Principles for Offline-to-Online Reinforcement Learning [50.97583504192167]
offline-to-online fine-tuning is crucial for real-world applications where exploration can be costly or unsafe.
In this paper, we tackle the dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop.
We show that Bayesian design principles are crucial in solving such a dilemma.
arXiv Detail & Related papers (2024-05-31T16:31:07Z) - Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms [23.61332577985059]
Inverse reinforcement learning (IRL) aims to recover the reward function of an expert agent from demonstrations of behavior.
This paper introduces a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting.
arXiv Detail & Related papers (2024-02-23T15:49:46Z) - SPRINQL: Sub-optimal Demonstrations driven Offline Imitation Learning [11.666700714916065]
offline imitation learning (IL) aims to mimic an expert's behavior using demonstrations without any interaction with the environment.
We propose an offline IL approach that leverages the larger set of sub-optimal demonstrations while effectively mimicking expert trajectories.
Our approach, which is based on inverse soft-Q learning, learns from both expert and sub-optimal demonstrations.
arXiv Detail & Related papers (2024-02-20T17:02:48Z) - Discounted Adaptive Online Learning: Towards Better Regularization [5.5899168074961265]
We study online learning in adversarial nonstationary environments.
We propose an adaptive (i.e., instance optimal) algorithm that improves the widespread non-adaptive baseline.
We also consider the (Gibbs and Candes, 2021)-style online conformal prediction problem.
arXiv Detail & Related papers (2024-02-05T04:29:39Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Uncertainty-driven Exploration Strategies for Online Grasp Learning [43.88491290121489]
We present an uncertainty-based approach for online learning of grasp predictions for robotic bin picking.
Specifically, the online learning algorithm with an effective exploration strategy can significantly improve its adaptation performance to unseen environment settings.
arXiv Detail & Related papers (2023-09-21T13:06:03Z) - Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $mathcalO(log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z) - Bayesian Q-learning With Imperfect Expert Demonstrations [56.55609745121237]
We propose a novel algorithm to speed up Q-learning with the help of a limited amount of imperfect expert demonstrations.
We evaluate our approach on a sparse-reward chain environment and six more complicated Atari games with delayed rewards.
arXiv Detail & Related papers (2022-10-01T17:38:19Z) - Online Attentive Kernel-Based Temporal Difference Learning [13.94346725929798]
Online Reinforcement Learning (RL) has been receiving increasing attention due to its fast learning capability and improving data efficiency.
Online RL often suffers from complex Value Function Approximation (VFA) and catastrophic interference.
We propose an Online Attentive Kernel-Based Temporal Difference (OAKTD) algorithm using two-timescale optimization.
arXiv Detail & Related papers (2022-01-22T14:47:10Z) - TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
arXiv Detail & Related papers (2021-10-27T21:05:00Z) - Boosting for Online Convex Optimization [64.15578413206715]
We consider the decision-making framework of online convex optimization with a large number of experts.
We define a weak learning algorithm as a mechanism that guarantees approximate regret against a base class of experts.
We give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class.
arXiv Detail & Related papers (2021-02-18T12:30:49Z) - Reverse engineering learned optimizers reveals known and novel
mechanisms [50.50540910474342]
Learneds are algorithms that can themselves be trained to solve optimization problems.
Our results help elucidate the previously murky understanding of how learneds work, and establish tools for interpreting future learneds.
arXiv Detail & Related papers (2020-11-04T07:12:43Z) - Optimal Robustness-Consistency Trade-offs for Learning-Augmented Online
Algorithms [85.97516436641533]
We study the problem of improving the performance of online algorithms by incorporating machine-learned predictions.
The goal is to design algorithms that are both consistent and robust.
We provide the first set of non-trivial lower bounds for competitive analysis using machine-learned predictions.
arXiv Detail & Related papers (2020-10-22T04:51:01Z) - Online Convex Optimization Perspective for Learning from Dynamically
Revealed Preferences [0.0]
We study the problem of online learning from revealed preferences.
A learner wishes to learn a non-strategic agent's private utility function through observing the agent's utility-maximizing actions in a changing environment.
We adopt an online inverse optimization setup, where the learner observes a stream of agent's actions in an online fashion and the learning performance is measured by regret associated with a loss function.
arXiv Detail & Related papers (2020-08-24T14:05:13Z) - Meta-Gradient Reinforcement Learning with an Objective Discovered Online [54.15180335046361]
We propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network.
Because the objective is discovered online, it can adapt to changes over time.
On the Atari Learning Environment, the meta-gradient algorithm adapts over time to learn with greater efficiency.
arXiv Detail & Related papers (2020-07-16T16:17:09Z) - A Modern Introduction to Online Learning [15.974402990630402]
Online learning refers to the framework of minimization of regret under worst-case assumptions.
I present first-order and second-order algorithms for online learning with convex losses.
arXiv Detail & Related papers (2019-12-31T08:16:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.