PDE-Based Optimal Strategy for Unconstrained Online Learning
- URL: http://arxiv.org/abs/2201.07877v1
- Date: Wed, 19 Jan 2022 22:21:21 GMT
- Title: PDE-Based Optimal Strategy for Unconstrained Online Learning
- Authors: Zhiyu Zhang, Ashok Cutkosky, Ioannis Paschalidis
- Abstract summary: We present a framework that generates time-varying potential functions by solving a Partial Differential Equation (PDE).
Our framework recovers some classical potentials, and more importantly provides a systematic approach to design new ones.
The resulting method is the first parameter-free algorithm with an optimal leading constant.
- Score: 40.61498562988079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unconstrained Online Linear Optimization (OLO) is a practical problem setting
to study the training of machine learning models. Existing works proposed a
number of potential-based algorithms, but in general the design of such
potential functions is ad hoc and heavily relies on guessing. In this paper, we
present a framework that generates time-varying potential functions by solving
a Partial Differential Equation (PDE). Our framework recovers some classical
potentials, and more importantly provides a systematic approach to design new
ones.
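For intuition, here is one plausible reading of the PDE step, stated as an assumption rather than the paper's exact formulation: Taylor-expanding the one-step condition on a potential $V(t,S)$ over an adversarial gradient $|g| \le 1$ suggests a backward-heat-type equation in the continuous-time limit (the paper's precise scaling and boundary conditions may differ).
```latex
% Hedged sketch: V(t, S) is a time-varying potential, S the cumulative
% (negative) gradient. Expanding the one-step requirement and taking the
% worst case g^2 = 1 leaves, in the limit, the backward heat equation:
V(t+1,\, S+g) \;\approx\; V(t,S) + \partial_t V + g\,\partial_S V + \tfrac{g^2}{2}\,\partial_{SS} V
\quad\Longrightarrow\quad
\partial_t V(t,S) \;+\; \tfrac{1}{2}\,\partial_{SS} V(t,S) \;=\; 0 .
```
Under this reading, classical potentials appear as particular solutions, and designing new ones reduces to solving the PDE under different boundary conditions.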
The power of our framework is demonstrated through a concrete example. When
losses are 1-Lipschitz, we design a novel OLO algorithm with anytime regret
upper bound $C\sqrt{T}+\|u\|\sqrt{2T}\,[\sqrt{\log(1+\|u\|/C)}+2]$, where $C$ is
a user-specified constant and $u$ is any comparator whose norm is unknown and
unbounded a priori. By constructing a matching lower bound, we further show
that the leading order term, including the constant multiplier $\sqrt{2}$, is
tight. To our knowledge, this is the first parameter-free algorithm with
optimal leading constant. The obtained theoretical benefits are validated by
experiments.
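As a concrete illustration of the potential-based template the abstract describes, below is a minimal one-dimensional sketch. The discrete-derivative prediction rule is the standard one in this literature; the exponential potential used here is a classical stand-in, not the paper's PDE-derived (erfi-type) solution, and `regret_bound` simply evaluates the bound stated above.
```python
# Hedged sketch of a potential-based unconstrained OLO learner (1-D,
# |g_t| <= 1). The potential below is the classical exponential one,
# used as a placeholder for the paper's PDE-derived potential.
import math

def potential(t: int, S: float, eps: float = 1.0) -> float:
    """Classical time-varying potential; a stand-in for the PDE solution."""
    return eps * math.exp(S * S / (2.0 * t)) / math.sqrt(t)

def predict(t: int, S: float) -> float:
    """Two-sided discrete derivative of the potential at the gradient sum S."""
    return (potential(t, S + 1.0) - potential(t, S - 1.0)) / 2.0

def regret_bound(T: int, u_norm: float, C: float) -> float:
    """The anytime bound from the abstract:
    C*sqrt(T) + ||u||*sqrt(2T) * (sqrt(log(1 + ||u||/C)) + 2)."""
    return C * math.sqrt(T) + u_norm * math.sqrt(2 * T) * (
        math.sqrt(math.log(1.0 + u_norm / C)) + 2.0
    )

def run(gradients):
    """Play unconstrained OLO with linear losses <g_t, x_t>."""
    S, total_loss = 0.0, 0.0
    for t, g in enumerate(gradients, start=1):
        x = predict(t, S)      # bet along the potential's slope at S
        total_loss += g * x
        S -= g                 # S tracks the negative gradient sum
    return total_loss

if __name__ == "__main__":
    gs = [1.0 if t % 4 == 0 else -1.0 for t in range(100)]  # biased toy stream
    print("cumulative loss :", run(gs))
    print("abstract's bound:", regret_bound(T=100, u_norm=5.0, C=1.0))
```
On the biased toy stream the learner's cumulative loss drifts negative (it profits from the predictable direction), while the printed bound shows how the guarantee scales with the unknown comparator norm.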
Related papers
- Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z)
- Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances [42.16343861129583]
We show that a bandit online learning algorithm can select parameters for a sequence of instances such that the overall cost approaches that of the best fixed $\omega$ as the sequence length increases.
Our work provides the first learning-theoretic treatment of high-precision linear system solvers and the first end-to-end guarantees for data-driven scientific computing.
arXiv Detail & Related papers (2023-10-03T17:51:42Z)
- Target-based Surrogates for Stochastic Optimization [26.35752393302125]
We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient.
Such functions are prevalent in reinforcement learning, imitation learning, and adversarial training.
Our framework allows the use of standard optimization algorithms to construct surrogates which can be minimized efficiently.
arXiv Detail & Related papers (2023-02-06T08:08:34Z)
- Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z)
- Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}\,T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles), and to find an appropriate stationary point in a polynomial number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.