PDE-Based Optimal Strategy for Unconstrained Online Learning
- URL: http://arxiv.org/abs/2201.07877v1
- Date: Wed, 19 Jan 2022 22:21:21 GMT
- Title: PDE-Based Optimal Strategy for Unconstrained Online Learning
- Authors: Zhiyu Zhang, Ashok Cutkosky, Ioannis Paschalidis
- Abstract summary: We present a framework that generates time-varying potential functions by solving a Partial Differential Equation (PDE).
Our framework recovers some classical potentials, and more importantly provides a systematic approach to design new ones.
The resulting method is the first parameter-free algorithm with an optimal leading constant.
- Score: 40.61498562988079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unconstrained Online Linear Optimization (OLO) is a practical problem setting
to study the training of machine learning models. Existing works proposed a
number of potential-based algorithms, but in general the design of such
potential functions is ad hoc and heavily relies on guessing. In this paper, we
present a framework that generates time-varying potential functions by solving
a Partial Differential Equation (PDE). Our framework recovers some classical
potentials, and more importantly provides a systematic approach to design new
ones.
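For intuition, here is one plausible reading of the PDE step, stated as an assumption rather than the paper's exact formulation: Taylor-expanding the one-step condition on a potential $V(t,S)$ over an adversarial gradient $|g| \le 1$ suggests a backward-heat-type equation in the continuous-time limit (the paper's precise scaling and boundary conditions may differ).
```latex
% Hedged sketch: V(t, S) is a time-varying potential, S the cumulative
% (negative) gradient. Expanding the one-step requirement and taking the
% worst case g^2 = 1 leaves, in the limit, the backward heat equation:
V(t+1,\, S+g) \;\approx\; V(t,S) + \partial_t V + g\,\partial_S V + \tfrac{g^2}{2}\,\partial_{SS} V
\quad\Longrightarrow\quad
\partial_t V(t,S) \;+\; \tfrac{1}{2}\,\partial_{SS} V(t,S) \;=\; 0 .
```
Under this reading, classical potentials appear as particular solutions, and designing new ones reduces to solving the PDE under different boundary conditions.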
The power of our framework is demonstrated through a concrete example. When
losses are 1-Lipschitz, we design a novel OLO algorithm with anytime regret
upper bound $C\sqrt{T}+\|u\|\sqrt{2T}\,[\sqrt{\log(1+\|u\|/C)}+2]$, where $C$ is
a user-specified constant and $u$ is any comparator whose norm is unknown and
unbounded a priori. By constructing a matching lower bound, we further show
that the leading order term, including the constant multiplier $\sqrt{2}$, is
tight. To our knowledge, this is the first parameter-free algorithm with
optimal leading constant. The obtained theoretical benefits are validated by
experiments.
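As a concrete illustration of the potential-based template the abstract describes, below is a minimal one-dimensional sketch. The discrete-derivative prediction rule is the standard one in this literature; the exponential potential used here is a classical stand-in, not the paper's PDE-derived (erfi-type) solution, and `regret_bound` simply evaluates the bound stated above.
```python
# Hedged sketch of a potential-based unconstrained OLO learner (1-D,
# |g_t| <= 1). The potential below is the classical exponential one,
# used as a placeholder for the paper's PDE-derived potential.
import math

def potential(t: int, S: float, eps: float = 1.0) -> float:
    """Classical time-varying potential; a stand-in for the PDE solution."""
    return eps * math.exp(S * S / (2.0 * t)) / math.sqrt(t)

def predict(t: int, S: float) -> float:
    """Two-sided discrete derivative of the potential at the gradient sum S."""
    return (potential(t, S + 1.0) - potential(t, S - 1.0)) / 2.0

def regret_bound(T: int, u_norm: float, C: float) -> float:
    """The anytime bound from the abstract:
    C*sqrt(T) + ||u||*sqrt(2T) * (sqrt(log(1 + ||u||/C)) + 2)."""
    return C * math.sqrt(T) + u_norm * math.sqrt(2 * T) * (
        math.sqrt(math.log(1.0 + u_norm / C)) + 2.0
    )

def run(gradients):
    """Play unconstrained OLO with linear losses <g_t, x_t>."""
    S, total_loss = 0.0, 0.0
    for t, g in enumerate(gradients, start=1):
        x = predict(t, S)      # bet along the potential's slope at S
        total_loss += g * x
        S -= g                 # S tracks the negative gradient sum
    return total_loss

if __name__ == "__main__":
    gs = [1.0 if t % 4 == 0 else -1.0 for t in range(100)]  # biased toy stream
    print("cumulative loss :", run(gs))
    print("abstract's bound:", regret_bound(T=100, u_norm=5.0, C=1.0))
```
On the biased toy stream the learner's cumulative loss drifts negative (it profits from the predictable direction), while the printed bound shows how the guarantee scales with the unknown comparator norm.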
Related papers
- Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z)
- Learning to Relax: Setting Solver Parameters Across a Sequence of Linear System Instances [42.16343861129583]
We show that a bandit online learning algorithm can select parameters for a sequence of instances such that the overall cost approaches that of the best fixed $\omega$ as the sequence length increases.
Our work provides the first learning-theoretic treatment of high-precision linear system solvers and the first end-to-end guarantees for data-driven scientific computing.
arXiv Detail & Related papers (2023-10-03T17:51:42Z)
- Target-based Surrogates for Stochastic Optimization [26.35752393302125]
We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient.
Such functions are prevalent in reinforcement learning, imitation learning, and adversarial training.
Our framework allows the use of standard optimization algorithms to construct surrogates which can be minimized efficiently.
arXiv Detail & Related papers (2023-02-06T08:08:34Z)
- Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z)
- Adapting to Misspecification in Contextual Bandits [82.55565343668246]
We introduce a new family of oracle-efficient algorithms for $\varepsilon$-misspecified contextual bandits.
We obtain the first algorithm that achieves the optimal $O(d\sqrt{T} + \varepsilon\sqrt{d}\,T)$ regret bound for unknown misspecification level.
arXiv Detail & Related papers (2021-07-12T21:30:41Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles), and to find an appropriate stationary point in a polynomial number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.