Related papers: Efficient and Near-Optimal Smoothed Online Learning for Generalized Linear Functions

Efficient and Near-Optimal Smoothed Online Learning for Generalized Linear Functions

URL: http://arxiv.org/abs/2205.13056v1
Date: Wed, 25 May 2022 21:31:36 GMT
Title: Efficient and Near-Optimal Smoothed Online Learning for Generalized Linear Functions
Authors: Adam Block and Max Simchowitz
Abstract summary: We give a computationally efficient algorithm that is the first to enjoy the statistically optimal log(T/sigma) regret for realizable K-wise linear classification. We develop a novel characterization of the geometry of the disagreement region induced by generalized linear classifiers.
Score: 28.30744223973527
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Due to the drastic gap in complexity between sequential and batch statistical learning, recent work has studied a smoothed sequential learning setting, where Nature is constrained to select contexts with density bounded by 1/{\sigma} with respect to a known measure {\mu}. Unfortunately, for some function classes, there is an exponential gap between the statistically optimal regret and that which can be achieved efficiently. In this paper, we give a computationally efficient algorithm that is the first to enjoy the statistically optimal log(T/{\sigma}) regret for realizable K-wise linear classification. We extend our results to settings where the true classifier is linear in an over-parameterized polynomial featurization of the contexts, as well as to a realizable piecewise-regression setting assuming access to an appropriate ERM oracle. Somewhat surprisingly, standard disagreement-based analyses are insufficient to achieve regret logarithmic in 1/{\sigma}. Instead, we develop a novel characterization of the geometry of the disagreement region induced by generalized linear classifiers. Along the way, we develop numerous technical tools of independent interest, including a general anti-concentration bound for the determinant of certain matrix averages.

Related papers

Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems. Such problems are encountered in medicine, physics, and machine learning. We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
Improving Adaptive Online Learning Using Refined Discretization [44.646191058243645]
We study unconstrained Online Linear Optimization with Lipschitz losses. Motivated by the pursuit of instance optimality, we propose a new algorithm. Central to these results is a continuous time approach to online learning.
arXiv Detail & Related papers (2023-09-27T21:54:52Z)
Nonlinear Feature Aggregation: Two Algorithms driven by Theory [45.3190496371625]
Real-world machine learning applications are characterized by a huge number of features, leading to computational and memory issues. We propose a dimensionality reduction algorithm (NonLinCFA) which aggregates non-linear transformations of features with a generic aggregation function. We also test the algorithms on synthetic and real-world datasets, performing regression and classification tasks, showing competitive performances.
arXiv Detail & Related papers (2023-06-19T19:57:33Z)
Scalable Bayesian Meta-Learning through Generalized Implicit Gradients [64.21628447579772]
Implicit Bayesian meta-learning (iBaML) method broadens the scope of learnable priors, but also quantifies the associated uncertainty. Analytical error bounds are established to demonstrate the precision and efficiency of the generalized implicit gradient over the explicit one.
arXiv Detail & Related papers (2023-03-31T02:10:30Z)
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data. For any convex and symmetric function class $mathcalF$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
Matrix Completion via Non-Convex Relaxation and Adaptive Correlation Learning [90.8576971748142]
We develop a novel surrogate that can be optimized by closed-form solutions. We exploit upperwise correlation for completion, and thus an adaptive correlation learning model.
arXiv Detail & Related papers (2022-03-04T08:50:50Z)
Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting. We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z)
Near-Linear Time Algorithm with Near-Logarithmic Regret Per Switch for Mixable/Exp-Concave Losses [0.0]
We study the online optimization of mixable loss functions with logarithmic static regret in a dynamic environment. For the first time in literature, we show that it is also possible to achieve near-logarithmic regret per switch with sub-polynomial complexity per time.
arXiv Detail & Related papers (2021-09-28T15:06:31Z)
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective. We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
Expectation propagation on the diluted Bayesian classifier [0.0]
We introduce a statistical mechanics inspired strategy that addresses the problem of sparse feature selection in the context of binary classification. A computational scheme known as expectation propagation (EP) is used to train a continuous-weights perceptron learning a classification rule. EP is a robust and competitive algorithm in terms of variable selection properties, estimation accuracy and computational complexity.
arXiv Detail & Related papers (2020-09-20T23:59:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.