Regularization and Optimal Multiclass Learning
- URL: http://arxiv.org/abs/2309.13692v2
- Date: Tue, 25 Jun 2024 17:42:18 GMT
- Title: Regularization and Optimal Multiclass Learning
- Authors: Julian Asilis, Siddartha Devic, Shaddin Dughmi, Vatsal Sharan, Shang-Hua Teng
- Abstract summary: This work characterizes the role of regularization in perhaps the simplest setting for which empirical risk minimization fails: multiclass learning with arbitrary label sets.
Using one-inclusion graphs (OIGs), we exhibit optimal learning algorithms that dovetail with tried-and-true algorithmic principles.
- Score: 10.168670899305232
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The quintessential learning algorithm of empirical risk minimization (ERM) is known to fail in various settings for which uniform convergence does not characterize learning. It is therefore unsurprising that the practice of machine learning is rife with considerably richer algorithmic techniques for successfully controlling model capacity. Nevertheless, no such technique or principle has broken away from the pack to characterize optimal learning in these more general settings. The purpose of this work is to characterize the role of regularization in perhaps the simplest setting for which ERM fails: multiclass learning with arbitrary label sets. Using one-inclusion graphs (OIGs), we exhibit optimal learning algorithms that dovetail with tried-and-true algorithmic principles: Occam's Razor as embodied by structural risk minimization (SRM), the principle of maximum entropy, and Bayesian reasoning. Most notably, we introduce an optimal learner which relaxes structural risk minimization on two dimensions: it allows the regularization function to be "local" to datapoints, and uses an unsupervised learning stage to learn this regularizer at the outset. We justify these relaxations by showing that they are necessary: removing either dimension fails to yield a near-optimal learner. We also extract from OIGs a combinatorial sequence we term the Hall complexity, which is the first to characterize a problem's transductive error rate exactly. Lastly, we introduce a generalization of OIGs and the transductive learning setting to the agnostic case, where we show that optimal orientations of Hamming graphs -- judged using nodes' outdegrees minus a system of node-dependent credits -- characterize optimal learners exactly. We demonstrate that an agnostic version of the Hall complexity again characterizes error rates exactly, and exhibit an optimal learner using maximum entropy programs.
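For intuition, the one-inclusion graph (OIG) viewpoint invoked in the abstract can be sketched concretely: vertices are the labelings of n+1 points realizable by the hypothesis class, edges join labelings that differ at a single point, a learner corresponds to an orientation of the edges, and its worst-case transductive error is governed by the maximum out-degree. The Python sketch below is a minimal, hypothetical illustration of this idea over an explicitly enumerated finite pattern class, using a greedy orientation; it is not the paper's optimal construction or its local-regularization learner.
```python
from itertools import combinations
from collections import defaultdict

def one_inclusion_edges(patterns):
    """Edges of the one-inclusion graph: pairs of realizable labelings
    (tuples over the same n+1 points) that differ in exactly one coordinate."""
    edges = []
    for u, v in combinations(patterns, 2):
        diff = [i for i in range(len(u)) if u[i] != v[i]]
        if len(diff) == 1:
            edges.append((u, v, diff[0]))
    return edges

def orient_min_outdegree(edges):
    """Greedily orient each edge out of the endpoint with the smaller current
    out-degree.  Orienting {u, v} (differing only at coordinate i) as u -> v
    means: when every label except coordinate i is revealed and is consistent
    with both u and v, predict v[i]; the learner then errs only if the truth
    is u.  Worst-case transductive error is (max out-degree) / (n + 1)."""
    outdeg = defaultdict(int)
    head = {}  # (frozenset of the two endpoints, coordinate) -> predicted label
    for u, v, i in edges:
        tail, tip = (u, v) if outdeg[u] <= outdeg[v] else (v, u)
        outdeg[tail] += 1
        head[(frozenset((u, v)), i)] = tip[i]
    return head, (max(outdeg.values()) if outdeg else 0)

def predict(patterns, head, observed, test_i):
    """Predict the label at coordinate test_i from the observed labels."""
    consistent = [p for p in patterns
                  if all(p[j] == y for j, y in observed.items())]
    labels = {p[test_i] for p in consistent}
    if len(labels) == 1:
        return labels.pop()
    if len(consistent) == 2:  # the two completions form an OIG edge
        return head.get((frozenset(consistent), test_i), consistent[0][test_i])
    return consistent[0][test_i]  # arbitrary fallback for this toy sketch

# Toy example: 3 points, labels in {0, 1, 2}, four realizable labelings.
patterns = [(0, 0, 0), (0, 0, 1), (0, 2, 1), (1, 0, 0)]
head, max_out = orient_min_outdegree(one_inclusion_edges(patterns))
print("worst-case transductive error <=", max_out / len(patterns[0]))
print("predicted label for point 2 given labels {0: 0, 1: 0}:",
      predict(patterns, head, {0: 0, 1: 0}, test_i=2))
```
The greedy orientation is only a stand-in: the paper's results concern orientations that are exactly optimal and their correspondence to local regularizers learned in an unsupervised stage.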
Related papers
- Probably Approximately Precision and Recall Learning [62.912015491907994]
Precision and Recall are foundational metrics in machine learning.
One-sided feedback--where only positive examples are observed during training--is inherent in many practical problems.
We introduce a PAC learning framework where each hypothesis is represented by a graph, with edges indicating positive interactions.
arXiv Detail & Related papers (2024-11-20T04:21:07Z) - A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level Optimizations (BLO).
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization problem, where the inner loss function becomes a smooth probability distribution and the outer loss becomes an expected loss over that inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - GBM-based Bregman Proximal Algorithms for Constrained Learning [3.667453772837954]
We adapt GBM for constrained learning tasks within the framework of Bregman proximal algorithms.
We introduce a new Bregman method with a global optimality guarantee when the learning objective functions are convex.
We provide substantial experimental evidence to showcase the effectiveness of the Bregman algorithm framework.
arXiv Detail & Related papers (2023-08-21T14:56:51Z) - Active Learning in the Predict-then-Optimize Framework: A Margin-Based Approach [5.371816551086118]
We develop a learning method that sequentially decides whether to request the "labels" of feature samples from an unlabeled data stream.
Our active learning method is the first to be directly informed by the decision error induced by the predicted parameters.
arXiv Detail & Related papers (2023-05-11T05:44:36Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Smoothed Online Learning is as Easy as Statistical Learning [77.00766067963195]
We provide the first oracle-efficient, no-regret algorithms in this setting.
We show that if a function class is learnable in the classical setting, then there is an oracle-efficient, no-regret algorithm for contextual bandits.
arXiv Detail & Related papers (2022-02-09T19:22:34Z) - Neural Active Learning with Performance Guarantees [37.16062387461106]
We investigate the problem of active learning in the streaming setting in non-parametric regimes, where the labels are generated from a class of functions on which we make no assumptions whatsoever.
We rely on recently proposed Neural Tangent Kernel (NTK) approximation tools to construct a suitable neural embedding that determines the feature space the algorithm operates on and atop which the learned model is computed.
arXiv Detail & Related papers (2021-06-06T20:44:23Z) - Smooth Bilevel Programming for Sparse Regularization [5.177947445379688]
Iteratively reweighted least squares (IRLS) is a popular approach to solve sparsity-enforcing regression problems in machine learning.
We show how a surprisingly simple reparametrization of IRLS, coupled with a bilevel scheme, achieves top performance across a range of sparsity-inducing regularizers.
arXiv Detail & Related papers (2021-06-02T19:18:22Z) - Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning [36.015585972493575]
This paper considers batch Reinforcement Learning (RL) with general value function approximation.
The excess risk of the Empirical Risk Minimizer (ERM) is bounded by the Rademacher complexity of the function class; a standard form of this bound is recalled below.
Fast statistical rates can be achieved by using tools of local Rademacher complexity.
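For context, the Rademacher-complexity claim above takes the following standard textbook form for a loss bounded in [0,1] (constants omitted; this is the generic statistical-learning bound, not the specific batch-RL result of the cited paper):
```latex
% Empirical Rademacher complexity of a class F on a sample S = (z_1, ..., z_n),
% followed by the generic ERM excess-risk bound (constants omitted).
\[
  \widehat{\mathfrak{R}}_S(\mathcal{F})
    = \mathbb{E}_{\sigma}\left[ \sup_{f \in \mathcal{F}}
        \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(z_i) \right],
  \qquad \sigma_1, \dots, \sigma_n \ \text{i.i.d. uniform on } \{\pm 1\},
\]
\[
  R(\hat{f}_{\mathrm{ERM}}) - \inf_{f \in \mathcal{F}} R(f)
    \;\lesssim\; \widehat{\mathfrak{R}}_S(\ell \circ \mathcal{F})
      + \sqrt{\frac{\log(1/\delta)}{n}}
  \qquad \text{with probability at least } 1 - \delta .
\]
```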
arXiv Detail & Related papers (2021-03-25T14:45:29Z) - Agnostic Proper Learning of Halfspaces under Gaussian Marginals [56.01192577666607]
We study the problem of agnostically learning halfspaces under the Gaussian distribution.
Our main result is the first proper learning algorithm for this problem.
arXiv Detail & Related papers (2021-02-10T18:40:44Z) - Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.