Optimistic bounds for multi-output prediction
- URL: http://arxiv.org/abs/2002.09769v1
- Date: Sat, 22 Feb 2020 20:54:17 GMT
- Title: Optimistic bounds for multi-output prediction
- Authors: Henry WJ Reeve, Ata Kaban
- Abstract summary: We investigate the challenge of multi-output learning, where the goal is to learn a vector-valued function based on a supervised data set.
This includes a range of important problems in Machine Learning including multi-target regression, multi-class classification and multi-label classification.
We show that the self-bounding Lipschitz condition gives rise to optimistic bounds for multi-output learning, which are minimax optimal up to logarithmic factors.
- Score: 6.015556590955814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the challenge of multi-output learning, where the goal is to
learn a vector-valued function based on a supervised data set. This includes a
range of important problems in Machine Learning including multi-target
regression, multi-class classification and multi-label classification. We begin
our analysis by introducing the self-bounding Lipschitz condition for
multi-output loss functions, which interpolates continuously between a
classical Lipschitz condition and a multi-dimensional analogue of a smoothness
condition. We then show that the self-bounding Lipschitz condition gives rise
to optimistic bounds for multi-output learning, which are minimax optimal up to
logarithmic factors. The proof exploits local Rademacher complexity combined
with a powerful minoration inequality due to Srebro, Sridharan and Tewari. As
an application we derive a state-of-the-art generalization bound for
multi-class gradient boosting.
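To make the key definition concrete, the following LaTeX sketch records the self-bounding Lipschitz condition in the form suggested by the abstract. The parameter names $\lambda$ and $\theta$, the choice of the $\ell_\infty$ norm, and the exact shape of the inequality are our reading of the abstract rather than a quotation, and should be checked against the paper.

```latex
% Sketch of a (\lambda, \theta)-self-bounding Lipschitz condition for a multi-output
% loss \mathcal{L} : \mathbb{R}^K \times \mathcal{Y} \to [0, \infty).  The \ell_\infty
% norm and the exact form below are assumptions inferred from the abstract.
\[
  \bigl| \mathcal{L}(u, y) - \mathcal{L}(v, y) \bigr|
  \;\le\;
  \lambda \, \max\bigl\{ \mathcal{L}(u, y), \mathcal{L}(v, y) \bigr\}^{\theta} \,
  \lVert u - v \rVert_{\infty}
  \quad \text{for all } u, v \in \mathbb{R}^K, \; y \in \mathcal{Y}.
\]
% Taking \theta = 0 recovers the classical Lipschitz condition; a larger \theta relaxes
% the requirement wherever the loss is already small, which is the interpolation towards
% a multi-dimensional smoothness-type condition described in the abstract.
```
On this reading, the "optimistic" character of the resulting bounds comes from the effective Lipschitz constant shrinking with the loss itself, so that low-loss predictors enjoy faster rates.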
Related papers
- Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective [17.5796206457693]
We study nonsmooth optimization problems under bounded local subgradient variation.
The resulting class of objective functions encapsulates the classes of objectives traditionally studied in optimization.
arXiv Detail & Related papers (2024-03-24T22:42:40Z)
- Extension of Transformational Machine Learning: Classification Problems [0.0]
This study explores the application and performance of Transformational Machine Learning (TML) in drug discovery.
TML, a meta-learning algorithm, excels at exploiting common attributes across various domains.
The drug discovery process, which is complex and time-consuming, can benefit greatly from the enhanced prediction accuracy that TML offers.
arXiv Detail & Related papers (2023-08-07T07:34:18Z)
- Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration [122.51142131506639]
We introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory (a simplified sketch of the Gram-iteration idea appears after this list).
We show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability.
It proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches.
arXiv Detail & Related papers (2023-05-25T15:32:21Z)
- Generalization Analysis for Contrastive Representation Learning [80.89690821916653]
Existing generalization error bounds depend linearly on the number $k$ of negative examples.
We establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms.
arXiv Detail & Related papers (2023-02-24T01:03:56Z)
- High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the use of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
- Efficient and Near-Optimal Smoothed Online Learning for Generalized Linear Functions [28.30744223973527]
We give a computationally efficient algorithm that is the first to enjoy the statistically optimal $\log(T/\sigma)$ regret for realizable $K$-wise linear classification.
We develop a novel characterization of the geometry of the disagreement region induced by generalized linear classifiers.
arXiv Detail & Related papers (2022-05-25T21:31:36Z)
- Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism [65.46524775457928]
Offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
arXiv Detail & Related papers (2022-03-11T09:00:12Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that an instance-dependent functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- Fine-grained Generalization Analysis of Vector-valued Learning [28.722350261462463]
We start the generalization analysis of regularized vector-valued learning algorithms by presenting bounds with a mild dependency on the output dimension and a fast rate on the sample size.
To understand the interaction between optimization and learning, we further use our results to derive the first generalization bounds for stochastic gradient descent with vector-valued functions.
As a byproduct, we derive a Rademacher complexity bound for loss function classes defined in terms of a general strongly convex function.
arXiv Detail & Related papers (2021-04-29T07:57:34Z)
- The Optimality of Polynomial Regression for Agnostic Learning under Gaussian Marginals [47.81107898315438]
We develop a method for finding hard families of examples for a wide class of problems by using LP duality.
We show that $L_1$-regression is essentially best possible, and therefore that the computational difficulty of learning a concept class is closely related to the degree required to approximate any function from the class in the $L_1$-norm.
arXiv Detail & Related papers (2021-02-08T18:06:32Z)
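As a rough, assumption-level illustration of the Gram-iteration idea named in the "Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration" entry above, the Python sketch below upper-bounds the spectral norm of a plain dense matrix by repeatedly squaring its Gram matrix. It is not the authors' implementation: the paper works with convolutional layers via circulant matrix theory and a differentiable bound, whereas this sketch only conveys the basic squaring-and-Frobenius-norm mechanism, with rescaling for numerical stability.

```python
import numpy as np


def spectral_norm_upper_bound(W: np.ndarray, n_iter: int = 5) -> float:
    """Upper-bound the spectral norm ||W||_2 by repeated Gram squaring.

    Uses ||W||_2**2 = ||W^T W||_2 and ||A||_2 <= ||A||_F: with G_0 = W^T W and
    G_{k+1} = G_k @ G_k, one gets ||W||_2 <= ||G_k||_F ** (1 / 2**(k + 1)),
    a bound that is non-increasing in k.  Each step rescales G to avoid
    overflow and folds the scale factors back in at the end via log_scale.
    """
    G = W.T @ W                       # Gram matrix: ||G||_2 = ||W||_2 ** 2
    log_scale = 0.0                   # accumulated log of the rescaling factors
    for _ in range(n_iter):
        norm = np.linalg.norm(G)      # Frobenius norm, an upper bound on ||G||_2
        G = (G / norm) @ (G / norm)   # square the rescaled Gram matrix
        log_scale = 2.0 * (log_scale + np.log(norm))
    # ||W||_2 ** (2 ** (n_iter + 1)) <= exp(log_scale) * ||G||_F
    log_bound = (log_scale + np.log(np.linalg.norm(G))) / (2.0 ** (n_iter + 1))
    return float(np.exp(log_bound))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 32))
    exact = np.linalg.norm(W, ord=2)          # exact spectral norm for comparison
    bound = spectral_norm_upper_bound(W)
    print(f"exact spectral norm: {exact:.4f}, Gram-iteration bound: {bound:.4f}")
```
Each squaring tightens the bound, since $\lVert W \rVert_2 \le \lVert G_k \rVert_F^{1/2^{k+1}}$ is non-increasing in $k$ and converges to the exact spectral norm.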
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.