Fine-grained Generalization Analysis of Vector-valued Learning
- URL: http://arxiv.org/abs/2104.14173v1
- Date: Thu, 29 Apr 2021 07:57:34 GMT
- Title: Fine-grained Generalization Analysis of Vector-valued Learning
- Authors: Liang Wu, Antoine Ledent, Yunwen Lei, Marius Kloft
- Abstract summary: We start the generalization analysis of regularized vector-valued learning algorithms by presenting bounds with a mild dependency on the output dimension and a fast rate on the sample size.
To understand the interaction between optimization and learning, we further use our results to derive the first generalization bounds for stochastic gradient descent with vector-valued functions.
As a byproduct, we derive a Rademacher complexity bound for loss function classes defined in terms of a general strongly convex function.
- Score: 28.722350261462463
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Many fundamental machine learning tasks can be formulated as a problem of
learning with vector-valued functions, where we learn multiple scalar-valued
functions together. Although there is some generalization analysis on different
specific algorithms under the empirical risk minimization principle, a unifying
analysis of vector-valued learning under a regularization framework is still
lacking. In this paper, we initiate the generalization analysis of regularized
vector-valued learning algorithms by presenting bounds with a mild dependency
on the output dimension and a fast rate on the sample size. Our analysis
relaxes existing assumptions, namely the restrictive constraints on hypothesis
spaces, the smoothness of loss functions, and the low-noise condition. To understand the
interaction between optimization and learning, we further use our results to
derive the first generalization bounds for stochastic gradient descent with
vector-valued functions. We apply our general results to multi-class
classification and multi-label classification, which yield the first bounds
with a logarithmic dependency on the output dimension for extreme multi-label
classification with the Frobenius regularization. As a byproduct, we derive a
Rademacher complexity bound for loss function classes defined in terms of a
general strongly convex function.
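To make the setting concrete, below is a minimal sketch of the kind of estimator the bounds above cover: a linear vector-valued model W in R^{d x K} for multi-class classification, trained by SGD on a Frobenius-regularized empirical risk. This is an illustrative assumption rather than the paper's algorithm; the names (`fit_sgd`, `lam`, `eta`) and the softmax cross-entropy surrogate are our own choices.
```python
# Minimal sketch (illustrative, not the paper's method): a Frobenius-regularized
# linear vector-valued learner for multi-class classification, trained with SGD.
# Names (fit_sgd, lam, eta) and the softmax cross-entropy surrogate are assumptions.
import numpy as np

def fit_sgd(X, y, num_classes, lam=1e-2, eta=0.1, epochs=10, seed=0):
    """Learn W in R^{d x K} by SGD on the regularized empirical risk
       (1/n) * sum_i loss(W^T x_i, y_i) + (lam/2) * ||W||_F^2,
    where loss is the softmax cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, num_classes))
    for _ in range(epochs):
        for i in rng.permutation(n):
            scores = X[i] @ W                    # vector-valued prediction in R^K
            p = np.exp(scores - scores.max())
            p /= p.sum()                         # softmax probabilities
            p[y[i]] -= 1.0                       # gradient of cross-entropy w.r.t. scores
            grad = np.outer(X[i], p) + lam * W   # loss gradient + Frobenius penalty
            W -= eta * grad                      # stochastic gradient step
    return W

# Usage: predict by taking the argmax over the K outputs.
X = np.random.randn(200, 5)
y = np.random.randint(0, 3, size=200)
W = fit_sgd(X, y, num_classes=3)
preds = (X @ W).argmax(axis=1)
```
The same template would cover multi-label classification by swapping the softmax surrogate for, e.g., a per-label logistic loss while keeping the Frobenius penalty.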
Related papers
- Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms [28.046728466038022]
We study theoretical properties of a broad class of regularized algorithms with vector-valued output.
We rigorously confirm the so-called saturation effect for ridge regression with vector-valued output.
We present an upper bound on the finite sample risk of general vector-valued spectral algorithms.
arXiv Detail & Related papers (2024-05-23T16:45:52Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Fine-grained analysis of non-parametric estimation for pairwise learning [9.676007573960383]
We are concerned with the generalization performance of non-parametric estimation for pairwise learning.
Our results can be used to handle a wide range of pairwise learning problems including ranking, AUC, pairwise regression and metric and similarity learning.
arXiv Detail & Related papers (2023-05-31T08:13:14Z) - Generalization Analysis for Contrastive Representation Learning [80.89690821916653]
Existing generalization error bounds depend linearly on the number $k$ of negative examples.
We establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms.
arXiv Detail & Related papers (2023-02-24T01:03:56Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Information-Theoretic Generalization Bounds for Iterative
Semi-Supervised Learning [81.1071978288003]
In particular, we seek to understand the behaviour of the generalization error of iterative SSL algorithms using information-theoretic principles.
Our theoretical results suggest that when the class conditional variances are not too large, the upper bound on the generalization error decreases monotonically with the number of iterations, but quickly saturates.
arXiv Detail & Related papers (2021-10-03T05:38:49Z) - Convergence bounds for nonlinear least squares and applications to
tensor recovery [0.0]
We consider the problem of approximating a function in general nonlinear subsets of $L^2$ when only a weighted Monte Carlo estimate of the $L^2$-norm can be computed.
A critical analysis of our results allows us to derive a sample efficient algorithm for the model set of low-rank tensors.
arXiv Detail & Related papers (2021-08-11T14:14:02Z) - Fractal Structure and Generalization Properties of Stochastic
Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded in terms of the "complexity" of the fractal structure that underlies its generalization measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - Minimax Estimation of Linear Functions of Eigenvectors in the Face of
Small Eigen-Gaps [95.62172085878132]
Eigenvector perturbation analysis plays a vital role in various statistical data science applications.
We develop a suite of statistical theory that characterizes the perturbation of arbitrary linear functions of an unknown eigenvector.
In order to mitigate a non-negligible bias issue inherent to the natural "plug-in" estimator, we develop de-biased estimators.
arXiv Detail & Related papers (2021-04-07T17:55:10Z) - Optimistic bounds for multi-output prediction [6.015556590955814]
We investigate the challenge of multi-output learning, where the goal is to learn a vector-valued function based on a supervised data set.
This includes a range of important problems in Machine Learning including multi-target regression, multi-class classification and multi-label classification.
We show that the self-bounding Lipschitz condition gives rise to optimistic bounds for multi-output learning, which are minimax optimal up to logarithmic factors.
arXiv Detail & Related papers (2020-02-22T20:54:17Z) - Semi-supervised Vector-valued Learning: Improved Bounds and Algorithms [20.53130700587322]
We derive novel semi-supervised excess risk bounds for general vector-valued learning from both the kernel perspective and the linear perspective.
Motivated by our theoretical analysis, we propose a general semi-supervised algorithm for efficiently learning vector-valued functions.
arXiv Detail & Related papers (2019-09-11T07:30:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.