Learning Rates for Nonconvex Pairwise Learning
- URL: http://arxiv.org/abs/2111.05232v1
- Date: Tue, 9 Nov 2021 16:12:20 GMT
- Title: Learning Rates for Nonconvex Pairwise Learning
- Authors: Shaojie Li, Yong Liu
- Abstract summary: Pairwise learning is receiving increasing attention since it covers many important machine learning tasks, such as metric learning, AUC maximization, and ranking. Motivated by gaps in existing analysis, the paper studies the generalization performance of nonconvex pairwise learning and provides improved learning rates.
- Score: 7.33244617309908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pairwise learning is receiving increasing attention since it covers many
important machine learning tasks, e.g., metric learning, AUC maximization, and
ranking. Investigating the generalization behavior of pairwise learning is thus
of significance. However, existing generalization analysis mainly focuses on
convex objective functions, leaving nonconvex learning far less explored.
Moreover, the learning rates currently derived for the generalization
performance of pairwise learning are mostly of a slower order. Motivated by these
problems, we study the generalization performance of nonconvex pairwise
learning and provide improved learning rates. Specifically, we develop
different uniform convergence of gradients for pairwise learning under
different assumptions, based on which we analyze empirical risk minimizer,
gradient descent, and stochastic gradient descent pairwise learning. We first
successfully establish learning rates for these algorithms in a general
nonconvex setting, where the analysis sheds insights on the trade-off between
optimization and generalization and the role of early-stopping. We then
investigate the generalization performance of nonconvex learning with a
gradient dominance curvature condition. In this setting, we derive faster
learning rates of order $\mathcal{O}(1/n)$, where $n$ is the sample size.
Provided that the optimal population risk is small, we further improve the
learning rates to $\mathcal{O}(1/n^2)$, which, to the best of our knowledge,
are the first $\mathcal{O}(1/n^2)$-type rates for pairwise learning, whether
convex or nonconvex. Overall, we systematically analyze the generalization
performance of nonconvex pairwise learning.
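For a concrete picture of the setting, the sketch below runs stochastic gradient descent on a pairwise objective, drawing one pair of training examples per update and stopping after a fixed iteration budget (the early-stopping knob discussed in the abstract). It is only an illustration under assumed choices (a squared-hinge AUC surrogate as the pairwise loss, a linear model, and a constant step size); it is not the paper's specific algorithm or analysis.

```python
import numpy as np

def pairwise_loss_grad(w, xi, xj, yi, yj):
    """Gradient of a squared-hinge AUC surrogate on one pair of examples.

    The loss is active only for pairs with different labels; this choice of
    pairwise loss is an illustrative assumption, not fixed by the paper.
    """
    if yi == yj:
        return np.zeros_like(w)
    # Orient the pair so the positively labeled instance comes first.
    diff = (xi - xj) if yi > yj else (xj - xi)
    margin = 1.0 - w @ diff
    # d/dw max(0, 1 - w.diff)^2 = -2 * margin * diff on the active region.
    return -2.0 * margin * diff if margin > 0 else np.zeros_like(w)

def pairwise_sgd(X, y, step_size=0.1, n_iters=1000, seed=0):
    """SGD for pairwise learning: each update uses one randomly drawn pair.

    n_iters acts as an early-stopping budget, mirroring the
    optimization/generalization trade-off highlighted in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i, j = rng.choice(n, size=2, replace=False)
        w -= step_size * pairwise_loss_grad(w, X[i], X[j], y[i], y[j])
    return w
```

The empirical risk in this setting averages the pairwise loss over all pairs of training examples, so each SGD step above is an unbiased estimate of its gradient under uniform pair sampling.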
Related papers
- Generalization Analysis for Contrastive Representation Learning [80.89690821916653]
Existing generalization error bounds depend linearly on the number $k$ of negative examples.
We establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms.
arXiv Detail & Related papers (2023-02-24T01:03:56Z) - On the Stability and Generalization of Triplet Learning [55.75784102837832]
Triplet learning, i.e. learning from triplet data, has attracted much attention in computer vision tasks.
This paper investigates the generalization guarantees of triplet learning by leveraging the stability analysis.
arXiv Detail & Related papers (2023-02-20T07:32:50Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefits of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Improved Learning Rates for Stochastic Optimization: Two Theoretical
Viewpoints [7.33244617309908]
Generalization performance of stochastic optimization occupies a central place in machine learning.
In this paper, we investigate the excess risk towards improved learning rates for two popular approaches of stochastic optimization.
Motivated by these problems, we aim to provide improved rates under milder assumptions in convex learning and derive faster rates in nonconvex learning.
arXiv Detail & Related papers (2021-07-19T08:46:14Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Differential Privacy for Pairwise Learning: Non-convex Analysis [16.815470316398326]
Pairwise learning focuses on learning relationships with pairwise loss functions.
We analyze the privacy of pairwise learning and propose a new differential privacy paradigm for pairwise learning based on gradient perturbation.
arXiv Detail & Related papers (2021-05-07T02:20:23Z) - On Learning Rates and Schr\"odinger Operators [105.32118775014015]
We present a general theoretical analysis of the effect of the learning rate.
We find that the learning rate tends to zero for a broad class of non-neural functions.
arXiv Detail & Related papers (2020-04-15T09:52:37Z)