Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning
- URL: http://arxiv.org/abs/2111.12050v1
- Date: Tue, 23 Nov 2021 18:10:48 GMT
- Title: Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning
- Authors: Zhenhuan Yang, Yunwen Lei, Puyu Wang, Tianbao Yang and Yiming Ying
- Abstract summary: Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
- Score: 65.54757265434465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pairwise learning refers to learning tasks where the loss function depends on
a pair of instances. It instantiates many important machine learning tasks such
as bipartite ranking and metric learning. A popular approach to handle
streaming data in pairwise learning is an online gradient descent (OGD)
algorithm, where one needs to pair the current instance with a buffering set of
previous instances with a sufficiently large size and therefore suffers from a
scalability issue. In this paper, we propose simple stochastic and online
gradient descent methods for pairwise learning. A notable difference from the
existing studies is that we only pair the current instance with the previous
one in building a gradient direction, which is efficient in both the storage
and computational complexity. We develop novel stability results, optimization,
and generalization error bounds for both convex and nonconvex as well as both
smooth and nonsmooth problems. We introduce novel techniques to decouple the
dependency of models and the previous instance in both the optimization and
generalization analysis. Our study resolves an open question on developing
meaningful generalization bounds for OGD using a buffering set with a very
small fixed size. We also extend our algorithms and stability analysis to
develop differentially private SGD algorithms for pairwise learning which
significantly improves the existing results.
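A minimal sketch of the idea described in the abstract, assuming a pairwise logistic (AUC-type) loss for bipartite ranking, labels in {-1, +1}, and a 1/sqrt(t) step size; these choices and all function names are illustrative assumptions, not the paper's exact construction. The point is that each update pairs the incoming instance only with the single previous instance, so the method stores O(1) data rather than a large buffer.

```python
import numpy as np

def pairwise_logistic_grad(w, z_t, z_prev):
    """Gradient of a pairwise logistic (AUC-type) loss on one pair.
    Returns zero when the labels agree, since the ranking loss only
    compares instances from different classes."""
    (x_t, y_t), (x_p, y_p) = z_t, z_prev
    if y_t == y_p:
        return np.zeros_like(w)
    s = (y_t - y_p) / 2.0            # +1 or -1 for labels in {-1, +1}
    d = x_t - x_p
    margin = s * w.dot(d)
    return -s * d / (1.0 + np.exp(margin))

def simple_pairwise_sgd(stream, dim, step_size=0.1):
    """Process a stream of (x, y) instances, pairing each new instance
    only with the immediately preceding one (O(1) storage per step),
    instead of with a buffer of many past instances."""
    w = np.zeros(dim)
    z_prev = None
    for t, z_t in enumerate(stream, start=1):
        if z_prev is not None:
            g = pairwise_logistic_grad(w, z_t, z_prev)
            w -= (step_size / np.sqrt(t)) * g    # decaying step size
        z_prev = z_t                             # keep only the last instance
    return w

# Toy usage on a synthetic stream (illustrative only).
rng = np.random.default_rng(0)
xs = rng.normal(size=(1000, 5))
ys = np.sign(xs[:, 0] + 0.1 * rng.normal(size=1000))
w = simple_pairwise_sgd(zip(xs, ys), dim=5)
```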
Related papers
- On Improving the Algorithm-, Model-, and Data- Efficiency of Self-Supervised Learning [18.318758111829386]
We propose an efficient single-branch SSL method based on non-parametric instance discrimination.
We also propose a novel self-distillation loss that minimizes the KL divergence between the probability distribution and its square root version.
arXiv Detail & Related papers (2024-04-30T06:39:04Z) - Limited Memory Online Gradient Descent for Kernelized Pairwise Learning
with Dynamic Averaging [18.843097436906618]
We introduce a lightweight OGD algorithm that does not require the independence of examples and generalizes to kernel pairwise learning.
Our algorithm builds the gradient based on a random example and a moving average representing the past data, which results in a sub-linear regret bound with a complexity of $O(T)$ (a simplified linear sketch of this moving-average idea appears after this list).
Several experiments on real-world datasets show that the proposed technique outperforms kernelized and linear online gradient descent baselines in both offline and online scenarios.
arXiv Detail & Related papers (2024-02-02T05:21:50Z) - Variance Reduced Online Gradient Descent for Kernelized Pairwise
Learning with Limited Memory [19.822215548822882]
Online gradient descent (OGD) algorithms have been proposed to handle online pairwise learning, where data arrives sequentially.
Recent advancements in OGD algorithms have aimed to reduce the complexity of calculating online gradients, achieving complexities less than $O(T)$ and even as low as $O(1)$.
In this study, we propose a limited memory OGD algorithm that extends to kernel online pairwise learning while improving the sublinear regret.
arXiv Detail & Related papers (2023-10-10T09:50:54Z) - Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z) - One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive
Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses memory efficiently by exploiting the structure of the streaming data via incremental principal component analysis (IPCA).
Our experiments show the effectiveness of the proposed method compared to the baselines.
arXiv Detail & Related papers (2022-07-28T02:01:31Z) - Stabilizing Q-learning with Linear Architectures for Provably Efficient
Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z) - Learning to Sparsify Travelling Salesman Problem Instances [0.5985204759362747]
We use a machine learning based pruning step as pre-processing, followed by an exact integer programming approach, to sparsify travelling salesman problem instances.
Our learning approach requires very little training data and is amenable to mathematical analysis.
arXiv Detail & Related papers (2021-04-19T14:35:14Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Physarum Powered Differentiable Linear Programming Layers and
Applications [48.77235931652611]
We propose an efficient and differentiable solver for general linear programming problems.
We show the use of our solver in a video segmentation task and meta-learning for few-shot learning.
arXiv Detail & Related papers (2020-04-30T01:50:37Z)
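The moving-average construction mentioned in the Limited Memory OGD entry above can be pictured, in a simplified linear (non-kernelized) form, as keeping one exponentially averaged feature vector per class and pairing each incoming instance with the average of the opposite class. The per-class averages, the pairwise logistic loss, and the decay parameter beta are assumptions made for illustration; the cited paper works in a kernelized setting.

```python
import numpy as np

def moving_average_pairwise_ogd(stream, dim, step_size=0.1, beta=0.9):
    """Limited-memory OGD sketch: instead of buffering past instances,
    keep an exponential moving average of the features seen for each
    label and pair the new instance with the opposite-class average."""
    w = np.zeros(dim)
    avg = {+1: np.zeros(dim), -1: np.zeros(dim)}
    seen = {+1: False, -1: False}
    for t, (x, y) in enumerate(stream, start=1):
        y = int(y)
        if seen[-y]:
            d = x - avg[-y]                        # current instance vs. opposite-class summary
            margin = y * w.dot(d)
            g = -y * d / (1.0 + np.exp(margin))    # pairwise logistic gradient
            w -= (step_size / np.sqrt(t)) * g
        avg[y] = beta * avg[y] + (1 - beta) * x    # O(1)-memory summary of past data
        seen[y] = True
    return w
```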