Variance Reduced Online Gradient Descent for Kernelized Pairwise
Learning with Limited Memory
- URL: http://arxiv.org/abs/2310.06483v1
- Date: Tue, 10 Oct 2023 09:50:54 GMT
- Title: Variance Reduced Online Gradient Descent for Kernelized Pairwise
Learning with Limited Memory
- Authors: Hilal AlQuabeh, Bhaskar Mukhoty, Bin Gu
- Abstract summary: Online gradient descent (OGD) algorithms have been proposed to handle online pairwise learning, where data arrives sequentially.
Recent advancements in OGD algorithms have aimed to reduce the complexity of calculating online gradients, achieving complexities less than $O(T)$ and even as low as $O(1)$.
In this study, we propose a limited memory OGD algorithm that extends to kernel online pairwise learning while improving the sublinear regret.
- Score: 19.822215548822882
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pairwise learning is essential in machine learning, especially for problems
involving loss functions defined on pairs of training examples. Online gradient
descent (OGD) algorithms have been proposed to handle online pairwise learning,
where data arrives sequentially. However, the pairwise nature of the problem
makes scalability challenging, as the gradient computation for a new sample
involves all past samples. Recent advancements in OGD algorithms have aimed to
reduce the complexity of calculating online gradients, achieving complexities
less than $O(T)$ and even as low as $O(1)$. However, these approaches are
primarily limited to linear models and introduce additional variance. In this study, we
propose a limited memory OGD algorithm that extends to kernel online pairwise
learning while improving the sublinear regret. Specifically, we establish a
clear connection between the variance of online gradients and the regret, and
construct online gradients using the most recent stratified samples with a
limited buffer of size $s$ representing all past data, yielding a complexity
of $O(sT)$, and employ $O(\sqrt{T}\log{T})$ random Fourier features
for kernel approximation. Importantly, our theoretical results demonstrate that
the variance-reduced online gradients lead to an improved sublinear regret
bound. The experiments on real-world datasets demonstrate the superiority of
our algorithm over both kernelized and linear online pairwise learning
algorithms.
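Concretely, the construction described in the abstract — pairwise gradients built from a small buffer of past examples in random-Fourier-feature space — can be sketched as follows. This is a minimal illustrative reading, not the authors' algorithm: the pairwise hinge loss, the per-class FIFO buffers (a crude stand-in for the paper's stratified sampling of recent examples), and all function names and hyperparameters here are our own assumptions.

```python
import numpy as np
from collections import deque

def rff_map(X, W, b):
    # Random Fourier feature map approximating a Gaussian (RBF) kernel.
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def limited_memory_pairwise_ogd(X, y, s=16, D=100, eta=0.5, gamma=0.1, seed=0):
    """OGD on a pairwise hinge loss: each new example is paired only with a
    size-s buffer of recent examples, giving O(s) work per round, O(sT) total."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    Z = rff_map(X, W, b)  # features computed up front for clarity only
    w = np.zeros(D)
    # One FIFO buffer per class label (+1 / -1), total capacity s.
    buffers = {1: deque(maxlen=s // 2), -1: deque(maxlen=s // 2)}
    for t in range(len(X)):
        yt = int(y[t])
        opposite = buffers[-yt]
        if opposite:
            grad = np.zeros(D)
            for i in opposite:
                diff = yt * (Z[t] - Z[i])  # want w @ diff >= 1 (correct ranking)
                if w @ diff < 1.0:         # pairwise hinge margin violated
                    grad -= diff
            w -= eta * grad / len(opposite)
        buffers[yt].append(t)
    return w, W, b

# Toy usage: two Gaussian clusters, scored by pairwise ranking (AUC).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.0, (100, 2)), rng.normal(-2.0, 1.0, (100, 2))])
y = np.r_[np.ones(100), -np.ones(100)]
perm = rng.permutation(200)
X, y = X[perm], y[perm]
w, W, b = limited_memory_pairwise_ogd(X, y)
scores = rff_map(X, W, b) @ w
auc = np.mean([float(si > sj) for si in scores[y == 1] for sj in scores[y == -1]])
```

The per-class buffers keep the per-round cost at $O(s)$ regardless of how many examples have streamed by, which is the memory/complexity trade-off the abstract refers to.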
Related papers
- Limited Memory Online Gradient Descent for Kernelized Pairwise Learning
with Dynamic Averaging [18.843097436906618]
We introduce a lightweight OGD algorithm that does not require the independence of examples and generalizes to kernel pairwise learning.
Our algorithm builds the gradient based on a random example and a moving average representing the past data, which results in a sub-linear regret bound with a complexity of $O(T)$.
Several experiments with real-world datasets show that the proposed technique outperforms kernelized and linear methods in both offline and online scenarios.
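As a rough illustration of the moving-average idea, the following hedged sketch pairs each new example with a running average of past opposite-class data instead of a buffer. The linear model, pairwise hinge loss, per-class exponential moving averages, and all names are our assumptions, not details taken from the paper.

```python
import numpy as np

def moving_average_pairwise_ogd(X, y, eta=0.5, beta=0.9):
    """Pair each new example with a moving average summarizing past
    opposite-class data: O(1) work per round, O(T) in total."""
    w = np.zeros(X.shape[1])
    avg = {1: None, -1: None}         # per-class moving averages of past data
    for t in range(len(X)):
        yt = int(y[t])
        opp = avg[-yt]
        if opp is not None:
            diff = yt * (X[t] - opp)  # want w @ diff >= 1 (correct ranking)
            if w @ diff < 1.0:        # pairwise hinge margin violated
                w += eta * diff
        # Fold the new example into its class's moving average.
        avg[yt] = X[t] if avg[yt] is None else beta * avg[yt] + (1 - beta) * X[t]
    return w

# Toy usage on two linearly separable clusters.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(2.0, 1.0, (100, 2)), rng.normal(-2.0, 1.0, (100, 2))])
y = np.r_[np.ones(100), -np.ones(100)]
perm = rng.permutation(200)
X, y = X[perm], y[perm]
w = moving_average_pairwise_ogd(X, y)
acc = np.mean(np.sign(X @ w) == y)
```

Replacing the buffer with a single running statistic is what removes the dependence on the number of past examples at each round.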
arXiv Detail & Related papers (2024-02-02T05:21:50Z) - Efficient Methods for Non-stationary Online Learning [67.3300478545554]
We present efficient methods for optimizing dynamic regret and adaptive regret, which reduce the number of projections per round from $\mathcal{O}(\log T)$ to $1$.
Our technique hinges on the reduction mechanism developed in parameter-free online learning and requires non-trivial twists on non-stationary online methods.
arXiv Detail & Related papers (2023-09-16T07:30:12Z) - Ordering for Non-Replacement SGD [7.11967773739707]
We seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm.
We develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions.
In addition, we are able to combine the ordering with mini-batch and further apply it to more complex neural networks.
arXiv Detail & Related papers (2023-06-28T00:46:58Z) - Pairwise Learning via Stagewise Training in Proximal Setting [0.0]
We combine adaptive sample size and importance sampling techniques for pairwise learning, with convergence guarantees for nonsmooth convex pairwise loss functions.
We demonstrate that sampling opposite instances at each step reduces the variance of the gradient, hence accelerating convergence.
arXiv Detail & Related papers (2022-08-08T11:51:01Z) - One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive
Least-Squares [8.443742714362521]
We develop an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints.
Our algorithm uses memory efficiently by exploiting the structure of the streaming data via incremental principal component analysis (IPCA).
Our experiments show the effectiveness of the proposed method compared to the baselines.
arXiv Detail & Related papers (2022-07-28T02:01:31Z) - Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor.
Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is with sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.