Related papers: Multi-kernel Passive Stochastic Gradient Algorithms and Transfer Learning

Multi-kernel Passive Stochastic Gradient Algorithms and Transfer Learning

URL: http://arxiv.org/abs/2008.10020v2
Date: Mon, 8 Feb 2021 03:34:03 GMT
Title: Multi-kernel Passive Stochastic Gradient Algorithms and Transfer Learning
Authors: Vikram Krishnamurthy and George Yin
Abstract summary: The gradient algorithm does not have control over the location where noisy gradients of the cost function are evaluated. The algorithm performs substantially better in high dimensional problems and incorporates variance reduction.
Score: 21.796874356469644
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This paper develops a novel passive stochastic gradient algorithm. In passive stochastic approximation, the stochastic gradient algorithm does not have control over the location where noisy gradients of the cost function are evaluated. Classical passive stochastic gradient algorithms use a kernel that approximates a Dirac delta to weigh the gradients based on how far they are evaluated from the desired point. In this paper we construct a multi-kernel passive stochastic gradient algorithm. The algorithm performs substantially better in high dimensional problems and incorporates variance reduction. We analyze the weak convergence of the multi-kernel algorithm and its rate of convergence. In numerical examples, we study the multi-kernel version of the passive least mean squares (LMS) algorithm for transfer learning to compare the performance with the classical passive version.

Related papers

Limit Theorems for Stochastic Gradient Descent with Infinite Variance [51.4853131023238]
We show that the gradient descent algorithm can be characterized as the stationary distribution of a suitably defined Ornstein-rnstein process driven by an appropriate L'evy process.<n>We also explore the applications of these results in linear regression and logistic regression models.
arXiv Detail & Related papers (2024-10-21T09:39:10Z)
An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper investigates a class of bilevel optimization problems where the upper-level function is non- unbounded smoothness and the lower-level problem is strongly convex. These problems have significant applications in data learning, such as text classification using neural networks.
arXiv Detail & Related papers (2024-09-28T02:30:44Z)
Variance reduction techniques for stochastic proximal point algorithms [5.374800961359305]
In this work, we propose the first unified study of variance reduction techniques for proximal point algorithms. We introduce a generic proximal-based algorithm that can be specified to give the proximal version of SVRG, SAGA, and some of their variants. Our experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts.
arXiv Detail & Related papers (2023-08-18T05:11:50Z)
Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization [50.83356836818667]
gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-eps optimization problems. In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z)
Random-reshuffled SARAH does not need a full gradient computations [61.85897464405715]
The StochAstic Recursive grAdientritHm (SARAH) algorithm is a variance reduced variant of the Gradient Descent (SGD) algorithm. In this paper, we remove the necessity of a full gradient. The aggregated gradients serve as an estimate of a full gradient in the SARAH algorithm.
arXiv Detail & Related papers (2021-11-26T06:00:44Z)
Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation. Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective. We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
Reparametrizing gradient descent [0.0]
We propose an optimization algorithm which we call norm-adapted gradient descent. Our algorithm can also be compared to quasi-Newton methods, but we seek roots rather than stationary points.
arXiv Detail & Related papers (2020-10-09T20:22:29Z)
Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step. Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients. We introduce a new sparsity operator: The random-top-k operator. Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks. In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems. Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.