Online Learning Under A Separable Stochastic Approximation Framework
- URL: http://arxiv.org/abs/2305.07484v2
- Date: Sat, 20 May 2023 18:03:51 GMT
- Title: Online Learning Under A Separable Stochastic Approximation Framework
- Authors: Min Gan, Xiang-xiang Su, Guang-yong Chen, Jing Chen
- Abstract summary: We propose an online learning algorithm for a class of machine learning models under a separable stochastic approximation framework.
We show that the proposed algorithm produces more robust training and test performance when compared to other popular learning algorithms.
- Score: 20.26530917721778
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an online learning algorithm for a class of machine learning
models under a separable stochastic approximation framework. The essence of our
idea lies in the observation that certain parameters in the models are easier
to optimize than others. In this paper, we focus on models where some
parameters have a linear nature, which is common in machine learning. In one
routine of the proposed algorithm, the linear parameters are updated by the
recursive least squares (RLS) algorithm, which is equivalent to a stochastic
Newton method; then, based on the updated linear parameters, the nonlinear
parameters are updated by the stochastic gradient method (SGD). The proposed
algorithm can be understood as a stochastic approximation version of the block
coordinate gradient descent approach, in which one part of the parameters is
updated by a second-order SGD method while the other part is updated by a
first-order SGD. Global convergence of the proposed online algorithm for
non-convex cases is established in terms of the expected violation of a
first-order optimality condition. Numerical experiments show that the proposed
method accelerates convergence significantly and produces more robust training
and test performance when compared to other popular learning algorithms.
Moreover, our algorithm is less sensitive to the learning rate and outperforms
the recently proposed slimTrain algorithm (Newman et al., 2022). The code has
been uploaded to GitHub for validation.
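
The alternating routine described in the abstract (an RLS step on the linear parameters, followed by an SGD step on the nonlinear parameters that uses the updated linear parameters) can be sketched roughly as below. This is an illustrative sketch only, not the authors' released code: the one-hidden-layer tanh model, the names `separable_online_step` and `run_demo`, the forgetting factor `forget`, the initial covariance scale `100.0`, the learning rate, and the synthetic target `sin(3x)` are all assumptions made for the example.

```python
import numpy as np


def separable_online_step(x, y, W, b, c, P, lr=0.01, forget=0.999):
    """One online step for the assumed model f(x) = c . tanh(W x + b).

    c holds the linear parameters (updated by RLS); W and b hold the
    nonlinear parameters (updated by plain SGD afterwards).
    """
    phi = np.tanh(W @ x + b)          # features produced by the nonlinear part
    err = y - c @ phi                 # prediction error before any update

    # Recursive least squares update of the linear parameters.
    P_phi = P @ phi
    gain = P_phi / (forget + phi @ P_phi)
    c = c + gain * err
    P = (P - np.outer(gain, P_phi)) / forget

    # SGD step on the nonlinear parameters, using the updated c.
    resid = c @ phi - y               # residual after the RLS update
    grad_z = resid * c * (1.0 - phi ** 2)   # gradient w.r.t. pre-activations
    W = W - lr * np.outer(grad_z, x)
    b = b - lr * grad_z
    return W, b, c, P, err


def run_demo(n_steps=2000, hidden=10, seed=0):
    """Stream synthetic samples y = sin(3x) and return the squared errors."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(hidden, 1))
    b = rng.normal(size=hidden)
    c = np.zeros(hidden)
    P = 100.0 * np.eye(hidden)        # large initial covariance (assumed value)
    sq_errs = np.empty(n_steps)
    for t in range(n_steps):
        x = rng.uniform(-1.0, 1.0, size=1)
        y = np.sin(3.0 * x[0])
        W, b, c, P, err = separable_online_step(x, y, W, b, c, P)
        sq_errs[t] = err ** 2
    return sq_errs


if __name__ == "__main__":
    errs = run_demo()
    print("mean squared error, first 100 samples:", errs[:100].mean())
    print("mean squared error, last 100 samples:", errs[-100:].mean())
```

Each sample is processed once and then discarded, which is what makes the routine online. The matrix `P` carries the second-order information for the linear block, while the nonlinear block only sees a first-order gradient.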
Related papers
- Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
arXiv Detail & Related papers (2023-06-11T07:28:35Z)
- Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs).
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z)
- Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor.
Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z)
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
- Bolstering Stochastic Gradient Descent with Model Building [0.0]
The stochastic gradient descent method and its variants constitute the core optimization algorithms that achieve good convergence rates.
We propose an alternative approach to line search by using a new algorithm based on forward step model building.
We show that the proposed algorithm achieves faster convergence and better generalization in well-known test problems.
arXiv Detail & Related papers (2021-11-13T06:54:36Z)
- Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling [0.9806910643086042]
We develop a new method of online inference for a vector of parameters estimated by the Polyak-Ruppert averaging procedure of stochastic gradient descent algorithms.
Our approach is fully operational with online data and is rigorously underpinned by a functional central limit theorem.
arXiv Detail & Related papers (2021-06-06T15:38:37Z)
- Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent [20.47598828422897]
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
Our approach modifies the objective function by adding a regularization term, and casts the joint updating process of parameters and learning rates into a maxmin problem.
arXiv Detail & Related papers (2021-04-12T13:13:34Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based zeroth-order (ZO) algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling.
Our results show that our ZO-RL algorithm can effectively reduce the variance of the ZO gradient by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting [31.60239268539764]
Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance.
In this paper, we close the empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on truncated models.
We show empirically that this new parameter-free algorithm outperforms algorithms with the "best default" learning rates and almost matches the performance of finely tuned baselines without anything to tune.
arXiv Detail & Related papers (2020-06-12T23:10:25Z)
- Convergence of Online Adaptive and Recurrent Optimization Algorithms [0.0]
We prove local convergence of several notable gradient descent algorithms used in machine learning.
We adopt an "ergodic" rather than probabilistic viewpoint, working with empirical time averages instead of probability distributions.
arXiv Detail & Related papers (2020-05-12T09:48:52Z)