Accelerated Doubly Stochastic Gradient Algorithm for Large-scale
Empirical Risk Minimization
- URL: http://arxiv.org/abs/2304.11665v1
- Date: Sun, 23 Apr 2023 14:21:29 GMT
- Title: Accelerated Doubly Stochastic Gradient Algorithm for Large-scale
Empirical Risk Minimization
- Authors: Zebang Shen, Hui Qian, Tongzhou Mu, Chao Zhang
- Abstract summary: We propose a doubly stochastic algorithm with a novel accelerating multi-momentum technique to solve large-scale empirical risk minimization problems for learning tasks.
While enjoying a provably superior convergence rate, in each iteration the algorithm only accesses a mini-batch of samples and updates a small block of variable coordinates.
- Score: 23.271890743506514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, algorithms with fast convergence, small memory footprints, and low
per-iteration complexity are particularly favorable for artificial intelligence
applications. In this paper, we propose a doubly stochastic algorithm with a
novel accelerating multi-momentum technique to solve the large-scale empirical
risk minimization problem for learning tasks. While enjoying a provably
superior convergence rate, in each iteration the algorithm only accesses a
mini-batch of samples and meanwhile updates a small block of variable
coordinates, which substantially reduces the amount of memory reference when
both a massive sample size and ultra-high dimensionality are involved.
Empirical studies on huge-scale datasets are conducted to illustrate the
efficiency of our method in practice.
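The paper's full method (including its multi-momentum acceleration) is not given in the abstract; the following is a minimal sketch of the core doubly stochastic idea only, applied to a least-squares ERM objective. The function and variable names are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def doubly_stochastic_step(w, X, y, batch, block, lr):
    """One doubly stochastic update for least-squares ERM:
    the gradient is estimated on a mini-batch of samples (`batch`)
    and applied only to a small block of coordinates (`block`)."""
    resid = X[batch] @ w - y[batch]                    # residuals on the mini-batch only
    grad_block = X[batch][:, block].T @ resid / len(batch)
    w = w.copy()
    w[block] -= lr * grad_block                        # touch just |block| coordinates
    return w

# Toy run: minimize (1/2n) ||Xw - y||^2 with random batches and blocks.
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true
w = np.zeros(d)
for _ in range(2000):
    batch = rng.choice(n, size=20, replace=False)
    block = rng.choice(d, size=5, replace=False)
    w = doubly_stochastic_step(w, X, y, batch, block, lr=0.05)
```

Each iteration reads only 20 rows of the data and writes only 5 coordinates of the iterate, which is the memory-reference saving the abstract describes for the massive-sample, high-dimensional regime.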
Related papers
- Robust Clustering on High-Dimensional Data with Stochastic Quantization [0.0]
This paper addresses the limitations of conventional vector quantization algorithms.
It investigates Stochastic Quantization (SQ) as an alternative for high-dimensional computation.
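The summary does not specify the SQ update itself; as a hedged illustration, here is a one-sample, k-means-style center update in the same online-quantization spirit. The `sq_update` name and its learning rate are assumptions, not the paper's method.

```python
import numpy as np

def sq_update(centers, x, lr):
    """One stochastic quantization-style step: move the center nearest
    to the sample `x` a fraction `lr` of the way toward it."""
    j = int(np.argmin(np.linalg.norm(centers - x, axis=1)))
    centers = centers.copy()
    centers[j] += lr * (x - centers[j])
    return centers

# Example: the center at the origin is closest to the sample, so only it moves.
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
new = sq_update(centers, np.array([1.0, 1.0]), lr=0.5)
```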
arXiv Detail & Related papers (2024-09-03T17:13:55Z) - Randomized Dimension Reduction with Statistical Guarantees [0.27195102129095]
This thesis explores some of such algorithms for fast execution and efficient data utilization.
We focus on learning algorithms with various incorporations of data augmentation that provably improve generalization and distributional robustness.
Specifically, Chapter 4 presents a sample complexity analysis for data augmentation consistency regularization.
arXiv Detail & Related papers (2023-10-03T02:01:39Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
The second strategy leverages a novel Re-Ranking technique, which has a lower time upper-bound complexity and reduces the memory complexity from O(n^2) to O(kn) with k << n.
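The re-ranking method itself is not detailed in the summary, but the O(n^2) to O(kn) memory claim comes from storing only each point's k nearest neighbors rather than the full pairwise-distance matrix. A minimal sketch of that storage pattern (names here are illustrative):

```python
import numpy as np

def knn_sparse_distances(feats, k):
    """Store only each point's k nearest neighbors: O(k n) memory
    instead of the O(n^2) full pairwise-distance matrix."""
    n = len(feats)
    idx = np.empty((n, k), dtype=int)
    dist = np.empty((n, k))
    for i in range(n):
        d = np.linalg.norm(feats - feats[i], axis=1)
        d[i] = np.inf                           # exclude the point itself
        nn = np.argpartition(d, k)[:k]          # k smallest distances, unordered
        order = np.argsort(d[nn])
        idx[i], dist[i] = nn[order], d[nn][order]
    return idx, dist

# Example: three 1-D points; each keeps only its single nearest neighbor.
feats = np.array([[0.0], [1.0], [10.0]])
idx, dist = knn_sparse_distances(feats, k=1)
```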
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Learning Large Scale Sparse Models [6.428186644949941]
We consider learning sparse models in large scale settings, where the number of samples and the feature dimension can grow as large as millions or billions.
We propose to learn sparse models such as the Lasso in an online manner, where in each iteration only one randomly chosen sample is revealed to update the sparse gradient.
Thereby, the memory cost is independent of the sample size, and gradient evaluation for one sample is efficient.
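The per-sample updates and sample-size-independent memory described above match the shape of proximal SGD with soft-thresholding; the sketch below assumes that pattern with a squared loss (the names and loss choice are illustrative, not necessarily the paper's exact scheme).

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1, the Lasso penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def online_lasso_step(w, x, y, lr, lam):
    """One proximal-SGD step on a single sample (x, y): squared-loss
    gradient, then soft-thresholding. Memory is O(d), independent of
    how many samples the stream contains."""
    grad = (x @ w - y) * x
    return soft_threshold(w - lr * grad, lr * lam)

# Stream of single samples drawn from a sparse ground truth.
rng = np.random.default_rng(1)
d = 20
w_true = np.zeros(d)
w_true[:3] = [2.0, -3.0, 1.5]
w = np.zeros(d)
for _ in range(5000):
    x = rng.standard_normal(d)
    w = online_lasso_step(w, x, x @ w_true, lr=0.01, lam=0.1)
```

The soft-thresholding step is what keeps the iterate sparse: coordinates whose magnitude falls below lr * lam are set exactly to zero after every update.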
arXiv Detail & Related papers (2023-01-26T06:29:49Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance-reduction techniques.
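The CGM update itself is not given in the summary; for intuition, here is a generic conditional-gradient (Frank-Wolfe) step on an l1-norm ball, where the paper's method would replace the exact gradient with the one-sample SAG estimator. The constraint set and step-size rule below are assumptions for illustration.

```python
import numpy as np

def cgm_step(w, grad, t, radius=1.0):
    """One conditional-gradient step on the l1 ball of the given radius:
    the linear minimization oracle returns a signed vertex, and the new
    iterate is a convex combination with the classic step 2 / (t + 2)."""
    i = int(np.argmax(np.abs(grad)))
    v = np.zeros_like(w)
    v[i] = -radius * np.sign(grad[i])      # vertex minimizing <grad, v>
    gamma = 2.0 / (t + 2.0)
    return (1.0 - gamma) * w + gamma * v

# The first step from the origin moves fully to a vertex (gamma = 1).
out = cgm_step(np.zeros(2), np.array([1.0, -2.0]), t=0)
```

Because every iterate is a convex combination of feasible points, the method stays inside the constraint set without any projection step.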
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Stochastic Hard Thresholding Algorithms for AUC Maximization [49.00683387735522]
We develop a stochastic hard thresholding algorithm for AUC maximization in distributed classification.
We conduct experiments to show the efficiency and effectiveness of the proposed algorithms.
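The summary does not give the operator, but hard thresholding itself is standard: keep the s largest-magnitude coordinates and zero out the rest. A minimal sketch (the AUC objective the paper pairs it with is omitted here):

```python
import numpy as np

def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and zero the rest:
    the projection onto the set of s-sparse vectors."""
    out = np.zeros_like(v)
    keep = np.argpartition(np.abs(v), -s)[-s:]
    out[keep] = v[keep]
    return out

# Keeps -5.0 and 3.0, zeroes the two smaller-magnitude entries.
sparse = hard_threshold(np.array([3.0, -5.0, 1.0, 2.0]), s=2)
```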
arXiv Detail & Related papers (2020-11-04T16:49:29Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - SONIA: A Symmetric Blockwise Truncated Optimization Algorithm [2.9923891863939938]
This work presents a new algorithm for empirical risk minimization.
The algorithm bridges the gap between first- and second-order search methods by computing a second-order search-type update in one subspace, coupled with a scaled steepest descent step in the orthogonal complement.
arXiv Detail & Related papers (2020-06-06T19:28:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.