BEAR: Sketching BFGS Algorithm for Ultra-High Dimensional Feature
Selection in Sublinear Memory
- URL: http://arxiv.org/abs/2010.13829v2
- Date: Wed, 26 May 2021 17:24:37 GMT
- Title: BEAR: Sketching BFGS Algorithm for Ultra-High Dimensional Feature
Selection in Sublinear Memory
- Authors: Amirali Aghazadeh, Vipul Gupta, Alex DeWeese, O. Ozan Koyluoglu and
Kannan Ramchandran
- Abstract summary: Current large-scale sketching algorithms show poor memory-accuracy trade-off due to the irreversible collision and accumulation of the noise in the sketched domain.
We develop BEAR, which avoids the extra collisions by storing the second-order gradients of the celebrated Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm in a Count Sketch.
Experiments on real-world data sets demonstrate that BEAR requires up to three orders of magnitude less memory space to achieve the same classification accuracy compared to the first-order sketching algorithms.
- Score: 13.596664481933875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider feature selection for applications in machine learning where the
dimensionality of the data is so large that it exceeds the working memory of
the (local) computing machine. Unfortunately, current large-scale sketching
algorithms show poor memory-accuracy trade-off due to the irreversible
collision and accumulation of the stochastic gradient noise in the sketched
domain. Here, we develop a second-order ultra-high dimensional feature
selection algorithm, called BEAR, which avoids the extra collisions by storing
the second-order gradients of the celebrated Broyden-Fletcher-Goldfarb-Shanno
(BFGS) algorithm in Count Sketch, a sublinear memory data structure from the
streaming literature. Experiments on real-world data sets demonstrate that BEAR
requires up to three orders of magnitude less memory space to achieve the same
classification accuracy compared to the first-order sketching algorithms.
Theoretical analysis proves convergence of BEAR with rate O(1/t) in t
iterations of the sketched algorithm. Our algorithm reveals an unexplored
advantage of second-order optimization for memory-constrained sketching of
models trained on ultra-high dimensional data sets.
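The memory argument is easiest to see in code. Below is a minimal sketch of the Count Sketch data structure the abstract refers to; the hash functions, table sizes, and parameter names are illustrative assumptions, not the paper's implementation, and BEAR's actual contribution (running BFGS-style second-order updates inside this structure) is not reproduced here.

```python
import numpy as np

class CountSketch:
    """Minimal Count Sketch: d hash rows of width w, so memory is
    O(d * w) regardless of the ambient feature dimension."""

    def __init__(self, d=5, w=1 << 18, seed=0):
        rng = np.random.default_rng(seed)
        self.d, self.w = d, w
        self.table = np.zeros((d, w))
        # Toy universal-style hash parameters (illustrative only).
        self.p = 2**31 - 1
        self.a = rng.integers(1, self.p, size=d)
        self.b = rng.integers(0, self.p, size=d)

    def _bucket(self, i):
        return (self.a * i + self.b) % self.p % self.w

    def _sign(self, i):
        return 1.0 - 2.0 * ((self.a * i + 7 * self.b + 3) % self.p % 2)

    def update(self, i, value):
        """Add `value` to coordinate i of the sketched vector."""
        self.table[np.arange(self.d), self._bucket(i)] += self._sign(i) * value

    def query(self, i):
        """Median-of-rows unbiased estimate of coordinate i."""
        return np.median(self._sign(i) * self.table[np.arange(self.d), self._bucket(i)])
```

In a BEAR-style loop, each update direction would be accumulated coordinate-by-coordinate through `update` and heavy-hitter features read back through `query`; the abstract's claim is that second-order directions collide less destructively in these buckets than the raw stochastic gradients used by first-order sketching baselines.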
Related papers
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
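For reference, the standard (non-robust) mirror descent step that this construction builds on can be written, for a strongly convex mirror map $\phi$, as

\[
x_{t+1} = \arg\min_{x \in \mathcal{X}} \Big\{ \eta \,\langle g_t, x \rangle + D_\phi(x, x_t) \Big\},
\qquad
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle,
\]

where the "lazy" variant accumulates gradients in the dual space and maps back through $\nabla\phi^*$ only when an iterate is needed. The paper's corruption-tolerance mechanism on top of this step is its own contribution and is not restated here.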
arXiv Detail & Related papers (2024-07-19T08:29:12Z)
- Sketch and shift: a robust decoder for compressive clustering [17.627195350266796]
Compressive learning is an emerging approach to drastically reduce the memory footprint of large-scale learning.
We propose an alternative decoder offering substantial improvements over CL-OMPR.
The proposed algorithm can extract clustering information from a sketch of the MNIST dataset that is 10 times smaller than previously possible.
arXiv Detail & Related papers (2023-12-15T16:53:55Z)
- Eva: A General Vectorized Approximation Framework for Second-order Optimization [16.647611352181574]
We present a memory- and time-efficient second-order algorithm named Eva with two novel techniques.
We derive an efficient update formula that avoids explicitly computing matrix inverses by using the Sherman-Morrison formula.
Experiments show that Eva reduces end-to-end training time by up to 2.05x and 2.42x compared to first-order SGD and second-order algorithms, respectively.
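The update in question relies on the Sherman-Morrison identity, which inverts a rank-one correction of an already-inverted matrix in closed form:

\[
\left(A + u v^\top\right)^{-1} = A^{-1} - \frac{A^{-1} u\, v^\top A^{-1}}{1 + v^\top A^{-1} u},
\]

valid whenever $1 + v^\top A^{-1} u \neq 0$. Eva's exact vectorized update is derived in the paper; the identity above is only the standard tool it invokes.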
arXiv Detail & Related papers (2023-08-04T03:51:38Z)
- Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
arXiv Detail & Related papers (2023-06-11T07:28:35Z)
- SketchOGD: Memory-Efficient Continual Learning [9.372686529398859]
When machine learning models are trained continually on a sequence of tasks, they are liable to forget what they learned on previous tasks.
This paper proposes a memory-efficient solution to catastrophic forgetting, improving upon an established algorithm known as orthogonal gradient descent (OGD).
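For context, here is a minimal sketch of vanilla OGD's projection step, assuming past-task gradient directions are kept as an explicit orthonormal basis; the function names are hypothetical, and the growing basis is exactly the memory cost SketchOGD compresses into a fixed-size sketch.

```python
import numpy as np

def ogd_project(grad, basis):
    """Remove from `grad` its components along the stored orthonormal
    basis of past-task gradient directions, so a step leaves behavior
    on earlier tasks (to first order) unchanged."""
    for u in basis:
        grad = grad - np.dot(u, grad) * u
    return grad

def ogd_extend_basis(basis, new_grad, tol=1e-10):
    """Gram-Schmidt step: store the part of `new_grad` orthogonal to
    the current basis. The basis grows with every task, which is the
    memory cost a sketched variant avoids."""
    residual = ogd_project(new_grad.copy(), basis)
    norm = np.linalg.norm(residual)
    if norm > tol:
        basis.append(residual / norm)
    return basis
```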
arXiv Detail & Related papers (2023-05-25T18:56:19Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- Adaptive First- and Second-Order Algorithms for Large-Scale Machine Learning [3.0204520109309843]
We consider first- and second-order techniques to address continuous optimization problems in machine learning.
In the first-order case, we propose a framework of transition from semi-deterministic to quadratic regularization methods.
In the second-order case, we propose a novel second-order algorithm with adaptive sampling and adaptive step size.
arXiv Detail & Related papers (2021-11-29T18:10:00Z)
- StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of kernel ridge regression (KRR) require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
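The memory bottleneck is visible in the closed-form KRR solution: for training data $X = \{x_i\}_{i=1}^n$, labels $y$, kernel $k$, and ridge parameter $\lambda$,

\[
\hat{f}(x) = k(x, X)\left(K + \lambda n I\right)^{-1} y, \qquad K_{ij} = k(x_i, x_j),
\]

so the full $n \times n$ kernel matrix $K$ must be formed and solved against. A streaming, multi-resolution construction like StreaMRAK avoids this; the scaling of $\lambda$ by $n$ above is a common convention, not necessarily the paper's.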
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
- Learning the Positions in CountSketch [51.15935547615698]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work we propose the first learning algorithm that also optimizes the locations of the non-zero entries.
We show this algorithm gives better accuracy for low rank approximation than previous work, and apply it to other problems such as $k$-means clustering for the first time.
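A minimal sketch-and-solve example with classical (unlearned) CountSketch positions may clarify what is being optimized; the learned variant would replace the uniformly random `rows` below with trained locations. Sizes and names here are illustrative.

```python
import numpy as np

def countsketch_apply(A, m, seed=0):
    """Compute S @ A in one pass, where S is an m-row CountSketch
    matrix: each input row is hashed to one output row with a random
    sign. The classical variant draws the positions uniformly."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    rows = rng.integers(0, m, size=n)       # random (unlearned) positions
    signs = rng.choice([-1.0, 1.0], size=n)
    SA = np.zeros((m, A.shape[1]))
    np.add.at(SA, rows, signs[:, None] * A)
    return SA

# Sketch-and-solve least squares: min_x ||Ax - b|| on the compressed system.
n, d, m = 10_000, 50, 500
A, b = np.random.randn(n, d), np.random.randn(n)
SA = countsketch_apply(A, m)                 # same seed => same S for A and b
Sb = countsketch_apply(b[:, None], m).ravel()
x_sketch, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
```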
arXiv Detail & Related papers (2020-07-20T05:06:29Z)
- Sliced Iterative Normalizing Flows [7.6146285961466]
We develop an iterative (greedy) deep learning (DL) algorithm which is able to transform an arbitrary probability distribution function (PDF) into the target PDF.
As special cases of this algorithm, we introduce two sliced iterative Normalizing Flow (SINF) models, which map from the data to the latent space (GIS) and vice versa (SIG).
arXiv Detail & Related papers (2020-07-01T18:00:04Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
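The sketched problem being solved has the standard form

\[
\hat{x} = \arg\min_{x} \;\|S(Ax - b)\|_2^2 + \lambda \|x\|_2^2,
\]

where $S$ is the Gaussian or SRHT embedding, and the adaptivity in the title refers to choosing the sketch size in terms of the effective dimension, commonly defined as $d_\lambda = \operatorname{tr}\!\big(A^\top A\,(A^\top A + \lambda I)^{-1}\big)$, rather than the full problem dimension. This is the generic formulation of sketched ridge regression, not a restatement of the paper's exact algorithm.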
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.