Related papers: Learning Large Scale Sparse Models

Learning Large Scale Sparse Models

URL: http://arxiv.org/abs/2301.10958v1
Date: Thu, 26 Jan 2023 06:29:49 GMT
Title: Learning Large Scale Sparse Models
Authors: Atul Dhingra, Jie Shen, Nicholas Kleene
Abstract summary: We consider learning sparse models in large scale settings, where the number of samples and the feature dimension can grow as large as millions or billions. We propose to learn sparse models such as Lasso in an online manner where, in each iteration, only one randomly chosen sample is revealed to update a sparse iterate. Thereby, the memory cost is independent of the sample size and gradient evaluation for one sample is efficient.
Score: 6.428186644949941
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we consider learning sparse models in large scale settings, where the number of samples and the feature dimension can grow as large as millions or billions. Two immediate issues occur under such a challenging scenario: (i) computational cost; (ii) memory overhead. In particular, the memory issue precludes a large volume of prior algorithms that are based on batch optimization techniques. To remedy the problem, we propose to learn sparse models such as Lasso in an online manner where, in each iteration, only one randomly chosen sample is revealed to update a sparse iterate. Thereby, the memory cost is independent of the sample size and gradient evaluation for one sample is efficient. Perhaps amazingly, we find that with the same parameter, sparsity promoted by batch methods is not preserved in an online fashion. We analyze this interesting phenomenon and illustrate some effective variants, including mini-batch methods and a hard thresholding based stochastic gradient algorithm. Extensive experiments on a public dataset support our findings and algorithms.
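
The single-sample update described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the squared loss, the function names (`soft_threshold`, `hard_threshold`, `online_sparse_regression`), and the parameters `step`, `lam`, `k`, and `iters` are all hypothetical choices made for this example. It shows one randomly chosen sample per iteration, followed by either a Lasso-style soft-thresholding step or a hard-thresholding step that keeps at most `k` nonzero coefficients.

```python
import random


def soft_threshold(value, tau):
    """Proximal operator of tau * |.|: shrink value toward zero by tau."""
    if value > tau:
        return value - tau
    if value < -tau:
        return value + tau
    return 0.0


def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w and zero out the rest."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    keep = set(order[:k])
    return [w[j] if j in keep else 0.0 for j in range(len(w))]


def online_sparse_regression(X, y, step=0.05, lam=0.0, k=None,
                             iters=1000, seed=0):
    """Single-sample gradient updates for least squares plus a sparsity step.

    Assumed setup (not from the paper): squared loss, constant step size.
    If k is given, hard thresholding keeps at most k nonzeros per iterate;
    otherwise, if lam > 0, soft thresholding is applied (Lasso-style).
    Only one row of X is touched per iteration, so working memory does not
    grow with the number of samples.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(iters):
        i = rng.randrange(len(X))  # reveal one randomly chosen sample
        residual = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
        # Gradient of 0.5 * (x_i . w - y_i)^2 with respect to w is residual * x_i.
        w = [wj - step * residual * xj for wj, xj in zip(w, X[i])]
        if k is not None:
            w = hard_threshold(w, k)
        elif lam > 0:
            w = [soft_threshold(wj, step * lam) for wj in w]
    return w
```

Calling `online_sparse_regression(X, y, k=5)` on a list of feature rows `X` and targets `y` returns a coefficient list with at most five nonzero entries; passing `lam` instead of `k` switches to the soft-thresholding variant.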

Related papers

Scalable Subset Selection in Linear Mixed Models [0.39373541926236766]
Linear mixed models (LMMs) are key tools for analyzing heterogeneous data, such as in personalized medicine. Existing methods for LMMs do not scale well beyond tens or hundreds of predictors. A new method for LMM subset selection can run on datasets containing thousands of predictors in seconds to minutes.
arXiv Detail & Related papers (2025-06-25T13:39:30Z)
Worth Their Weight: Randomized and Regularized Block Kaczmarz Algorithms without Preprocessing [2.542838926315103]
We analyze the convergence of the randomized block Kaczmarz method (RBK) when the data is sampled uniformly, showing that its iterates converge in a Monte Carlo sense to a least-squares solution. We resolve these issues by incorporating regularization into the RBK. Numerical experiments, including examples arising from natural gradient optimization, suggest that the regularized algorithm, ReBlocK, outperforms minibatch gradient descent for realistic problems that exhibit fast singular value decay.
arXiv Detail & Related papers (2025-02-02T19:23:46Z)
Best-Subset Selection in Generalized Linear Models: A Fast and Consistent Algorithm via Splicing Technique [0.6338047104436422]
Best subset selection has been widely regarded as the Holy Grail of problems of this type. We proposed and illustrated an algorithm for best subset recovery in mild conditions. Our implementation achieves approximately a fourfold speedup compared to popular variable selection toolkits.
arXiv Detail & Related papers (2023-08-01T03:11:31Z)
Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships. A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n^2) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization [23.271890743506514]
We propose a doubly stochastic algorithm with a novel accelerating multi-momentum technique to solve large scale empirical risk minimization problems for learning tasks. While enjoying a provably superior convergence rate, in each iteration, such an algorithm only accesses a mini batch of samples and updates a small block of variable coordinates.
arXiv Detail & Related papers (2023-04-23T14:21:29Z)
Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances. Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning. In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
StreaMRAK: a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of kernel ridge regression (KRR) require that all the data is stored in the main memory. We propose StreaMRAK, a streaming version of KRR. We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
Batch Active Learning at Scale [39.26441165274027]
Batch active learning, which adaptively issues batched queries to a labeling oracle, is a common approach for addressing this problem. In this work, we analyze an efficient active learning algorithm, which focuses on the large batch setting. We show that our sampling method, which combines notions of uncertainty and diversity, easily scales to batch sizes (100K-1M) several orders of magnitude larger than used in previous studies.
arXiv Detail & Related papers (2021-07-29T18:14:05Z)
Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment. Policy gradients for local search are often obtained from random perturbations. We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z)
Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area. Existing MAML algorithms rely on the 'episode' idea by sampling a few tasks and data points to update the meta-model at each iteration. This paper proposes memory-based algorithms for MAML that converge with vanishing error.
arXiv Detail & Related papers (2021-06-09T08:47:58Z)
DPER: Efficient Parameter Estimation for Randomly Missing Data [0.24466725954625884]
We propose novel algorithms to find the maximum likelihood estimates (MLEs) for a one-class/multiple-class randomly missing data set. Our algorithms do not require multiple iterations through the data, thus promising to be less time-consuming than other methods.
arXiv Detail & Related papers (2021-06-06T16:37:48Z)
Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction. We adaptively select the descent steps where the measure reduction is carried out. We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)
Efficient Algorithms for Multidimensional Segmented Regression [42.046881924063044]
We study the fundamental problem of fixed design multidimensional segmented regression. We provide the first sample and computationally efficient algorithm for this problem in any fixed dimension. Our algorithm relies on a simple iterative merging approach, which is novel in the multidimensional setting.
arXiv Detail & Related papers (2020-03-24T19:39:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.