Fast Projection onto the Capped Simplex with Applications to Sparse Regression in Bioinformatics
- URL: http://arxiv.org/abs/2110.08471v2
- Date: Tue, 19 Oct 2021 01:42:06 GMT
- Title: Fast Projection onto the Capped Simplex with Applications to Sparse Regression in Bioinformatics
- Authors: Andersen Ang, Jianzhu Ma, Nianjun Liu, Kun Huang, Yijie Wang
- Abstract summary: We find that a simple algorithm based on Newton's method is able to solve the projection problem to high precision.
We demonstrate that the proposed algorithm can produce a solution of the projection problem with high precision on large-scale datasets.
- Score: 6.6371396313325075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of projecting a vector onto the so-called
k-capped simplex, which is a hyper-cube cut by a hyperplane. For an
n-dimensional input vector with bounded elements, we find that a simple
algorithm based on Newton's method solves the projection problem to high
precision with a complexity of roughly O(n), a much lower computational cost
than the existing sorting-based methods proposed in the literature. We provide
a theory that partially explains and justifies the method.
We demonstrate that the proposed algorithm produces a solution of the
projection problem with high precision on large-scale datasets, and that it
significantly outperforms the state-of-the-art methods in terms of runtime
(about 6-8 times faster than a commercial solver with respect to CPU time for
input vectors with 1 million variables or more).
We further illustrate the effectiveness of the proposed algorithm by solving
a sparse regression problem in bioinformatics. Empirical results on the GWAS
dataset (with 1,500,000 single-nucleotide polymorphisms) show that, when the
proposed method is used to accelerate the Projected Quasi-Newton (PQN) method,
the accelerated PQN algorithm can handle huge-scale regression problems and is
more efficient (about 3-6 times faster) than the current state-of-the-art
methods.
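To make the abstract's method concrete, below is a minimal NumPy sketch of a Newton-type projection onto the k-capped simplex. It follows the standard KKT reduction: the projection of y is w_i = clip(y_i - gamma, 0, 1), where the scalar multiplier gamma solves the piecewise-linear equation f(gamma) = sum_i clip(y_i - gamma, 0, 1) - k = 0, so each iteration costs O(n). The initialization and the bisection safeguard here are illustrative assumptions, not the paper's exact implementation.
```python
import numpy as np

def project_capped_simplex(y, k, tol=1e-10, max_iter=100):
    """Euclidean projection of y onto {w : 0 <= w_i <= 1, sum_i w_i = k}.

    A hedged sketch: semismooth Newton on the scalar dual equation
    f(gamma) = sum_i clip(y_i - gamma, 0, 1) - k = 0, with a bisection
    safeguard. Assumes 0 <= k <= len(y); not the paper's exact code.
    """
    y = np.asarray(y, dtype=float)
    lo, hi = y.min() - 1.0, y.max()      # bracket: f(lo) = n - k >= 0, f(hi) = -k <= 0
    gamma = (y.sum() - k) / y.size       # cheap initial guess (an assumption)
    for _ in range(max_iter):
        w = np.clip(y - gamma, 0.0, 1.0)
        f = w.sum() - k
        if abs(f) <= tol:
            break
        if f > 0:                        # f is nonincreasing in gamma,
            lo = gamma                   # so the root lies to the right
        else:
            hi = gamma                   # root lies to the left
        free = (y - gamma > 0.0) & (y - gamma < 1.0)
        slope = -np.count_nonzero(free)  # one-sided derivative of f
        if slope < 0:
            gamma -= f / slope           # Newton step
        if slope == 0 or not (lo < gamma < hi):
            gamma = 0.5 * (lo + hi)      # fallback on flat pieces / overshoot
    return np.clip(y - gamma, 0.0, 1.0)

# Example usage on a large random input
rng = np.random.default_rng(0)
y = rng.standard_normal(1_000_000)
w = project_capped_simplex(y, k=100)
print(w.sum(), w.min(), w.max())         # ~100.0, with 0.0 <= w_i <= 1.0
```
Because f is nonincreasing and the bracket [lo, hi] always contains a root, the bisection fallback guarantees progress even on flat pieces where the Newton slope vanishes.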
Related papers
- Randomized Algorithms for Symmetric Nonnegative Matrix Factorization [2.1753766244387402]
Symmetric Nonnegative Matrix Factorization (SymNMF) is a technique in data analysis and machine learning.
We develop two randomized algorithms for its computation.
We show that our methods approximately maintain solution quality and achieve significant speed ups for both large dense and large sparse problems.
arXiv Detail & Related papers (2024-02-13T00:02:05Z)
- Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z)
- Efficient Numerical Algorithm for Large-Scale Damped Natural Gradient Descent [7.368877979221163]
We propose a new algorithm for efficiently solving linear systems involving the damped Fisher matrix in large-scale scenarios where the number of parameters significantly exceeds the number of available samples.
Our algorithm is based on Cholesky decomposition and is generally applicable. Benchmark results show that the algorithm is significantly faster than existing methods.
arXiv Detail & Related papers (2023-10-26T16:46:13Z)
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive, largely due to the covariance matrices it must form, and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Fast Projected Newton-like Method for Precision Matrix Estimation under Total Positivity [15.023842222803058]
Current algorithms are designed using the block coordinate descent method or the proximal point algorithm.
We propose a novel algorithm based on the two-metric projection method, incorporating a carefully designed search direction and variable partitioning scheme.
Experimental results on synthetic and real-world datasets demonstrate that our proposed algorithm provides a significant improvement in computational efficiency compared to the state-of-the-art methods.
arXiv Detail & Related papers (2021-12-03T14:39:10Z)
- Analysis of Truncated Orthogonal Iteration for Sparse Eigenvector Problems [78.95866278697777]
We propose two variants of the Truncated Orthogonal Iteration to compute multiple leading eigenvectors with sparsity constraints simultaneously.
We then apply our algorithms to solve the sparse principal component analysis problem for a wide range of test datasets.
arXiv Detail & Related papers (2021-03-24T23:11:32Z)
- Slowly Varying Regression under Sparsity [5.22980614912553]
We present the framework of slowly varying regression under sparsity, allowing regression models to exhibit slow and sparse variations.
We suggest a procedure that reformulates the problem as a binary convex optimization problem.
We show that the resulting model outperforms competing formulations in comparable times across various datasets.
arXiv Detail & Related papers (2021-02-22T04:51:44Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- Accelerated Message Passing for Entropy-Regularized MAP Inference [89.15658822319928]
Maximum a posteriori (MAP) inference in discrete-valued random fields is a fundamental problem in machine learning.
Due to the difficulty of this problem, linear programming (LP) relaxations are commonly used to derive specialized message passing algorithms.
We present randomized methods for accelerating these algorithms by leveraging techniques that underlie classical accelerated gradient methods.
arXiv Detail & Related papers (2020-07-01T18:43:32Z)
- Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)