Improved Subsampled Randomized Hadamard Transform for Linear SVM
- URL: http://arxiv.org/abs/2002.01628v1
- Date: Wed, 5 Feb 2020 04:09:23 GMT
- Title: Improved Subsampled Randomized Hadamard Transform for Linear SVM
- Authors: Zijian Lei and Liang Lan
- Abstract summary: We propose importance sampling and deterministic top-$r$ sampling to produce an effective low-dimensional embedding instead of the uniform sampling used in SRHT.
Our experimental results have demonstrated that our proposed methods can achieve higher classification accuracies than SRHT and other random projection methods on six real-life datasets.
- Score: 18.52747917850984
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Subsampled Randomized Hadamard Transform (SRHT), a popular random projection
method that can efficiently project $d$-dimensional data into an $r$-dimensional
space ($r \ll d$) in $O(d \log d)$ time, has been widely used to address the
challenge of high dimensionality in machine learning. SRHT works by rotating
the input data matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ with a Randomized
Walsh-Hadamard Transform, followed by uniform column sampling on the rotated
matrix. Despite its advantages, one limitation of SRHT is that it generates the
new low-dimensional embedding without considering any specific properties of a
given dataset. Therefore, this data-independent random projection method may
result in inferior and unstable performance when used for a particular machine
learning task, e.g., classification. To overcome this limitation, we analyze
the effect of using SRHT for random projection in the context of linear SVM
classification. Based on our analysis, we propose importance sampling and
deterministic top-$r$ sampling to produce an effective low-dimensional
embedding instead of the uniform sampling in SRHT. In addition, we also
propose a new supervised non-uniform sampling method. Our experimental results
demonstrate that our proposed methods can achieve higher classification
accuracies than SRHT and other random projection methods on six real-life
datasets.
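As a concrete illustration, here is a minimal NumPy sketch of SRHT together with the three column-selection strategies named above. It assumes $d$ is a power of two and, as a stand-in for the paper's SVM-derived sampling weights, scores columns by their squared norms after rotation; the paper's exact importance scores and its supervised sampling method are not reproduced here.

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform along the last axis, O(d log d).
    The length of the last axis must be a power of two."""
    a = np.array(a, dtype=float)
    d = a.shape[-1]
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            x = a[..., i:i + h].copy()
            y = a[..., i + h:i + 2 * h].copy()
            a[..., i:i + h] = x + y
            a[..., i + h:i + 2 * h] = x - y
        h *= 2
    return a

def srht_embed(X, r, sampling="uniform", rng=None):
    """Rotate X (n x d, d a power of two) with a randomized Walsh-Hadamard
    transform, then keep r of the d rotated columns."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    signs = rng.choice([-1.0, 1.0], size=d)          # random diagonal sign flip D
    Xrot = fwht(X * signs) / np.sqrt(d)              # orthonormal rotation H D
    scores = (Xrot ** 2).sum(axis=0)                 # proxy importance: column norms
    if sampling == "uniform":                        # plain SRHT
        cols = rng.choice(d, size=r, replace=False)
        return Xrot[:, cols] * np.sqrt(d / r)
    if sampling == "importance":                     # sample prop. to scores
        p = scores / scores.sum()
        cols = rng.choice(d, size=r, replace=False, p=p)
        return Xrot[:, cols] / np.sqrt(r * p[cols])  # reweight the kept columns
    if sampling == "top_r":                          # deterministic top-r scores
        cols = np.argsort(scores)[-r:]
        return Xrot[:, cols]
    raise ValueError(f"unknown sampling scheme: {sampling}")
```

In practice the sign flips and the selected columns would be fixed on the training data and reused at test time before fitting a linear SVM on the embedded features; the function above redraws them on each call for brevity.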
Related papers
- Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms [19.616162116973637]
We develop a unified methodology for statistical inference via randomized sketching or projections.
The methodology applies to fixed datasets -- i.e., is data-conditional -- and the only randomness is due to the randomized algorithm.
arXiv Detail & Related papers (2024-04-01T04:35:44Z)
- Optimal Projections for Discriminative Dictionary Learning using the JL-lemma [0.5461938536945723]
Dimensionality reduction-based dictionary learning methods have often used iterative random projections.
This paper proposes a constructive approach to derandomize the projection matrix using the Johnson-Lindenstrauss lemma.
arXiv Detail & Related papers (2023-08-27T02:59:59Z)
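For context, the entry above derandomizes the classical Johnson-Lindenstrauss construction. A minimal sketch of that randomized baseline (not the paper's deterministic construction) might look like:

```python
import numpy as np

def jl_embed(X, eps=0.3, rng=None):
    """Classical randomized JL map: a scaled i.i.d. Gaussian matrix with
    r = O(log n / eps^2) columns preserves pairwise distances to
    1 +/- eps with high probability."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    r = int(np.ceil(8 * np.log(n) / eps ** 2))   # standard JL dimension bound
    G = rng.standard_normal((d, r)) / np.sqrt(r)
    return X @ G

# Quick distortion check on synthetic data:
X = np.random.default_rng(0).standard_normal((100, 2048))
Y = jl_embed(X, eps=0.3, rng=1)
ratio = np.linalg.norm(Y[3] - Y[42]) / np.linalg.norm(X[3] - X[42])
print(f"distance ratio after projection: {ratio:.3f}")  # close to 1.0
```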
- Linearly-scalable learning of smooth low-dimensional patterns with permutation-aided entropic dimension reduction [0.0]
In many data science applications, the objective is to extract appropriately-ordered smooth low-dimensional data patterns from high-dimensional data sets.
We show that when the Euclidean smoothness is selected as the pattern quality criterion, both of these problems can be efficiently solved numerically.
arXiv Detail & Related papers (2023-06-17T08:03:24Z)
- Sharp-SSL: Selective high-dimensional axis-aligned random projections for semi-supervised learning [16.673022545571566]
We propose a new method for high-dimensional semi-supervised learning problems.
It is based on the careful aggregation of the results of a low-dimensional procedure applied to many axis-aligned random projections of the data.
arXiv Detail & Related papers (2023-04-18T17:49:02Z)
- Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
- StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of kernel ridge regression (KRR) require that all the data is stored in main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
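As a rough illustration of the sketch-and-solve template behind the entry above (its actual contribution, choosing the sketch size adaptively from the effective dimension, is omitted), a minimal Gaussian-embedding version might look like:

```python
import numpy as np

def sketched_ridge(A, b, lam, m, rng=None):
    """Sketch-and-solve for argmin_x ||Ax - b||^2 + lam * ||x||^2:
    compress the tall matrix A (n x d, n >> d) to m rows, then solve the
    small problem exactly via its normal equations."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian embedding
    SA, Sb = S @ A, S @ b
    return np.linalg.solve(SA.T @ SA + lam * np.eye(d), SA.T @ Sb)
```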
- Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z)
- Stable Sparse Subspace Embedding for Dimensionality Reduction [9.033485533901658]
This paper builds a stable sparse subspace embedding (S-SSE) matrix based on random sampling without replacement.
It is proved that the S-SSE is more stable than existing embedding matrices and that it preserves Euclidean distances between points well after dimensionality reduction.
arXiv Detail & Related papers (2020-02-07T15:30:22Z)
- Optimal Iterative Sketching with the Subsampled Randomized Hadamard Transform [64.90148466525754]
We study the performance of iterative sketching for least-squares problems.
We show that the convergence rates for Haar and randomized Hadamard matrices are identical, and asymptotically improve upon random projections.
These techniques may be applied to other algorithms that employ randomized dimension reduction.
arXiv Detail & Related papers (2020-02-03T16:17:50Z)
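The iterative scheme studied in the last entry can be illustrated with a minimal sketch in the style of the iterative Hessian sketch; a Gaussian embedding is used here for brevity, whereas the paper analyzes Haar and randomized Hadamard embeddings:

```python
import numpy as np

def iterative_sketch_ls(A, b, m, iters=10, rng=None):
    """Iterative sketching for min_x ||Ax - b||^2: each round refreshes a
    sketch of A (m >= d) and takes a Newton-like step that uses the exact
    gradient but the sketched Hessian (SA)^T (SA) ~ A^T A."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        S = rng.standard_normal((m, n)) / np.sqrt(m)
        SA = S @ A
        grad = A.T @ (A @ x - b)                 # exact gradient
        x -= np.linalg.solve(SA.T @ SA, grad)    # sketched-Hessian step
    return x
```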
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.