Using Multilevel Circulant Matrix Approximate to Speed Up Kernel
Logistic Regression
- URL: http://arxiv.org/abs/2108.08605v1
- Date: Thu, 19 Aug 2021 10:30:12 GMT
- Title: Using Multilevel Circulant Matrix Approximate to Speed Up Kernel
Logistic Regression
- Authors: Junna Zhang, Shuisheng Zhou, Cui Fu and Zhuan Zhang
- Abstract summary: We employ a multilevel circulant matrix (MCM) approximation of the kernel matrix to save storage space and accelerate the solution of KLR.
Our method makes KLR scalable to large-scale problems with less memory consumption, and converges to the same test accuracy in a shorter time.
- Score: 3.1427994341585688
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kernel logistic regression (KLR) is a classical nonlinear classifier in
statistical machine learning. The Newton method, with its quadratic convergence rate,
can solve the KLR problem more effectively than gradient methods. However, an
obvious limitation of the Newton method on large-scale problems is its
$O(n^{3})$ time complexity and $O(n^{2})$ space complexity, where $n$ is the
number of training instances. In this paper, we employ a multilevel circulant
matrix (MCM) approximation of the kernel matrix to save storage space and accelerate
the solution of KLR. Exploiting the structure of MCM together with a carefully
designed algorithm, we propose an MCM approximate Newton iterative method. We
first simplify the Newton direction according to the positive semi-definiteness of the
kernel matrix and then perform a two-step approximation of the Newton direction
using MCM. Our method reduces the time complexity of each iteration to $O(n
\log n)$ by using the multidimensional fast Fourier transform (mFFT). In
addition, the space complexity can be reduced to $O(n)$ due to the built-in
periodicity of MCM. Experimental results on several large-scale binary and
multi-class classification problems show that our method makes KLR scalable to
large-scale problems, consumes less memory, and converges to the same test
accuracy in a shorter time.
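To make the complexity claims concrete, below is a minimal, single-level sketch of the idea in Python (NumPy only). It assumes the kernel matrix is replaced by a circulant matrix $C$ defined by its first column `c`, so that products and solves with $C$ cost $O(n \log n)$ via the FFT and only the first column ($O(n)$ memory) is stored. It also assumes one plausible reading of the simplification: writing the gradient as $C(r + \lambda\alpha)$ and the Hessian as $CDC + \lambda C$, the common factor $C$ is cancelled so the Newton system becomes $(DC + \lambda I)\,d = -(r + \lambda\alpha)$, and the diagonal weight matrix $D$ is then replaced by a scalar surrogate so the system stays circulant. The paper's actual method uses multilevel circulant matrices, the mFFT, and a more careful two-step approximation of the Newton direction, none of which is reproduced here.

```python
import numpy as np

def circulant_matvec(c_hat, v):
    """C @ v for a circulant C whose eigenvalues are c_hat = fft(first column)."""
    return np.real(np.fft.ifft(c_hat * np.fft.fft(v)))

def circulant_solve(c_hat, rhs, shift=0.0):
    """Solve (C + shift * I) x = rhs using the FFT diagonalization of C."""
    return np.real(np.fft.ifft(np.fft.fft(rhs) / (c_hat + shift)))

def klr_newton_circulant(c, y, lam=1e-2, iters=20):
    """Approximate Newton iterations for kernel logistic regression in which the
    kernel matrix is replaced by the circulant matrix with first column c.
    Labels y are in {-1, +1}; returns the dual coefficients alpha."""
    n = len(y)
    c_hat = np.fft.fft(c)                    # eigenvalues of C: O(n log n) time, O(n) memory
    alpha = np.zeros(n)
    for _ in range(iters):
        f = circulant_matvec(c_hat, alpha)       # decision values f = C alpha
        p = 1.0 / (1.0 + np.exp(-y * f))         # p_i = sigma(y_i f_i)
        q = -y * (1.0 - p) + lam * alpha         # gradient with the leading C factored out
        w_bar = float(np.mean(p * (1.0 - p)))    # scalar surrogate for the diagonal weights
        # Simplified Newton system (w_bar * C + lam * I) d = -q, solved in O(n log n).
        d = -circulant_solve(w_bar * c_hat, q, shift=lam)
        alpha = alpha + d
    return alpha
```

For data indexed on a $d$-dimensional grid, `np.fft.fft`/`np.fft.ifft` would be replaced by `np.fft.fftn`/`np.fft.ifftn` (the mFFT mentioned in the abstract) and `c` by the first column of the multilevel circulant approximation; the scalar weight surrogate above is a placeholder for the paper's two-step approximation of the Newton direction.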
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Large-Scale Gaussian Processes via Alternating Projection [23.79090469387859]
We propose an iterative method which only accesses subblocks of the kernel matrix, effectively enabling mini-batching.
Our algorithm, based on alternating projection, has $\mathcal{O}(n)$ per-iteration time and space complexity, solving many of the practical challenges of scaling GPs to very large datasets.
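As a rough sketch of the "only touch kernel subblocks" idea, the snippet below runs plain block Gauss-Seidel sweeps on the GP training system $(K + \sigma^2 I)v = y$; it is not the paper's alternating-projection algorithm and makes no claim about its convergence rate. Here `kernel(A, B)` is an assumed user-supplied callable returning the cross-kernel matrix between two point sets.

```python
import numpy as np

def gp_solve_by_blocks(X, y, kernel, noise=1e-2, block=256, sweeps=20):
    """Block Gauss-Seidel sweeps for (K + noise * I) v = y.

    Only one block-row of the kernel matrix is materialized at a time,
    so the full n x n matrix K is never stored.
    """
    n = len(y)
    v = np.zeros(n)
    for _ in range(sweeps):
        for start in range(0, n, block):
            idx = slice(start, min(start + block, n))
            K_row = kernel(X[idx], X)                   # one block-row, formed on the fly
            r = y[idx] - (K_row @ v + noise * v[idx])   # residual restricted to the block
            K_bb = K_row[:, idx] + noise * np.eye(K_row.shape[0])
            v[idx] += np.linalg.solve(K_bb, r)          # solve the block subproblem exactly
    return v
```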
arXiv Detail & Related papers (2023-10-26T04:20:36Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Orthogonal Directions Constrained Gradient Method: from non-linear
equality constraints to Stiefel manifold [16.099883128428054]
We propose a novel algorithm, the Orthogonal Directions Constrained Gradient Method (ODCGM).
ODCGM only requires computing a projection onto a vector space.
We show that ODCGM exhibits near-optimal oracle complexities.
arXiv Detail & Related papers (2023-03-16T12:25:53Z) - Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z) - FKreg: A MATLAB toolbox for fast Multivariate Kernel Regression [5.090316990822874]
We introduce a new toolbox for fast multivariate kernel regression based on the non-uniform FFT (NUFFT).
NUFFT implements the algorithm for $M$ gridding points with $O\left(N + M \log M\right)$ complexity and controllable accuracy.
For bandwidth selection, a fast Monte-Carlo method is used to estimate the degrees of freedom.
arXiv Detail & Related papers (2022-04-16T04:52:44Z) - Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method uses a Matrix Taylor Polynomial (MTP), and the other uses Matrix Padé Approximants (MPA).
A series of numerical tests shows that both methods yield considerable speed-ups compared with the SVD or the Newton-Schulz (NS) iteration.
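As a rough illustration of the Taylor-polynomial idea (not the paper's exact MTP or MPA formulation, and ignoring the backward pass), one can rescale a symmetric positive-definite $A$ so its eigenvalues lie in $(0, 1]$ and truncate the binomial series $(I - E)^{1/2} = \sum_k \binom{1/2}{k}(-E)^k$ with $E = I - A/s$:

```python
import numpy as np

def matrix_sqrt_taylor(A, num_terms=8):
    """Truncated binomial (Taylor) series for the square root of an SPD matrix A.

    A is rescaled by its Frobenius norm s so that the eigenvalues of A/s lie
    in (0, 1], making the spectral radius of E = I - A/s less than 1 and the
    series convergent.
    """
    s = np.linalg.norm(A, ord='fro')
    n = A.shape[0]
    I = np.eye(n)
    E = I - A / s
    S = I.copy()           # running sum of the series, starts at the k = 0 term
    term = I.copy()        # holds E**k, updated incrementally
    coeff = 1.0            # binomial coefficient C(1/2, k)
    for k in range(1, num_terms):
        coeff *= (0.5 - (k - 1)) / k        # C(1/2, k) from C(1/2, k - 1)
        term = term @ E
        S += coeff * ((-1.0) ** k) * term   # add C(1/2, k) * (-E)**k
    return np.sqrt(s) * S                   # undo rescaling: sqrt(A) = sqrt(s) * sqrt(A/s)
```

The point of such approximants is that a handful of matrix products (all differentiable in an autograd framework) replaces the SVD; the truncation depth trades accuracy for speed.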
arXiv Detail & Related papers (2022-01-29T10:00:35Z) - Fast $L^2$ optimal mass transport via reduced basis methods for the
Monge-Amp$\grave{\rm e}$re equation [0.0]
We propose a machine learning-like method for solving the parameterized Monge-Ampère equation.
Several challenging numerical tests demonstrate the accuracy and high efficiency of our method for solving the Monge-Ampère equation.
arXiv Detail & Related papers (2021-12-03T12:30:46Z) - Mixability made efficient: Fast online multiclass logistic regression [68.8204255655161]
We show that mixability can be a powerful tool to obtain algorithms with optimal regret.
However, the resulting methods often suffer from high computational complexity, which has reduced their practical applicability.
arXiv Detail & Related papers (2021-10-08T08:22:05Z) - Newton-LESS: Sparsification without Trade-offs for the Sketched Newton
Update [88.73437209862891]
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching.
We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings.
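As a hedged illustration of the generic "sketch the Hessian, keep the exact gradient" pattern, the toy step below uses a sparse Rademacher-style sketch for a least-squares objective. The sketch distribution, its scaling, and the least-squares setting are assumptions for illustration only; they are not the paper's LESS embeddings or its sparsification analysis.

```python
import numpy as np

def sketched_newton_step(A, b, x, sketch_size=200, density=0.1, seed=0):
    """One sketched Newton step for f(x) = 0.5 * ||A x - b||^2.

    The Hessian A^T A is approximated by (S A)^T (S A) with a sparse
    random sketch S; the gradient is computed exactly.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # Sparse sign sketch: each entry is nonzero with probability `density`,
    # scaled so that E[S^T S] is approximately the identity.
    mask = rng.random((sketch_size, n)) < density
    signs = rng.choice([-1.0, 1.0], size=(sketch_size, n))
    S = mask * signs / np.sqrt(sketch_size * density)
    SA = S @ A                                  # sketched data matrix
    H_approx = SA.T @ SA                        # approximate Hessian (d x d)
    grad = A.T @ (A @ x - b)                    # exact gradient
    return x - np.linalg.solve(H_approx + 1e-8 * np.eye(d), grad)
```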
arXiv Detail & Related papers (2021-07-15T17:33:05Z) - The Fast Kernel Transform [21.001203328543006]
We propose the Fast Kernel Transform (FKT), a general algorithm to compute matrix-vector multiplications for datasets in moderate dimensions with quasilinear complexity.
The FKT is easily applied to a broad class of kernels, including Gaussian, Matérn, and Rational Quadratic covariance functions and physically motivated Green's functions.
We illustrate the efficacy and versatility of the FKT by providing timing and accuracy benchmarks and by applying it to scale t-distributed stochastic neighbor embedding (t-SNE) and Gaussian processes to large real-world data sets.
arXiv Detail & Related papers (2021-06-08T16:15:47Z)