Optimal Randomized Approximations for Matrix-based Renyi's Entropy
- URL: http://arxiv.org/abs/2205.07426v1
- Date: Mon, 16 May 2022 02:24:52 GMT
- Title: Optimal Randomized Approximations for Matrix-based Renyi's Entropy
- Authors: Yuxin Dong and Tieliang Gong and Shujian Yu and Chen Li
- Abstract summary: We develop random approximations for integer-order $\alpha$ cases and series approximations for non-integer $\alpha$ cases.
Large-scale simulations and real-world applications validate the effectiveness of the developed approximations.
- Score: 16.651155375441796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The matrix-based Renyi's entropy enables us to directly measure information
quantities from given data without the costly probability density estimation of
underlying distributions, thus has been widely adopted in numerous statistical
learning and inference tasks. However, exactly calculating this new information
quantity requires access to the eigenspectrum of a positive semi-definite (PSD)
matrix $A$ whose size grows linearly with the number of samples $n$, resulting in
an $O(n^3)$ time complexity that is prohibitive for large-scale applications. To
address this issue, this paper takes advantage of stochastic trace
approximations for matrix-based Renyi's entropy with arbitrary $\alpha \in \mathbb{R}^+$
orders, lowering the complexity by converting the entropy approximation to a
matrix-vector multiplication problem. Specifically, we develop random
approximations for integer order $\alpha$ cases and polynomial series
approximations (Taylor and Chebyshev) for non-integer $\alpha$ cases, leading
to an $O(n^2 s m)$ overall time complexity, where $s, m \ll n$ denote the number of
vector queries and the polynomial order respectively. We theoretically
establish statistical guarantees for all approximation algorithms and give
explicit orders of $s$ and $m$ with respect to the approximation error
$\varepsilon$, showing optimal convergence rate for both parameters up to a
logarithmic factor. Large-scale simulations and real-world applications
validate the effectiveness of the developed approximations, demonstrating
remarkable speedup with negligible loss in accuracy.
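To make the integer-order case concrete, the sketch below estimates $\mathrm{tr}(A^\alpha)$ with Hutchinson-style random vector queries (each query costs $\alpha$ matrix-vector products) and plugs the result into the matrix-based Renyi entropy $S_\alpha(A) = \frac{1}{1-\alpha}\log_2 \mathrm{tr}(A^\alpha)$. This is a minimal sketch of the idea described in the abstract, not the authors' exact algorithm; the Gaussian query vectors, the function name, and the sanity check are illustrative assumptions.

```python
import numpy as np

def renyi_entropy_hutchinson(A, alpha, s, seed=None):
    """Estimate S_alpha(A) = log2(tr(A^alpha)) / (1 - alpha) for integer alpha >= 2.

    A: n x n PSD matrix normalized so that tr(A) = 1 (a normalized Gram matrix).
    s: number of random vector queries; each costs alpha matvecs, so the total
    cost is O(n^2 * s * alpha) instead of the O(n^3) full eigendecomposition.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = 0.0
    for _ in range(s):
        g = rng.standard_normal(n)   # random query vector, E[g g^T] = I
        v = g
        for _ in range(alpha):       # v = A^alpha @ g via repeated matvecs
            v = A @ v
        total += g @ v               # g^T A^alpha g is unbiased for tr(A^alpha)
    return np.log2(total / s) / (1.0 - alpha)

# Quick sanity check against the exact eigenspectrum on a small problem.
X = np.random.default_rng(0).standard_normal((300, 10))
A = X @ X.T
A /= np.trace(A)                     # normalize so tr(A) = 1
exact = np.log2(np.sum(np.linalg.eigvalsh(A) ** 2)) / (1 - 2)
print(exact, renyi_entropy_hutchinson(A, alpha=2, s=200, seed=1))
```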
Related papers
- Polynomial-Time Solutions for ReLU Network Training: A Complexity Classification via Max-Cut and Zonotopes [70.52097560486683]
We prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it.
In particular, when $\epsilon \leq \sqrt{84/83} - 1 \approx 0.006$, we show that it is NP-hard to find an approximate global optimum of the ReLU network objective with relative error $\epsilon$ with respect to the objective value.
arXiv Detail & Related papers (2023-11-18T04:41:07Z)
- Fast Minimization of Expected Logarithmic Loss via Stochastic Dual Averaging [8.990961435218544]
We propose a first-order algorithm named $B$-sample dual averaging with the logarithmic barrier.
For the Poisson inverse problem, our algorithm attains an $\varepsilon$-optimal solution in $\tilde{O}(d^3/\varepsilon^2)$ time.
When computing the maximum-likelihood estimate for quantum state tomography, our algorithm yields an $\varepsilon$-optimal solution in $\tilde{O}(d^3/\varepsilon^2)$ time.
arXiv Detail & Related papers (2023-11-05T03:33:44Z)
- Stochastic Optimization for Non-convex Problem with Inexact Hessian Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) methods and adaptive regularization using cubics (ARC) have proven to have very appealing theoretical properties.
We show that TR and ARC methods can simultaneously allow for inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Low-complexity subspace-descent over symmetric positive definite manifold [9.346050098365648]
We develop low-complexity algorithms for the minimization of functions over the symmetric positive definite (SPD) manifold.
The proposed approach utilizes carefully chosen subspaces that allow the update to be written as a product of the Cholesky factor of the iterate and a sparse matrix.
arXiv Detail & Related papers (2023-05-03T11:11:46Z)
- The Hypervolume Indicator Hessian Matrix: Analytical Expression, Computational Time Complexity, and Sparsity [4.523133864190258]
This paper establishes the analytical expression of the Hessian matrix of the mapping from a (fixed size) collection of $n$ points in the $d$-dimensional decision space to the scalar hypervolume indicator value.
The Hessian matrix plays a crucial role in second-order methods, such as the Newton-Raphson optimization method, and it can be used for the verification of local optimal sets.
arXiv Detail & Related papers (2022-11-08T11:24:18Z)
- Faster Algorithms and Constant Lower Bounds for the Worst-Case Expected Error [0.3997680012976965]
The goal is to design estimators that minimize the worst-case expected error.
Chen, Valiant and Valiant show that, when data values are $\ell_\infty$-normalized, there is a polynomial-time algorithm to compute an estimator for the mean.
In this paper we design provably efficient algorithms for approximating the optimal semilinear estimator based on online convex optimization.
arXiv Detail & Related papers (2021-12-27T18:47:25Z)
- Computationally Efficient Approximations for Matrix-based Renyi's Entropy [33.72108955447222]
The recently developed matrix-based Renyi's entropy enables measurement of information in data.
Computation of this quantity involves the trace operator applied to a PSD matrix $G$ raised to the power $\alpha$ (i.e., $\mathrm{tr}(G^\alpha)$).
We present computationally efficient approximations to this new entropy functional that can reduce its complexity to even significantly less than $O(n^2)$.
arXiv Detail & Related papers (2021-12-27T14:59:52Z)
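For non-integer $\alpha$, $G^\alpha$ cannot be reached by a finite number of matrix-vector products, so the series approach replaces $x^\alpha$ with a degree-$m$ polynomial and estimates the trace of that polynomial instead. Below is a minimal Chebyshev-flavoured sketch of this idea; the use of NumPy's Chebyshev interpolation, the assumed eigenvalue range $[0, 1]$ (as for a trace-normalized Gram matrix), and all names are illustrative assumptions rather than the exact scheme of either paper.

```python
import numpy as np
from numpy.polynomial.chebyshev import Chebyshev

def trace_power_chebyshev(G, alpha, m, s, seed=None):
    """Hutchinson estimate of tr(G^alpha) for non-integer alpha > 0.

    x^alpha is replaced by its degree-m Chebyshev interpolant on [0, 1]
    (valid when the PSD matrix G has eigenvalues in [0, 1]), so each
    random query needs only m matrix-vector products.
    """
    assert m >= 1
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    # Coefficients c_k of x^alpha ~= sum_k c_k T_k(2x - 1) on [0, 1].
    c = Chebyshev.interpolate(lambda x: x ** alpha, deg=m, domain=[0, 1]).coef
    B = 2.0 * G - np.eye(n)          # map the spectrum from [0, 1] to [-1, 1]
    total = 0.0
    for _ in range(s):
        g = rng.standard_normal(n)
        t_prev, t_cur = g, B @ g     # T_0(B) g and T_1(B) g
        acc = c[0] * (g @ t_prev) + c[1] * (g @ t_cur)
        for k in range(2, m + 1):    # three-term Chebyshev recurrence
            t_prev, t_cur = t_cur, 2.0 * (B @ t_cur) - t_prev
            acc += c[k] * (g @ t_cur)
        total += acc                 # g^T p(G) g estimates tr(p(G))
    return total / s
```

The entropy then follows as $\log_2$ of this estimate divided by $1 - \alpha$. Plain interpolation of $x^\alpha$ converges slowly near $x = 0$, which is one reason the papers derive explicit orders of $m$ for a target error $\varepsilon$.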
- Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z)
- Hutch++: Optimal Stochastic Trace Estimation [75.45968495410048]
We introduce a new randomized algorithm, Hutch++, which computes a $(1 \pm \epsilon)$ approximation to $\mathrm{tr}(A)$ for any positive semidefinite (PSD) $A$.
We show that it significantly outperforms Hutchinson's method in experiments.
arXiv Detail & Related papers (2020-10-19T16:45:37Z)
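To make the Hutch++ idea summarized above concrete: it captures a low-rank sketch of $A$, takes that part of the trace exactly, and runs plain Hutchinson only on the residual. Below is a minimal sketch under the standard setup (Rademacher test vectors and an even three-way split of the matvec budget, as in the Hutch++ paper); the function name is illustrative.

```python
import numpy as np

def hutchpp(matvec, n, s, seed=None):
    """Hutch++ trace estimator for a PSD matrix A given only through matvec.

    matvec: function computing A @ X for an n x k matrix X.
    s: total matrix-vector product budget, split into three equal parts.
    """
    rng = np.random.default_rng(seed)
    k = max(s // 3, 1)
    S = rng.choice([-1.0, 1.0], size=(n, k))   # sketching test matrix
    G = rng.choice([-1.0, 1.0], size=(n, k))   # Hutchinson test matrix
    Q, _ = np.linalg.qr(matvec(S))             # orthonormal basis for range(A S)
    low_rank = np.trace(Q.T @ matvec(Q))       # exact trace of the captured part
    G_res = G - Q @ (Q.T @ G)                  # project G onto the residual space
    residual = np.trace(G_res.T @ matvec(G_res)) / k
    return low_rank + residual

# Example usage (A is any n x n PSD matrix):
# est = hutchpp(lambda V: A @ V, n=A.shape[0], s=60, seed=0)
```

For PSD $A$, this achieves a $(1 \pm \epsilon)$ relative-error estimate with $O(1/\epsilon)$ matvecs, versus the $O(1/\epsilon^2)$ needed by Hutchinson alone.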
- SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
SURF is an algorithm for approximating distributions by piecewise polynomials.
It outperforms state-of-the-art algorithms in experiments.
arXiv Detail & Related papers (2020-02-22T01:03:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.