High-Dimensional Smoothed Entropy Estimation via Dimensionality
Reduction
- URL: http://arxiv.org/abs/2305.04712v2
- Date: Thu, 11 May 2023 14:47:24 GMT
- Title: High-Dimensional Smoothed Entropy Estimation via Dimensionality
Reduction
- Authors: Kristjan Greenewald, Brian Kingsbury, Yuancheng Yu
- Abstract summary: We consider the estimation of the differential entropy $h(X+Z)$ via $n$ independently and identically distributed samples of $X$.
Under the absolute-error loss, the above problem has a parametric estimation rate of $fraccDsqrtn$.
We overcome this exponential sample complexity by projecting $X$ to a low-dimensional space via principal component analysis (PCA) before the entropy estimation.
- Score: 14.53979700025531
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of overcoming exponential sample complexity in
differential entropy estimation under Gaussian convolutions. Specifically, we
consider the estimation of the differential entropy $h(X+Z)$ via $n$
independently and identically distributed samples of $X$, where $X$ and $Z$ are
independent $D$-dimensional random variables with $X$ sub-Gaussian with bounded
second moment and $Z\sim\mathcal{N}(0,\sigma^2I_D)$. Under the absolute-error
loss, the above problem has a parametric estimation rate of
$\frac{c^D}{\sqrt{n}}$, which is exponential in data dimension $D$ and often
problematic for applications. We overcome this exponential sample complexity by
projecting $X$ to a low-dimensional space via principal component analysis
(PCA) before the entropy estimation, and show that the asymptotic error
overhead vanishes as the unexplained variance of the PCA vanishes. This implies
near-optimal performance for inherently low-dimensional structures embedded in
high-dimensional spaces, including hidden-layer outputs of deep neural networks
(DNN), which can be used to estimate mutual information (MI) in DNNs. We
provide numerical results verifying the performance of our PCA approach on
Gaussian and spiral data. We also apply our method to analysis of information
flow through neural network layers (c.f. information bottleneck), with results
measuring mutual information in a noisy fully connected network and a noisy
convolutional neural network (CNN) for MNIST classification.
Related papers
- Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods [43.32546195968771]
We study the data-dependent convergence and generalization behavior of gradient methods for neural networks with smooth activation.
Our results improve upon the shortcomings of the well-established Rademacher complexity-based bounds.
We show that a large step-size significantly improves upon the NTK regime's results in classifying the XOR distribution.
arXiv Detail & Related papers (2024-10-13T21:49:29Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over- parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent [83.85536329832722]
We show that gradient descent (SGD) can efficiently solve the $k$-parity problem on a $d$dimensional hypercube.
We then demonstrate how a trained neural network with SGD, solving the $k$-parity problem with small statistical errors.
arXiv Detail & Related papers (2024-04-18T17:57:53Z) - Sliding down the stairs: how correlated latent variables accelerate learning with neural networks [8.107431208836426]
We show that correlations between latent variables along directions encoded in different input cumulants speed up learning from higher-order correlations.
Our results are confirmed in simulations of two-layer neural networks.
arXiv Detail & Related papers (2024-04-12T17:01:25Z) - Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Sample Complexity of Neural Policy Mirror Descent for Policy
Optimization on Low-Dimensional Manifolds [75.51968172401394]
We study the sample complexity of the neural policy mirror descent (NPMD) algorithm with deep convolutional neural networks (CNN)
In each iteration of NPMD, both the value function and the policy can be well approximated by CNNs.
We show that NPMD can leverage the low-dimensional structure of state space to escape from the curse of dimensionality.
arXiv Detail & Related papers (2023-09-25T07:31:22Z) - Effective Minkowski Dimension of Deep Nonparametric Regression: Function
Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that input data are concentrated around a subset of $mathbbRd$ denoted by $mathcalS$, and the intrinsic dimension $mathcalS$ can be characterized by a new complexity notation -- effective Minkowski dimension.
arXiv Detail & Related papers (2023-06-26T17:13:31Z) - Higher Order Gauge Equivariant CNNs on Riemannian Manifolds and
Applications [7.322121417864824]
We introduce a higher order generalization of the gauge equivariant convolution, dubbed a gauge equivariant Volterra network (GEVNet)
This allows us to model spatially extended nonlinear interactions within a given field while still maintaining equivariance to global isometries.
In the neuroimaging data experiments, the resulting two-part architecture is used to automatically discriminate between patients with Lewy Body Disease (DLB), Alzheimer's Disease (AD) and Parkinson's Disease (PD) from diffusion magnetic resonance images (dMRI)
arXiv Detail & Related papers (2023-05-26T06:02:31Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Fundamental tradeoffs between memorization and robustness in random
features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training, then its Sobolev-seminorm is lower-bounded.
Experiments reveal for the first time, (iv) a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.