Efficient estimation of the ANOVA mean dimension, with an application to neural net classification
- URL: http://arxiv.org/abs/2007.01281v4
- Date: Wed, 30 Dec 2020 20:54:40 GMT
- Title: Efficient estimation of the ANOVA mean dimension, with an application to neural net classification
- Authors: Christopher Hoyt and Art B. Owen
- Abstract summary: The mean dimension of a black box function of $d$ variables can be written as the sum of $d$ Sobol' indices that can be estimated by leave-one-out methods.
We compare the variance of these leave-one-out methods: a Gibbs sampler called winding stairs, a radial sampler that changes each variable one at a time from a baseline, and a naive sampler that never reuses function evaluations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The mean dimension of a black box function of $d$ variables is a convenient way to summarize the extent to which it is dominated by high or low order interactions. It is expressed in terms of $2^d-1$ variance components, but it can be written as the sum of $d$ Sobol' indices that can be estimated by leave-one-out methods. We compare the variance of these leave-one-out methods: a Gibbs sampler called winding stairs, a radial sampler that changes each variable one at a time from a baseline, and a naive sampler that never reuses function evaluations and so costs about double the other methods. For an additive function, the radial and winding stairs samplers are the most efficient. For a multiplicative function, the naive method can easily be the most efficient if the factors have high kurtosis. As an illustration we consider the mean dimension of a neural network classifier of digits from the MNIST data set. The classifier is a function of $784$ pixels. For that problem, winding stairs is the best algorithm. We find that inputs to the final softmax layer have mean dimensions ranging from $1.35$ to $2.0$.
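To make the estimator concrete, here is a minimal NumPy sketch of the radial leave-one-out estimator, assuming independent uniform inputs on $[0,1]^d$ and a plug-in estimate of $\mathrm{Var}(f)$; the function name and sampling details are illustrative, not the authors' reference implementation.

```python
import numpy as np

def mean_dimension_radial(f, d, n, seed=0):
    """Sketch of the radial leave-one-out estimator of the ANOVA mean dimension.

    Uses the identity: mean dimension = sum_i taubar_i^2 / sigma^2, where
    taubar_i^2 = (1/2) * E[(f(x) - f(x with x_i resampled))^2] is the
    unnormalized total Sobol' index of variable i.
    """
    rng = np.random.default_rng(seed)
    x = rng.random((n, d))             # baseline points
    z = rng.random((n, d))             # alternates supplying the changed coordinate
    fx = f(x)
    taubar2 = np.empty(d)
    for i in range(d):
        xi = x.copy()
        xi[:, i] = z[:, i]             # change variable i only (radial pattern)
        taubar2[i] = 0.5 * np.mean((fx - f(xi)) ** 2)
    return taubar2.sum() / np.var(fx)  # plug-in estimate of Var(f)

# An additive function has mean dimension exactly 1:
print(mean_dimension_radial(lambda x: x.sum(axis=1), d=5, n=100_000))
```

Winding stairs differs in that it reuses function evaluations along a Gibbs-like chain instead of returning to a fixed baseline after each coordinate change.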
Related papers
- Scalable 3D Registration via Truncated Entry-wise Absolute Residuals [65.04922801371363]
A $3$D registration approach can process more than ten million ($10^7$) point pairs with over $99\%$ random outliers.
We call our method TEAR, as it involves minimizing an outlier-robust loss that computes Truncated Entry-wise Absolute Residuals.
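A hedged sketch of the loss named in the summary (the paper's minimization strategy is far more involved); the truncation level `c` and the function name are illustrative assumptions:

```python
import numpy as np

def tear_loss(R, t, src, dst, c=0.1):
    """Truncated entry-wise absolute residual loss for 3D registration (sketch).

    src, dst: (n, 3) point clouds; R: 3x3 rotation; t: (3,) translation.
    Each residual entry is capped at c, so gross outliers contribute at
    most a constant and cannot dominate the objective.
    """
    residuals = dst - (src @ R.T + t)             # (n, 3) entry-wise residuals
    return np.minimum(np.abs(residuals), c).sum()
```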
arXiv Detail & Related papers (2024-04-01T04:43:39Z) - Near-Optimal Mean Estimation with Unknown, Heteroskedastic Variances [15.990720051907864]
The Subset-of-Signals model serves as a benchmark for heteroskedastic mean estimation.
Our algorithm resolves this open question up to logarithmic factors.
Even for $d=2$, our techniques enable rates comparable to knowing the variance of each sample.
arXiv Detail & Related papers (2023-12-05T01:13:10Z) - Solving multiscale elliptic problems by sparse radial basis function neural networks [3.5297361401370044]
We propose a sparse radial basis function neural network method to solve elliptic partial differential equations (PDEs) with multiscale coefficients.
Inspired by the deep mixed residual method, we rewrite the second-order problem into a first-order system and employ multiple radial basis function neural networks (RBFNNs) to approximate unknown functions in the system.
The accuracy and effectiveness of the proposed method are demonstrated through a collection of multiscale problems with scale separation, discontinuity and multiple scales from one to three dimensions.
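As a rough illustration of the building block (a sketch, not the paper's method), one Gaussian RBF network approximating one unknown function of the first-order system might look like:

```python
import numpy as np

def rbfnn(x, centers, widths, weights):
    """Gaussian radial basis function network: a weighted sum of bumps.

    x: (n, dim) evaluation points; centers: (m, dim); widths, weights: (m,).
    In the first-order reformulation, separate networks like this one can
    represent the solution u and the auxiliary flux variables; sparsity
    comes from penalizing the weights so that few basis functions survive.
    """
    r2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (n, m)
    return np.exp(-r2 / widths**2) @ weights
```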
arXiv Detail & Related papers (2023-09-01T15:11:34Z) - On Orderings of Probability Vectors and Unsupervised Performance Estimation [6.2163687973613495]
We show that the $L^\infty$ norm is the most appropriate score function for classification problems.
We conduct experiments on well-known NLP data sets and rigorously explore the performance of different score functions.
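A minimal sketch of the score in question: for a probability vector, the $L^\infty$ norm is simply its largest entry, i.e. the classifier's top confidence.

```python
import numpy as np

def linf_score(probs):
    """L-infinity norm of each probability vector: its largest entry.

    Sorting unlabeled examples by this score ranks them from most to
    least confident, which underpins unsupervised performance estimation.
    """
    return np.max(probs, axis=-1)

p = np.array([[0.7, 0.2, 0.1], [0.4, 0.35, 0.25]])
print(linf_score(p))  # [0.7  0.4]: the first prediction is more confident
```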
arXiv Detail & Related papers (2023-06-16T20:03:16Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
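The following is a crude two-phase caricature of the adaptive idea, not the paper's algorithm: a uniform pilot phase, then allocation proportional to coordinate importance. The names `oracle` and `grad_f` and the 50/50 budget split are assumptions.

```python
import numpy as np

def estimate_f(f, grad_f, oracle, n_coords, budget):
    """Two-phase sketch: estimate f(mu) from noisy coordinate samples.

    oracle(i) returns one noisy draw of mu_i. A uniform pilot phase
    estimates mu, then the remaining budget goes to coordinates in
    proportion to |df/dmu_i|, which drive the error of f(mu_hat) most.
    """
    pilot = max(1, budget // (2 * n_coords))
    mu_hat = np.array([np.mean([oracle(i) for _ in range(pilot)])
                       for i in range(n_coords)])
    w = np.abs(grad_f(mu_hat)) + 1e-12
    alloc = np.maximum(1, (budget // 2) * w / w.sum()).astype(int)
    for i in range(n_coords):
        extra = [oracle(i) for _ in range(alloc[i])]
        mu_hat[i] = (pilot * mu_hat[i] + sum(extra)) / (pilot + alloc[i])
    return f(mu_hat)

# Example: f(mu) = mu[0]**2 + mu[1] with mu = (2, 5) and unit oracle noise.
rng = np.random.default_rng(0)
mu = np.array([2.0, 5.0])
est = estimate_f(lambda m: m[0]**2 + m[1],
                 lambda m: np.array([2 * m[0], 1.0]),
                 lambda i: mu[i] + rng.normal(),
                 n_coords=2, budget=2000)
print(est)  # close to 9.0
```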
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev seminorm is bounded below.
Experiments reveal for the first time, (iv) a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and Horizon-Free Bound for Linear Mixture MDP [76.94328400919836]
We show how to construct variance-aware confidence sets for linear bandits and linear mixture Markov decision processes (MDPs).
For linear bandits, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{1+\sum_{i=1}^{K}\sigma_i^2})$ regret bound, where $d$ is the feature dimension.
For linear mixture MDPs, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{K})$ regret bound.
arXiv Detail & Related papers (2021-01-29T18:57:52Z) - Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
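For intuition, evaluating a truncated continued-fraction model looks like the following; the depth, coefficient functions, and example values are invented for illustration.

```python
def continued_fraction(x, g, h):
    """Evaluate g[0](x) + h[0](x)/(g[1](x) + h[1](x)/(g[2](x) + ...)).

    g and h are lists of simple functions of the features x (e.g. linear
    maps); the rational structure lets the model extrapolate beyond the
    training range better than a polynomial of comparable size.
    """
    acc = g[-1](x)                    # innermost term of the fraction
    for gk, hk in zip(reversed(g[:-1]), reversed(h)):
        acc = gk(x) + hk(x) / acc     # fold outward, one level at a time
    return acc

# Depth-2 example: g0 + h0/(g1 + h1/g2)
g = [lambda x: 1.0, lambda x: 2.0 + x, lambda x: 1.0 + x**2]
h = [lambda x: x, lambda x: 0.5]
print(continued_fraction(1.0, g, h))  # 1 + 1/(3 + 0.5/2) = 1.3077...
```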
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
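A sketch of the model class: a difference of two max-affine (hence convex) functions. Every continuous piecewise linear function can be written this way; the shapes and data below are illustrative only.

```python
import numpy as np

def dc_predict(X, A, b, C, d):
    """Difference-of-convex piecewise linear model (sketch).

    f(x) = max_i (A[i] @ x + b[i]) - max_j (C[j] @ x + d[j]).
    Each max of affine functions is convex, so f is a DC function.
    """
    return (X @ A.T + b).max(axis=1) - (X @ C.T + d).max(axis=1)

X = np.random.rand(5, 2)
A, b = np.array([[1., 0.], [0., 1.]]), np.zeros(2)
C, d = np.array([[0.5, 0.5]]), np.zeros(1)
print(dc_predict(X, A, b, C, d))
```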
arXiv Detail & Related papers (2020-07-05T18:58:47Z) - FANOK: Knockoffs in Linear Time [73.5154025911318]
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems.
We test our methods on problems with $p$ as large as $500,000$.
arXiv Detail & Related papers (2020-06-15T21:55:34Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
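To make the parameter-counting claim concrete, here is a toy hypernetwork (all sizes invented for illustration) that emits the weights of a small target network from a task embedding, so trainable parameters live in the hypernetwork rather than in one full network per task.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, hidden, in_dim, out_dim = 4, 8, 3, 2

# Trainable parameters: the hypernetwork only.
W1 = rng.normal(size=(hidden, emb_dim))
W2 = rng.normal(size=(in_dim * out_dim, hidden))  # emits target-net weights

def target_forward(x, e):
    w = W2 @ np.tanh(W1 @ e)               # weights generated from embedding e
    W_target = w.reshape(out_dim, in_dim)  # weight matrix of the target net
    return W_target @ x

print(target_forward(np.ones(in_dim), rng.normal(size=emb_dim)))
```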
arXiv Detail & Related papers (2020-02-23T22:51:52Z)