Efficient estimation of the ANOVA mean dimension, with an application to neural net classification
- URL: http://arxiv.org/abs/2007.01281v4
- Date: Wed, 30 Dec 2020 20:54:40 GMT
- Title: Efficient estimation of the ANOVA mean dimension, with an application to neural net classification
- Authors: Christopher Hoyt and Art B. Owen
- Abstract summary: The mean dimension of a black box function of $d$ variables can be written as the sum of $d$ Sobol' indices that can be estimated by leave-one-out methods.
We compare the variance of these leave-one-out methods: a Gibbs sampler called winding stairs, a radial sampler that changes each variable one at a time from a baseline, and a naive sampler that never reuses function evaluations.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The mean dimension of a black box function of $d$ variables is a convenient way to summarize the extent to which it is dominated by high or low order interactions. It is expressed in terms of $2^d-1$ variance components, but it can be written as the sum of $d$ Sobol' indices that can be estimated by leave-one-out methods. We compare the variance of these leave-one-out methods: a Gibbs sampler called winding stairs, a radial sampler that changes each variable one at a time from a baseline, and a naive sampler that never reuses function evaluations and so costs about double the other methods. For an additive function, the radial and winding stairs samplers are the most efficient. For a multiplicative function, the naive method can easily be the most efficient if the factors have high kurtosis. As an illustration we consider the mean dimension of a neural network classifier of digits from the MNIST data set. The classifier is a function of $784$ pixels. For that problem, winding stairs is the best algorithm. We find that inputs to the final softmax layer have mean dimensions ranging from $1.35$ to $2.0$.
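To make the estimator concrete, here is a minimal NumPy sketch of the radial leave-one-out estimator, assuming independent uniform inputs on $[0,1]^d$ and a plug-in estimate of $\mathrm{Var}(f)$; the function name and sampling details are illustrative, not the authors' reference implementation.

```python
import numpy as np

def mean_dimension_radial(f, d, n, seed=0):
    """Sketch of the radial leave-one-out estimator of the ANOVA mean dimension.

    Uses the identity: mean dimension = sum_i taubar_i^2 / sigma^2, where
    taubar_i^2 = (1/2) * E[(f(x) - f(x with x_i resampled))^2] is the
    unnormalized total Sobol' index of variable i.
    """
    rng = np.random.default_rng(seed)
    x = rng.random((n, d))             # baseline points
    z = rng.random((n, d))             # alternates supplying the changed coordinate
    fx = f(x)
    taubar2 = np.empty(d)
    for i in range(d):
        xi = x.copy()
        xi[:, i] = z[:, i]             # change variable i only (radial pattern)
        taubar2[i] = 0.5 * np.mean((fx - f(xi)) ** 2)
    return taubar2.sum() / np.var(fx)  # plug-in estimate of Var(f)

# An additive function has mean dimension exactly 1:
print(mean_dimension_radial(lambda x: x.sum(axis=1), d=5, n=100_000))
```

Winding stairs differs in that it reuses function evaluations along a Gibbs-like chain instead of returning to a fixed baseline after each coordinate change.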
Related papers
- Scalable 3D Registration via Truncated Entry-wise Absolute Residuals [65.04922801371363]
A $3$D registration approach can process more than ten million ($10^7$) point pairs with over $99\%$ random outliers.
We call our method TEAR, as it involves minimizing an outlier-robust loss that computes Truncated Entry-wise Absolute Residuals.
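A hedged sketch of the loss named in the summary (the paper's minimization strategy is far more involved); the truncation level `c` and the function name are illustrative assumptions:

```python
import numpy as np

def tear_loss(R, t, src, dst, c=0.1):
    """Truncated entry-wise absolute residual loss for 3D registration (sketch).

    src, dst: (n, 3) point clouds; R: 3x3 rotation; t: (3,) translation.
    Each residual entry is capped at c, so gross outliers contribute at
    most a constant and cannot dominate the objective.
    """
    residuals = dst - (src @ R.T + t)             # (n, 3) entry-wise residuals
    return np.minimum(np.abs(residuals), c).sum()
```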
arXiv Detail & Related papers (2024-04-01T04:43:39Z) - Near-Optimal Mean Estimation with Unknown, Heteroskedastic Variances [15.990720051907864]
The Subset-of-Signals model serves as a benchmark for heteroskedastic mean estimation.
Our algorithm resolves this open question up to logarithmic factors.
Even for $d=2$, our techniques enable rates comparable to knowing the variance of each sample.
arXiv Detail & Related papers (2023-12-05T01:13:10Z) - Solving multiscale elliptic problems by sparse radial basis function neural networks [3.5297361401370044]
We propose a sparse radial basis function neural network method to solve elliptic partial differential equations (PDEs) with multiscale coefficients.
Inspired by the deep mixed residual method, we rewrite the second-order problem into a first-order system and employ multiple radial basis function neural networks (RBFNNs) to approximate unknown functions in the system.
The accuracy and effectiveness of the proposed method are demonstrated through a collection of multiscale problems with scale separation, discontinuity and multiple scales from one to three dimensions.
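As a rough illustration of the building block (a sketch, not the paper's method), one Gaussian RBF network approximating one unknown function of the first-order system might look like:

```python
import numpy as np

def rbfnn(x, centers, widths, weights):
    """Gaussian radial basis function network: a weighted sum of bumps.

    x: (n, dim) evaluation points; centers: (m, dim); widths, weights: (m,).
    In the first-order reformulation, separate networks like this one can
    represent the solution u and the auxiliary flux variables; sparsity
    comes from penalizing the weights so that few basis functions survive.
    """
    r2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (n, m)
    return np.exp(-r2 / widths**2) @ weights
```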
arXiv Detail & Related papers (2023-09-01T15:11:34Z) - On Orderings of Probability Vectors and Unsupervised Performance Estimation [6.2163687973613495]
We show that the $L^\infty$ norm is the most appropriate score function for classification problems.
We conduct experiments on well-known NLP data sets and rigorously explore the performance of different score functions.
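A minimal sketch of the score in question: for a probability vector, the $L^\infty$ norm is simply its largest entry, i.e. the classifier's top confidence.

```python
import numpy as np

def linf_score(probs):
    """L-infinity norm of each probability vector: its largest entry.

    Sorting unlabeled examples by this score ranks them from most to
    least confident, which underpins unsupervised performance estimation.
    """
    return np.max(probs, axis=-1)

p = np.array([[0.7, 0.2, 0.1], [0.4, 0.35, 0.25]])
print(linf_score(p))  # [0.7  0.4]: the first prediction is more confident
```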
arXiv Detail & Related papers (2023-06-16T20:03:16Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
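The following is a crude two-phase caricature of the adaptive idea, not the paper's algorithm: a uniform pilot phase, then allocation proportional to coordinate importance. The names `oracle` and `grad_f` and the 50/50 budget split are assumptions.

```python
import numpy as np

def estimate_f(f, grad_f, oracle, n_coords, budget):
    """Two-phase sketch: estimate f(mu) from noisy coordinate samples.

    oracle(i) returns one noisy draw of mu_i. A uniform pilot phase
    estimates mu, then the remaining budget goes to coordinates in
    proportion to |df/dmu_i|, which drive the error of f(mu_hat) most.
    """
    pilot = max(1, budget // (2 * n_coords))
    mu_hat = np.array([np.mean([oracle(i) for _ in range(pilot)])
                       for i in range(n_coords)])
    w = np.abs(grad_f(mu_hat)) + 1e-12
    alloc = np.maximum(1, (budget // 2) * w / w.sum()).astype(int)
    for i in range(n_coords):
        extra = [oracle(i) for _ in range(alloc[i])]
        mu_hat[i] = (pilot * mu_hat[i] + sum(extra)) / (pilot + alloc[i])
    return f(mu_hat)

# Example: f(mu) = mu[0]**2 + mu[1] with mu = (2, 5) and unit oracle noise.
rng = np.random.default_rng(0)
mu = np.array([2.0, 5.0])
est = estimate_f(lambda m: m[0]**2 + m[1],
                 lambda m: np.array([2 * m[0], 1.0]),
                 lambda i: mu[i] + rng.normal(),
                 n_coords=2, budget=2000)
print(est)  # close to 9.0
```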
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes [15.76663241036412]
We prove for a large class of activation functions that, if the model memorizes even a fraction of the training data, then its Sobolev seminorm is bounded below.
Experiments reveal for the first time, (iv) a multiple-descent phenomenon in the robustness of the min-norm interpolator.
arXiv Detail & Related papers (2021-06-04T17:52:50Z) - Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and Horizon-Free Bound for Linear Mixture MDP [76.94328400919836]
We show how to construct variance-aware confidence sets for linear bandits and linear mixture Markov decision processes (MDPs).
For linear bandits, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{1+\sum_{i=1}^{K}\sigma_i^2})$ regret bound, where $d$ is the feature dimension.
For linear mixture MDPs, we obtain an $\widetilde{O}(\mathrm{poly}(d)\sqrt{K})$ regret bound.
arXiv Detail & Related papers (2021-01-29T18:57:52Z) - Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
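For intuition, evaluating a truncated continued-fraction model looks like the following; the depth, coefficient functions, and example values are invented for illustration.

```python
def continued_fraction(x, g, h):
    """Evaluate g[0](x) + h[0](x)/(g[1](x) + h[1](x)/(g[2](x) + ...)).

    g and h are lists of simple functions of the features x (e.g. linear
    maps); the rational structure lets the model extrapolate beyond the
    training range better than a polynomial of comparable size.
    """
    acc = g[-1](x)                    # innermost term of the fraction
    for gk, hk in zip(reversed(g[:-1]), reversed(h)):
        acc = gk(x) + hk(x) / acc     # fold outward, one level at a time
    return acc

# Depth-2 example: g0 + h0/(g1 + h1/g2)
g = [lambda x: 1.0, lambda x: 2.0 + x, lambda x: 1.0 + x**2]
h = [lambda x: x, lambda x: 0.5]
print(continued_fraction(1.0, g, h))  # 1 + 1/(3 + 0.5/2) = 1.3077...
```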
arXiv Detail & Related papers (2020-11-27T04:57:40Z) - Piecewise Linear Regression via a Difference of Convex Functions [50.89452535187813]
We present a new piecewise linear regression methodology that utilizes fitting a difference of convex functions (DC functions) to the data.
We empirically validate the method, showing it to be practically implementable, and to have comparable performance to existing regression/classification methods on real-world datasets.
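A sketch of the model class: a difference of two max-affine (hence convex) functions. Every continuous piecewise linear function can be written this way; the shapes and data below are illustrative only.

```python
import numpy as np

def dc_predict(X, A, b, C, d):
    """Difference-of-convex piecewise linear model (sketch).

    f(x) = max_i (A[i] @ x + b[i]) - max_j (C[j] @ x + d[j]).
    Each max of affine functions is convex, so f is a DC function.
    """
    return (X @ A.T + b).max(axis=1) - (X @ C.T + d).max(axis=1)

X = np.random.rand(5, 2)
A, b = np.array([[1., 0.], [0., 1.]]), np.zeros(2)
C, d = np.array([[0.5, 0.5]]), np.zeros(1)
print(dc_predict(X, A, b, C, d))
```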
arXiv Detail & Related papers (2020-07-05T18:58:47Z) - FANOK: Knockoffs in Linear Time [73.5154025911318]
We describe a series of algorithms that efficiently implement Gaussian model-X knockoffs to control the false discovery rate on large scale feature selection problems.
We test our methods on problems with $p$ as large as $500,000$.
arXiv Detail & Related papers (2020-06-15T21:55:34Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
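To make the parameter-counting claim concrete, here is a toy hypernetwork (all sizes invented for illustration) that emits the weights of a small target network from a task embedding, so trainable parameters live in the hypernetwork rather than in one full network per task.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, hidden, in_dim, out_dim = 4, 8, 3, 2

# Trainable parameters: the hypernetwork only.
W1 = rng.normal(size=(hidden, emb_dim))
W2 = rng.normal(size=(in_dim * out_dim, hidden))  # emits target-net weights

def target_forward(x, e):
    w = W2 @ np.tanh(W1 @ e)               # weights generated from embedding e
    W_target = w.reshape(out_dim, in_dim)  # weight matrix of the target net
    return W_target @ x

print(target_forward(np.ones(in_dim), rng.normal(size=emb_dim)))
```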
arXiv Detail & Related papers (2020-02-23T22:51:52Z)