Deep Neural Networks for Nonparametric Interaction Models with Diverging
Dimension
- URL: http://arxiv.org/abs/2302.05851v1
- Date: Sun, 12 Feb 2023 04:19:39 GMT
- Title: Deep Neural Networks for Nonparametric Interaction Models with Diverging
Dimension
- Authors: Sohom Bhattacharya, Jianqing Fan and Debarghya Mukherjee
- Abstract summary: We analyze a $k^{th}$ order nonparametric interaction model in both the growing dimension scenario ($d$ grows with $n$ but at a slower rate) and the high-dimensional regime ($d \gtrsim n$).
We show that under certain standard assumptions, debiased deep neural networks achieve a minimax optimal rate both in terms of $(n, d)$.
- Score: 6.939768185086753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have achieved tremendous success due to their
representation power and adaptation to low-dimensional structures. Their
potential for estimating structured regression functions has been recently
established in the literature. However, most of the studies require the input
dimension to be fixed and consequently ignore the effect of dimension on the
rate of convergence and hamper their applications to modern big data with high
dimensionality. In this paper, we bridge this gap by analyzing a $k^{th}$ order
nonparametric interaction model in both growing dimension scenarios ($d$ grows
with $n$ but at a slower rate) and in high dimension ($d \gtrsim n$). In the
latter case, sparsity assumptions and associated regularization are required in
order to obtain optimal rates of convergence. A new challenge in the
diverging-dimension setting arises in the calculation of the mean-square
error: the covariance terms among the estimated additive components are an
order of magnitude larger than the variance terms, and they can deteriorate
the statistical properties without proper care. We introduce a critical
debiasing technique to remedy this problem. We show
that under certain standard assumptions, debiased deep neural networks achieve
a minimax optimal rate both in terms of $(n, d)$. Our proof techniques rely
crucially on a novel debiasing technique that makes the covariances of additive
components negligible in the mean-square error calculation. In addition, we
establish the matching lower bounds.
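To make the model class concrete, the $k^{th}$ order nonparametric interaction model described above writes the regression function as a sum of component functions over all coordinate subsets of size at most $k$. The following is a minimal illustrative sketch of that additive structure; the component functions and the `interaction_model` helper are hypothetical choices for demonstration, not the paper's debiased deep-neural-network estimator.

```python
# Sketch of a k-th order interaction model:
#   f(x) = sum over subsets S of {1,...,d} with |S| <= k of f_S(x_S).
# The components below are hand-picked for illustration only.
import itertools

import numpy as np


def interaction_model(x, k, components):
    """Evaluate f(x) = sum_{|S| <= k} f_S(x_S) for one input vector x."""
    d = len(x)
    total = 0.0
    for order in range(1, k + 1):
        for S in itertools.combinations(range(d), order):
            f_S = components.get(S)
            if f_S is not None:  # inactive subsets contribute nothing
                total += f_S(x[list(S)])
    return total


# Example: d = 3, second-order (k = 2) model with a few active components.
components = {
    (0,): lambda xs: np.sin(xs[0]),    # main effect of x_1
    (1,): lambda xs: xs[0] ** 2,       # main effect of x_2
    (0, 2): lambda xs: xs[0] * xs[1],  # pairwise interaction x_1 * x_3
}
x = np.array([0.0, 1.0, 2.0])
y = interaction_model(x, k=2, components=components)  # sin(0) + 1^2 + 0*2 = 1.0
```

In the high-dimensional regime ($d \gtrsim n$), most subsets $S$ would be inactive, which is the sparsity assumption the paper exploits through regularization.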
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent
Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Sample Complexity of Nonparametric Off-Policy Evaluation on
Low-Dimensional Manifolds using Deep Networks [71.95722100511627]
We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.
We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2022-06-06T20:25:20Z) - Structure and Distribution Metric for Quantifying the Quality of
Uncertainty: Assessing Gaussian Processes, Deep Neural Nets, and Deep Neural
Operators for Regression [0.0]
We propose two comparison metrics that may be implemented to arbitrary dimensions in regression tasks.
The structure metric assesses the similarity in shape and location of uncertainty with the true error, while the distribution metric quantifies the supported magnitudes between the two.
We apply these metrics to Gaussian Processes (GPs), Ensemble Deep Neural Nets (DNNs), and Ensemble Deep Neural Operators (DNOs) on high-dimensional and nonlinear test cases.
arXiv Detail & Related papers (2022-03-09T04:16:31Z) - Neural Estimation of Statistical Divergences [24.78742908726579]
A modern method for estimating statistical divergences relies on parametrizing an empirical variational form by a neural network (NN).
In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation.
We show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
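The variational parametrization this entry refers to can be illustrated with the Donsker-Varadhan lower bound for the KL divergence: for any critic $T$, $\mathrm{KL}(P \| Q) \ge \mathbb{E}_P[T(X)] - \log \mathbb{E}_Q[e^{T(X)}]$. A neural estimator would train a network to maximize this bound; the sketch below instead evaluates it with a fixed hand-picked critic (the `dv_lower_bound` helper and the Gaussian example are illustrative assumptions, not the cited paper's construction).

```python
# Empirical Donsker-Varadhan lower bound on KL(P || Q) from samples.
# A neural estimator parametrizes the critic T by a network and maximizes
# this objective; here T is a fixed hand-picked function for illustration.
import numpy as np


def dv_lower_bound(T, p_samples, q_samples):
    """E_P[T(X)] - log E_Q[exp(T(X))], estimated from samples."""
    return T(p_samples).mean() - np.log(np.exp(T(q_samples)).mean())


rng = np.random.default_rng(1)
p = rng.normal(loc=1.0, size=10_000)  # P = N(1, 1)
q = rng.normal(loc=0.0, size=10_000)  # Q = N(0, 1)

# For these Gaussians the optimal critic is T*(x) = log dP/dQ(x) = x - 1/2,
# at which the bound is tight: KL(N(1,1) || N(0,1)) = 1/2.
estimate = dv_lower_bound(lambda x: x - 0.5, p, q)
```

The approximation/estimation tradeoff mentioned above shows up directly here: a richer critic class tightens the variational gap, while a larger sample size shrinks the error of the two empirical averages.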
arXiv Detail & Related papers (2021-10-07T17:42:44Z) - The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
We show that a class of variation-constrained neural networks can achieve the near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
arXiv Detail & Related papers (2021-06-22T21:28:00Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the
complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: "scale" metrics perform well overall but poorly on sub-partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in
High Dimensions [41.7567932118769]
Empirical Risk Minimization algorithms are widely used in a variety of estimation and prediction tasks.
In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference.
arXiv Detail & Related papers (2020-06-16T04:27:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.