Deep Neural Networks for Nonparametric Interaction Models with Diverging
  Dimension
- URL: http://arxiv.org/abs/2302.05851v1
- Date: Sun, 12 Feb 2023 04:19:39 GMT
- Title: Deep Neural Networks for Nonparametric Interaction Models with Diverging
  Dimension
- Authors: Sohom Bhattacharya, Jianqing Fan and Debarghya Mukherjee
- Abstract summary: We analyze a $k^{th}$ order nonparametric interaction model in both growing dimension scenarios ($d$ grows with $n$ but at a slower rate) and in high dimension ($d \gtrsim n$).
We show that under certain standard assumptions, debiased deep neural networks achieve a minimax optimal rate in terms of both $n$ and $d$.
- Score: 6.939768185086753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract:   Deep neural networks have achieved tremendous success due to their
representation power and adaptation to low-dimensional structures. Their
potential for estimating structured regression functions has been recently
established in the literature. However, most studies require the input
dimension to be fixed; they consequently ignore the effect of dimension on the
rate of convergence, which hampers their application to modern big data with high
dimensionality. In this paper, we bridge this gap by analyzing a $k^{th}$ order
nonparametric interaction model in both growing dimension scenarios ($d$ grows
with $n$ but at a slower rate) and in high dimension ($d \gtrsim n$). In the
latter case, sparsity assumptions and associated regularization are required in
order to obtain optimal rates of convergence. A new challenge in the diverging
dimension setting arises in calculating the mean-square error: the covariance terms
among estimated additive components are an order of magnitude larger than
the variances, and they can deteriorate statistical properties without proper
care. We introduce a critical debiasing technique to amend the problem. We show
that under certain standard assumptions, debiased deep neural networks achieve
a minimax optimal rate in terms of both $n$ and $d$. Our proof techniques rely
crucially on a novel debiasing technique that makes the covariances of additive
components negligible in the mean-square error calculation. In addition, we
establish the matching lower bounds.
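
For orientation, a $k^{th}$ order nonparametric interaction model is commonly written as a sum of component functions, each depending on at most $k$ of the $d$ coordinates. The display below is only a sketch under that common convention; the symbols $f_J$, $x_J$, and $\varepsilon$ are assumptions introduced for illustration and are not taken from the abstract.

```latex
y = f(x) + \varepsilon, \qquad
f(x) = \sum_{\substack{J \subseteq \{1,\dots,d\} \\ 1 \le |J| \le k}} f_J(x_J),
\qquad x \in \mathbb{R}^d .
```

Under this convention, $k = 1$ reduces to an additive model, and $k = 2$ adds pairwise interaction terms.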
 
      
Related papers
- Adversarial learning for nonparametric regression: Minimax rate and adaptive estimation [3.244945627960733]
 We establish the minimax rate of convergence under adversarial $L_q$-risks with $1 \leq q \leq \infty$ and propose a piecewise local estimator that achieves the minimax optimality.
We construct a data-driven adaptive estimator that is shown to achieve, within a logarithmic factor, the optimal rate across a broad scale of nonparametric and adversarial classes.
 arXiv  Detail & Related papers  (2025-06-02T02:38:47Z)
- Fréchet Cumulative Covariance Net for Deep Nonlinear Sufficient Dimension Reduction with Random Objects [22.156257535146004]
 We introduce a new statistical dependence measure termed Fréchet Cumulative Covariance (FCCov) and develop a novel nonlinear SDR framework based on FCCov.
Our approach is not only applicable to complex non-Euclidean data, but also exhibits robustness against outliers.
We prove that our method with squared Frobenius norm regularization achieves unbiasedness at the $\sigma$-field level.
 arXiv  Detail & Related papers  (2025-02-21T10:55:50Z)
- Implicit Bias in Matrix Factorization and its Explicit Realization in a New Architecture [36.53793044674861]
 Gradient descent for matrix factorization is known to exhibit an implicit bias toward approximately low-rank solutions.
We introduce a new factorization model: $X \approx UDV^\top$, where $U$ and $V$ are constrained within norm balls, while $D$ is a diagonal factor allowing the model to span the entire search space.
 arXiv  Detail & Related papers  (2025-01-27T18:56:22Z)
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
 This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
 arXiv  Detail & Related papers  (2024-04-18T16:46:08Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent
  Neural Nets [57.06026574261203]
 We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
 Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
 arXiv  Detail & Related papers  (2022-10-25T14:45:15Z)
- Sample Complexity of Nonparametric Off-Policy Evaluation on
  Low-Dimensional Manifolds using Deep Networks [71.95722100511627]
 We consider the off-policy evaluation problem of reinforcement learning using deep neural networks.
We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process.
 arXiv  Detail & Related papers  (2022-06-06T20:25:20Z)
- Structure and Distribution Metric for Quantifying the Quality of
  Uncertainty: Assessing Gaussian Processes, Deep Neural Nets, and Deep Neural
  Operators for Regression [0.0]
 We propose two comparison metrics that may be implemented to arbitrary dimensions in regression tasks.
The structure metric assesses the similarity in shape and location of uncertainty with the true error, while the distribution metric quantifies the supported magnitudes between the two.
We apply these metrics to Gaussian Processes (GPs), Ensemble Deep Neural Nets (DNNs), and Ensemble Deep Neural Operators (DNOs) on high-dimensional and nonlinear test cases.
 arXiv  Detail & Related papers  (2022-03-09T04:16:31Z)
- Neural Estimation of Statistical Divergences [24.78742908726579]
 A modern method for estimating statistical divergences relies on parametrizing an empirical variational form by a neural network (NN).
In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation.
We show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
 arXiv  Detail & Related papers  (2021-10-07T17:42:44Z)
- The Rate of Convergence of Variation-Constrained Deep Neural Networks [35.393855471751756]
 We show that a class of variation-constrained neural networks can achieve near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small constant $\delta$.
The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived.
 arXiv  Detail & Related papers  (2021-06-22T21:28:00Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
 Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
 arXiv  Detail & Related papers  (2021-06-06T19:08:53Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
 Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
 arXiv  Detail & Related papers  (2021-06-06T18:05:02Z)
- Post-mortem on a deep learning contest: a Simpson's paradox and the
  complementary roles of scale metrics versus shape metrics [61.49826776409194]
 We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox, where "scale" metrics perform well overall but perform poorly on subpartitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
 arXiv  Detail & Related papers  (2021-06-01T19:19:49Z)
- Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in
  High Dimensions [41.7567932118769]
 Empirical Risk Minimization algorithms are widely used in a variety of estimation and prediction tasks.
In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference.
 arXiv  Detail & Related papers  (2020-06-16T04:27:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     