Semi-Supervised Deep Sobolev Regression: Estimation, Variable Selection
and Beyond
- URL: http://arxiv.org/abs/2401.04535v1
- Date: Tue, 9 Jan 2024 13:10:30 GMT
- Title: Semi-Supervised Deep Sobolev Regression: Estimation, Variable Selection
and Beyond
- Authors: Zhao Ding and Chenguang Duan and Yuling Jiao and Jerry Zhijian Yang
- Abstract summary: We propose SDORE, a semi-supervised deep Sobolev regressor, for the nonparametric estimation of the underlying regression function and its gradient.
We conduct a comprehensive analysis of the convergence rates of SDORE and establish a minimax optimal rate for the regression function.
We also derive a convergence rate for the associated plug-in gradient estimator, even in the presence of significant domain shift.
- Score: 3.782392436834913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose SDORE, a semi-supervised deep Sobolev regressor, for the
nonparametric estimation of the underlying regression function and its
gradient. SDORE employs deep neural networks to minimize empirical risk with
gradient norm regularization, allowing computation of the gradient norm on
unlabeled data. We conduct a comprehensive analysis of the convergence rates of
SDORE and establish a minimax optimal rate for the regression function.
Crucially, we also derive a convergence rate for the associated plug-in
gradient estimator, even in the presence of significant domain shift. These
theoretical findings offer valuable prior guidance for selecting regularization
parameters and determining the size of the neural network, while showcasing the
provable advantage of leveraging unlabeled data in semi-supervised learning. To
the best of our knowledge, SDORE is the first provable neural network-based
approach that simultaneously estimates the regression function and its
gradient, with diverse applications including nonparametric variable selection
and inverse problems. The effectiveness of SDORE is validated through an
extensive range of numerical simulations and real data analysis.
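As a rough illustration of the objective described above, the sketch below (an assumed PyTorch implementation, not the authors' code; the network architecture, penalty weight `lam`, and optimizer settings are placeholders) minimizes the labeled squared error plus a gradient-norm penalty evaluated on unlabeled inputs:

```python
import torch
import torch.nn as nn

def sdore_loss(model, x_lab, y_lab, x_unlab, lam=1e-2):
    """Minimal sketch of a semi-supervised Sobolev-regularized objective:
    empirical risk on labeled data plus a gradient-norm penalty on unlabeled data.
    Hyperparameters are illustrative, not the paper's settings."""
    # Empirical risk on the labeled sample.
    fit = ((model(x_lab).squeeze(-1) - y_lab) ** 2).mean()

    # Gradient-norm penalty computed on unlabeled inputs via autograd.
    x_u = x_unlab.clone().requires_grad_(True)
    grad = torch.autograd.grad(model(x_u).sum(), x_u, create_graph=True)[0]
    penalty = (grad ** 2).sum(dim=1).mean()
    return fit + lam * penalty

# Illustrative usage on synthetic data.
d = 5
model = nn.Sequential(nn.Linear(d, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_lab, y_lab = torch.randn(100, d), torch.randn(100)
x_unlab = torch.randn(1000, d)
for _ in range(200):
    opt.zero_grad()
    loss = sdore_loss(model, x_lab, y_lab, x_unlab)
    loss.backward()
    opt.step()
```

After training, the same `torch.autograd.grad` call applied to the fitted network yields the plug-in gradient estimate discussed in the abstract, which is what underpins applications such as nonparametric variable selection.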
Related papers
- Deep learning with missing data [3.829599191332801]
We propose Pattern Embedded Neural Networks (PENNs), which can be applied in conjunction with any existing imputation technique.
In addition to a neural network trained on the imputed data, PENNs pass the vectors of observation indicators through a second neural network to provide a compact representation.
The outputs are then combined in a third neural network to produce final predictions.
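Taken literally, the three-network description above suggests an architecture along the following lines (a speculative PyTorch sketch; the layer sizes, embedding dimension, and concatenation-based combination are assumptions, not details from the paper):

```python
import torch
import torch.nn as nn

class PENN(nn.Module):
    """Speculative sketch of a Pattern Embedded Neural Network: one branch sees the
    imputed covariates, a second embeds the binary observation-indicator pattern,
    and a third combines the two representations for the final prediction."""
    def __init__(self, d, embed_dim=8, hidden=64):
        super().__init__()
        self.data_net = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.pattern_net = nn.Sequential(nn.Linear(d, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim))
        self.head = nn.Sequential(nn.Linear(hidden + embed_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x_imputed, mask):
        h = self.data_net(x_imputed)
        p = self.pattern_net(mask.float())  # compact embedding of the missingness pattern
        return self.head(torch.cat([h, p], dim=-1)).squeeze(-1)

# Illustrative usage: impute with any existing technique, then feed the imputed
# values together with the observation-indicator mask.
x, mask = torch.randn(32, 10), torch.randint(0, 2, (32, 10))
preds = PENN(d=10)(x, mask)
```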
arXiv Detail & Related papers (2025-04-21T18:57:36Z) - Statistically guided deep learning [10.619901778151336]
We present a theoretically well-founded deep learning algorithm for nonparametric regression.
We show that a theoretical analysis of deep learning that simultaneously takes into account optimization, generalization, and approximation can result in a new deep learning estimate.
arXiv Detail & Related papers (2025-04-11T12:36:06Z) - Deep Fréchet Regression [4.915744683251151]
We propose a flexible regression model capable of handling high-dimensional predictors without imposing parametric assumptions.
The proposed model outperforms existing methods for non-Euclidean responses.
arXiv Detail & Related papers (2024-07-31T07:54:14Z) - Sparse deep neural networks for nonparametric estimation in high-dimensional sparse regression [4.983567824636051]
This study combines nonparametric estimation and parametric sparse deep neural networks for the first time.
Since nonparametric estimation of partial derivatives is of great significance for nonlinear variable selection, these results point to a promising future for the interpretability of deep neural networks.
arXiv Detail & Related papers (2024-06-26T07:41:41Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametrized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow estimation accuracy to be traded off against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has established a connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Semiparametric Regression for Spatial Data via Deep Learning [17.63607438860882]
We use a sparsely connected deep neural network with rectified linear unit (ReLU) activation function to estimate the unknown regression function.
Our method can handle large data sets well owing to the gradient descent optimization algorithm.
arXiv Detail & Related papers (2023-01-10T01:55:55Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Sobolev Acceleration and Statistical Optimality for Learning Elliptic
Equations via Gradient Descent [11.483919798541393]
We study the statistical limits, in terms of Sobolev norms, of gradient descent for solving inverse problems from randomly sampled noisy observations.
Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics-Informed Neural Networks (PINN).
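For context, "Sobolev training" refers, schematically and in notation of our own choosing rather than the paper's, to least-squares objectives that also penalize derivative errors, e.g.
$$\widehat f \in \arg\min_{f\in\mathcal{F}}\ \frac{1}{n}\sum_{i=1}^{n}\Big[\big(f(x_i)-y_i\big)^2+\lambda\,\big\|\nabla f(x_i)-g_i\big\|_2^2\Big],$$
where $g_i$ denotes a (possibly noisy) derivative observation and $\lambda\ge 0$ balances the two terms; DRM and PINN objectives likewise involve derivatives of $f$, through a variational energy or a PDE residual respectively.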
arXiv Detail & Related papers (2022-05-15T17:01:53Z) - Estimation of the Mean Function of Functional Data via Deep Neural
Networks [6.230751621285321]
We propose a deep neural network method to perform nonparametric regression for functional data.
The proposed method is applied to analyze positron emission tomography images of patients with Alzheimer disease.
arXiv Detail & Related papers (2020-12-08T17:18:16Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - Learning Rates as a Function of Batch Size: A Random Matrix Theory
Approach to Neural Network Training [2.9649783577150837]
We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory.
We derive analytical expressions for the maximal learning rates for stochastic gradient descent and adaptive training regimens for smooth, non-convex deep neural networks.
We validate our claims using VGG/ResNet architectures on the ImageNet dataset.
arXiv Detail & Related papers (2020-06-16T11:55:45Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate more stable and better-performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.