Semi-Supervised Deep Sobolev Regression: Estimation, Variable Selection
and Beyond
- URL: http://arxiv.org/abs/2401.04535v1
- Date: Tue, 9 Jan 2024 13:10:30 GMT
- Title: Semi-Supervised Deep Sobolev Regression: Estimation, Variable Selection
and Beyond
- Authors: Zhao Ding and Chenguang Duan and Yuling Jiao and Jerry Zhijian Yang
- Abstract summary: We propose SDORE, a semi-supervised deep Sobolev regressor, for the nonparametric estimation of the underlying regression function and its gradient.
We conduct a comprehensive analysis of the convergence rates of SDORE and establish a minimax optimal rate for the regression function.
We also derive a convergence rate for the associated plug-in gradient estimator, even in the presence of significant domain shift.
- Score: 3.782392436834913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose SDORE, a semi-supervised deep Sobolev regressor, for the
nonparametric estimation of the underlying regression function and its
gradient. SDORE employs deep neural networks to minimize empirical risk with
gradient norm regularization, allowing computation of the gradient norm on
unlabeled data. We conduct a comprehensive analysis of the convergence rates of
SDORE and establish a minimax optimal rate for the regression function.
Crucially, we also derive a convergence rate for the associated plug-in
gradient estimator, even in the presence of significant domain shift. These
theoretical findings offer valuable prior guidance for selecting regularization
parameters and determining the size of the neural network, while showcasing the
provable advantage of leveraging unlabeled data in semi-supervised learning. To
the best of our knowledge, SDORE is the first provable neural network-based
approach that simultaneously estimates the regression function and its
gradient, with diverse applications including nonparametric variable selection
and inverse problems. The effectiveness of SDORE is validated through an
extensive range of numerical simulations and real data analysis.
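The training objective described above admits a compact formulation: fit the labeled sample by least squares while penalizing the squared gradient norm of the network, evaluated on the unlabeled inputs. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the network size, the regularization weight lam, and names such as SobolevRegressor and sdore_loss are illustrative assumptions.

```python
# Minimal sketch of a semi-supervised Sobolev-regularized regressor.
# Illustrative only: names, sizes, and hyperparameters are assumptions,
# not the paper's released code.
import torch
import torch.nn as nn


class SobolevRegressor(nn.Module):
    """Small ReLU network; depth/width would be guided by the paper's theory."""

    def __init__(self, dim_in: int, width: int = 64, depth: int = 3):
        super().__init__()
        layers, d = [], dim_in
        for _ in range(depth):
            layers += [nn.Linear(d, width), nn.ReLU()]
            d = width
        layers.append(nn.Linear(d, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x).squeeze(-1)


def sdore_loss(model, x_lab, y_lab, x_unlab, lam=1e-2):
    """Empirical risk on labeled data + gradient-norm penalty on unlabeled data."""
    # Supervised term: squared error on the labeled sample.
    fit = ((model(x_lab) - y_lab) ** 2).mean()
    # Sobolev term: squared L2 norm of the input gradient, computed on
    # unlabeled inputs -- this is where the unlabeled data enters.
    x_u = x_unlab.clone().requires_grad_(True)
    grad = torch.autograd.grad(model(x_u).sum(), x_u, create_graph=True)[0]
    penalty = (grad ** 2).sum(dim=1).mean()
    return fit + lam * penalty


if __name__ == "__main__":
    # Synthetic illustration: n labeled points, m >> n unlabeled points.
    torch.manual_seed(0)
    d, n, m = 5, 200, 2000
    x_lab, x_unlab = torch.rand(n, d), torch.rand(m, d)
    y_lab = torch.sin(3 * x_lab[:, 0]) + 0.1 * torch.randn(n)  # only x1 is active
    model = SobolevRegressor(d)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(500):
        opt.zero_grad()
        loss = sdore_loss(model, x_lab, y_lab, x_unlab)
        loss.backward()
        opt.step()
    # The plug-in gradient estimate can drive nonparametric variable selection:
    # rank coordinates by their average squared partial derivative.
    x_q = x_unlab.clone().requires_grad_(True)
    g = torch.autograd.grad(model(x_q).sum(), x_q)[0]
    scores = (g ** 2).mean(dim=0)
    print("per-variable gradient scores:", scores.tolist())
```

The closing lines show one way the plug-in gradient estimator could be used for variable selection; in practice the choice of lam and of the network size would follow the prior guidance provided by the convergence analysis.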
Related papers
- Deep Fréchet Regression [4.915744683251151]
We propose a flexible regression model capable of handling high-dimensional predictors without imposing parametric assumptions.
The proposed model outperforms existing methods for non-Euclidean responses.
arXiv Detail & Related papers (2024-07-31T07:54:14Z) - Sparse deep neural networks for nonparametric estimation in high-dimensional sparse regression [4.983567824636051]
This study combines nonparametric estimation and parametric sparse deep neural networks for the first time.
As nonparametric estimation of partial derivatives is of great significance for nonlinear variable selection, the current results point to a promising future for the interpretability of deep neural networks.
arXiv Detail & Related papers (2024-06-26T07:41:41Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm (a minimal sketch appears after this list) and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by a magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels [78.6096486885658]
We introduce lower bounds to the linearized Laplace approximation of the marginal likelihood.
These bounds are amenable to gradient-based optimization and allow one to trade off estimation accuracy against computational complexity.
arXiv Detail & Related papers (2023-06-06T19:02:57Z) - Understanding Augmentation-based Self-Supervised Representation Learning
via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z) - Semiparametric Regression for Spatial Data via Deep Learning [17.63607438860882]
We use a sparsely connected deep neural network with rectified linear unit (ReLU) activation function to estimate the unknown regression function.
Our method can handle large data sets well owing to the gradient descent optimization algorithm.
arXiv Detail & Related papers (2023-01-10T01:55:55Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, and for both we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Sobolev Acceleration and Statistical Optimality for Learning Elliptic
Equations via Gradient Descent [11.483919798541393]
We study the statistical limits, in terms of Sobolev norms, of gradient descent for solving inverse problems from randomly sampled noisy observations.
Our class of objective functions includes Sobolev training for kernel regression, Deep Ritz Methods (DRM), and Physics-Informed Neural Networks (PINNs).
arXiv Detail & Related papers (2022-05-15T17:01:53Z) - Estimation of the Mean Function of Functional Data via Deep Neural
Networks [6.230751621285321]
We propose a deep neural network method to perform nonparametric regression for functional data.
The proposed method is applied to analyze positron emission tomography images of patients with Alzheimer's disease.
arXiv Detail & Related papers (2020-12-08T17:18:16Z) - Optimal Rates for Averaged Stochastic Gradient Descent under Neural
Tangent Kernel Regime [50.510421854168065]
We show that averaged stochastic gradient descent can achieve the minimax optimal convergence rate.
We show that the target function specified by the NTK of a ReLU network can be learned at the optimal convergence rate.
arXiv Detail & Related papers (2020-06-22T14:31:37Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z)
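On the gradient descent-ascent entry above, here is a minimal, illustrative sketch of plain simultaneous gradient descent-ascent on a toy finite-dimensional saddle problem; the cited paper analyzes the mean-field, functional setting with two-layer networks, which this sketch does not reproduce, and the objective and step size below are assumptions for demonstration.

```python
# Simultaneous gradient descent-ascent (GDA) on a toy minimax problem
# min_x max_y f(x, y): the min player descends, the max player ascends.
import torch


def gda(f, x, y, lr=0.05, steps=500):
    """Run simultaneous GDA and return the final iterates."""
    for _ in range(steps):
        gx, gy = torch.autograd.grad(f(x, y), (x, y))
        with torch.no_grad():
            x -= lr * gx  # descent step for the min player
            y += lr * gy  # ascent step for the max player
    return x, y


if __name__ == "__main__":
    # Toy objective f(x, y) = x*y + 0.1*x^2 - 0.1*y^2 with saddle point (0, 0).
    f = lambda x, y: x * y + 0.1 * x ** 2 - 0.1 * y ** 2
    x = torch.tensor(1.0, requires_grad=True)
    y = torch.tensor(-1.0, requires_grad=True)
    x, y = gda(f, x, y)
    print(float(x), float(y))  # both iterates spiral in toward the saddle point
```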
This list is automatically generated from the titles and abstracts of the papers on this site.