Provably robust learning of regression neural networks using $β$-divergences
- URL: http://arxiv.org/abs/2602.08933v1
- Date: Mon, 09 Feb 2026 17:32:43 GMT
- Title: Provably robust learning of regression neural networks using $β$-divergences
- Authors: Abhik Ghosh, Suryasis Jana
- Abstract summary: Existing robust training methods for regression neural networks (NNs) are often limited in scope and rely primarily on empirical validation. We propose a new robust learning framework for regression NNs based on the $β$-divergence (also known as the density power divergence), which we call `rRNet'. We prove that rRNet attains the optimal 50% breakdown point at the assumed model for all $β \in (0, 1]$, providing a strong global robustness guarantee.
- Score: 1.2891210250935143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regression neural networks (NNs) are most commonly trained by minimizing the mean squared prediction error, which is highly sensitive to outliers and data contamination. Existing robust training methods for regression NNs are often limited in scope and rely primarily on empirical validation, with only a few offering partial theoretical guarantees. In this paper, we propose a new robust learning framework for regression NNs based on the $β$-divergence (also known as the density power divergence) which we call `rRNet'. It applies to a broad class of regression NNs, including models with non-smooth activation functions and error densities, and recovers the classical maximum likelihood learning as a special case. The rRNet is implemented via an alternating optimization scheme, for which we establish convergence guarantees to stationary points under mild, verifiable conditions. The (local) robustness of rRNet is theoretically characterized through the influence functions of both the parameter estimates and the resulting rRNet predictor, which are shown to be bounded for suitable choices of the tuning parameter $β$, depending on the error density. We further prove that rRNet attains the optimal 50\% asymptotic breakdown point at the assumed model for all $β\in(0, 1]$, providing a strong global robustness guarantee that is largely absent for existing NN learning methods. Our theoretical results are complemented by simulation experiments and real-data analyses, illustrating practical advantages of rRNet over existing approaches in both function approximation problems and prediction tasks with noisy observations.
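To make the objective concrete, below is a minimal, illustrative sketch (not the authors' rRNet implementation) of β-divergence training for a regression NN under a Gaussian error model. Up to additive constants, the per-observation density power divergence loss for error density $N(0, σ^2)$ is $(2πσ^2)^{-β/2}\left[(1+β)^{-1/2} - \frac{1+β}{β}\exp\{-β\,(y_i - g(x_i; w))^2/(2σ^2)\}\right]$, which, for fixed $σ$, reduces (up to constants) to the squared-error loss as $β \to 0$. The architecture, toy data, optimizer, and the simple gradient-based alternation between the network weights and $σ$ are assumptions made for the sketch, not the paper's exact algorithm.

```python
# A hedged sketch of density power divergence (beta-divergence) training for a
# regression NN with Gaussian errors. Names such as `dpd_loss` and the toy data
# are illustrative assumptions, not taken from the paper.
import math
import torch
import torch.nn as nn

def dpd_loss(residuals, sigma, beta):
    """Empirical DPD objective for N(0, sigma^2) errors (up to additive constants).

    For fixed sigma, the residual-dependent part tends to the squared-error
    loss as beta -> 0, recovering maximum likelihood learning.
    """
    c = (2 * math.pi * sigma**2) ** (-beta / 2)          # (2*pi*sigma^2)^(-beta/2)
    kernel = torch.exp(-beta * residuals**2 / (2 * sigma**2))
    return (c * (1.0 / math.sqrt(1 + beta) - (1 + beta) / beta * kernel)).mean()

# Toy data: y = sin(2*pi*x) + small Gaussian noise, with 5% gross outliers.
torch.manual_seed(0)
x = torch.rand(200, 1)
y = torch.sin(2 * math.pi * x) + 0.1 * torch.randn_like(x)
y[:10] += 5.0                                            # contaminated responses

net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
log_sigma = torch.zeros((), requires_grad=True)          # optimize sigma on the log scale
beta = 0.3                                               # tuning parameter, beta in (0, 1]

opt_w = torch.optim.Adam(net.parameters(), lr=1e-2)
opt_s = torch.optim.Adam([log_sigma], lr=1e-2)

for epoch in range(2000):
    # Step 1: update network weights with sigma held fixed.
    r = y - net(x)
    loss = dpd_loss(r, log_sigma.exp().detach(), beta)
    opt_w.zero_grad(); loss.backward(); opt_w.step()

    # Step 2: update sigma with the network weights held fixed.
    r = y - net(x).detach()
    loss = dpd_loss(r, log_sigma.exp(), beta)
    opt_s.zero_grad(); loss.backward(); opt_s.step()
```

The exponential factor in the loss automatically downweights observations with large residuals, which is the source of the robustness to outliers; larger β gives stronger downweighting at some cost in efficiency at the clean model.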
Related papers
- Average-Over-Time Spiking Neural Networks for Uncertainty Estimation in Regression [3.409728296852651]
We introduce two methods that adapt the Average-Over-Time Spiking Neural Network (AOT-SNN) framework to regression tasks. We evaluate our approaches on both a toy dataset and several benchmark datasets.
arXiv Detail & Related papers (2024-11-29T23:13:52Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Estimation of Non-Crossing Quantile Regression Process with Deep ReQU Neural Networks [5.5272015676880795]
We propose a penalized nonparametric approach to estimating the quantile regression process (QRP) in a nonseparable model using rectified quadratic unit (ReQU) activated deep neural networks.
We establish the non-asymptotic excess risk bounds for the estimated QRP and derive the mean integrated squared error for the estimated QRP under mild smoothness and regularity conditions.
arXiv Detail & Related papers (2022-07-21T12:26:45Z)
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- A Kernel-Expanded Stochastic Neural Network [10.837308632004644]
Deep neural networks often get trapped in local minima during training.
The new kernel-expanded stochastic neural network (K-StoNet) model reformulates the network as a latent variable model.
The model can be easily trained using the imputation-regularized optimization (IRO) algorithm.
arXiv Detail & Related papers (2022-01-14T06:42:42Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks offer improved accuracy and a significant reduction in memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Measurement error models: from nonparametric methods to deep neural networks [3.1798318618973362]
We propose an efficient neural network design for estimating measurement error models.
We use a fully connected feed-forward neural network to approximate the regression function $f(x)$.
We conduct an extensive numerical study to compare the neural network approach with classical nonparametric methods.
arXiv Detail & Related papers (2020-07-15T06:05:37Z)
- Nonconvex sparse regularization for deep neural networks and its optimality [1.9798034349981162]
Deep neural network (DNN) estimators can attain optimal convergence rates for regression and classification problems.
We propose a novel penalized estimation method for sparse DNNs.
We prove that the sparse-penalized estimator can adaptively attain minimax convergence rates for various nonparametric regression problems.
arXiv Detail & Related papers (2020-03-26T07:15:28Z)
- Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks [65.24701908364383]
We show that a sufficient condition for calibrated uncertainty on a ReLU network is to be "a bit Bayesian".
We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
arXiv Detail & Related papers (2020-02-24T08:52:06Z)