Quantile regression with deep ReLU Networks: Estimators and minimax rates
- URL: http://arxiv.org/abs/2010.08236v5
- Date: Fri, 18 Dec 2020 02:40:16 GMT
- Title: Quantile regression with deep ReLU Networks: Estimators and minimax rates
- Authors: Oscar Hernan Madrid Padilla, Wesley Tansey, Yanzhen Chen
- Abstract summary: We study quantile regression with rectified linear unit (ReLU) neural networks.
We derive an upper bound on the expected mean squared error of a ReLU network.
These tight bounds imply ReLU networks with quantile regression achieve minimax rates for broad collections of function types.
- Score: 4.522666263036413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantile regression is the task of estimating a specified percentile
response, such as the median, from a collection of known covariates. We study
quantile regression with rectified linear unit (ReLU) neural networks as the
chosen model class. We derive an upper bound on the expected mean squared error
of a ReLU network used to estimate any quantile conditional on a set of
covariates. This upper bound only depends on the best possible approximation
error, the number of layers in the network, and the number of nodes per layer.
We further show upper bounds that are tight for two large classes of functions:
compositions of Hölder functions and members of a Besov space. These tight
bounds imply ReLU networks with quantile regression achieve minimax rates for
broad collections of function types. Unlike existing work, the theoretical
results hold under minimal assumptions and apply to general error
distributions, including heavy-tailed distributions. Empirical simulations on a
suite of synthetic response functions demonstrate the theoretical results
translate to practical implementations of ReLU networks. Overall, the
theoretical and empirical results provide insight into the strong performance
of ReLU neural networks for quantile regression across a broad range of
function classes and error distributions. All code for this paper is publicly
available at https://github.com/tansey/quantile-regression.
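To make the setup concrete, below is a minimal sketch of quantile regression with a fully connected ReLU network trained on the pinball (check) loss. This is an illustrative assumption-based example, not the paper's released implementation at the repository above; the architecture, optimizer, synthetic data, and hyperparameters are all placeholders.
```python
# Minimal sketch: estimate a conditional quantile with a ReLU network (PyTorch).
# Architecture, data, and training settings are illustrative assumptions only.
import math
import torch
import torch.nn as nn

def pinball_loss(pred, target, tau):
    """Check (pinball) loss for the tau-th conditional quantile."""
    diff = target - pred
    return torch.mean(torch.maximum(tau * diff, (tau - 1) * diff))

def make_relu_net(in_dim, width=64, depth=3):
    """Fully connected ReLU network with `depth` hidden layers of `width` nodes."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, 1))
    return nn.Sequential(*layers)

# Synthetic covariates with heavy-tailed (Cauchy) noise, the kind of error
# distribution the paper's theory allows.
torch.manual_seed(0)
X = torch.rand(2000, 5)
noise = torch.distributions.Cauchy(0.0, 1.0).sample((2000, 1))
y = torch.sin(2 * math.pi * X[:, :1]) + 0.1 * noise

tau = 0.5  # median; any quantile level in (0, 1) works the same way
net = make_relu_net(in_dim=5)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = pinball_loss(net(X), y, tau)
    loss.backward()
    opt.step()
```
Changing `tau` re-targets the same network to a different conditional quantile; the depth and width of `make_relu_net` correspond to the quantities (layers and nodes per layer) that appear in the paper's error bounds.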
Related papers
- Benign Overfitting for Regression with Trained Two-Layer ReLU Networks [14.36840959836957]
We study the least-square regression problem with a two-layer fully-connected neural network, with ReLU activation function, trained by gradient flow.
Our first result is a generalization result that requires no assumptions on the underlying regression function or the noise other than that they are bounded.
arXiv Detail & Related papers (2024-10-08T16:54:23Z)
- Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes [7.433327915285969]
We prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss.
We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence.
arXiv Detail & Related papers (2024-01-08T23:54:46Z)
- The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks [53.95175206863992]
We study the type of solutions to which gradient descent converges when used to train a single hidden-layer multivariate ReLU network with the quadratic loss.
We prove that although shallow ReLU networks are universal approximators, stable shallow networks are not.
arXiv Detail & Related papers (2023-06-30T09:17:39Z)
- Bagged Polynomial Regression and Neural Networks [0.0]
Series and polynomial regression are able to approximate the same function classes as neural networks.
Bagged polynomial regression (BPR) is an attractive alternative to neural networks.
BPR performs as well as neural networks in crop classification using satellite data.
arXiv Detail & Related papers (2022-05-17T19:55:56Z)
- Learning Quantile Functions without Quantile Crossing for Distribution-free Time Series Forecasting [12.269597033369557]
We propose the Incremental (Spline) Quantile Functions I(S)QF, a flexible and efficient distribution-free quantile estimation framework.
We also provide a generalization error analysis of our proposed approaches under the sequence-to-sequence setting.
arXiv Detail & Related papers (2021-11-12T06:54:48Z)
- Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Predicting Unreliable Predictions by Shattering a Neural Network [145.3823991041987]
Piecewise linear neural networks can be split into subfunctions.
Subfunctions have their own activation pattern, domain, and empirical error.
Empirical error for the full network can be written as an expectation over subfunctions.
arXiv Detail & Related papers (2021-06-15T18:34:41Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Projection Neural Network for a Class of Sparse Regression Problems with Cardinality Penalty [9.698438188398434]
We consider a class of sparse regression problems, whose objective function is the summation of a convex loss function and a cardinality penalty.
By constructing a smoothing function for the cardinality function, we propose a projected neural network and design a correction method for solving this problem.
The solution of the proposed neural network is unique, globally existent, bounded, and globally Lipschitz continuous.
arXiv Detail & Related papers (2020-04-02T08:05:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.