Related papers: Differentiable Neural Networks with RePU Activation: with Applications to Score Estimation and Isotonic Regression

Differentiable Neural Networks with RePU Activation: with Applications to Score Estimation and Isotonic Regression

URL: http://arxiv.org/abs/2305.00608v3
Date: Sun, 21 Apr 2024 09:25:20 GMT
Title: Differentiable Neural Networks with RePU Activation: with Applications to Score Estimation and Isotonic Regression
Authors: Guohao Shen, Yuling Jiao, Yuanyuan Lin, Jian Huang,
Abstract summary: We study the properties of differentiable neural networks activated by rectified power unit (RePU) functions. We establish error bounds for simultaneously approximating $Cs$ smooth functions and their derivatives using RePU-activated deep neural networks.
Score: 7.450181695527364
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the properties of differentiable neural networks activated by rectified power unit (RePU) functions. We show that the partial derivatives of RePU neural networks can be represented by RePUs mixed-activated networks and derive upper bounds for the complexity of the function class of derivatives of RePUs networks. We establish error bounds for simultaneously approximating $C^s$ smooth functions and their derivatives using RePU-activated deep neural networks. Furthermore, we derive improved approximation error bounds when data has an approximate low-dimensional support, demonstrating the ability of RePU networks to mitigate the curse of dimensionality. To illustrate the usefulness of our results, we consider a deep score matching estimator (DSME) and propose a penalized deep isotonic regression (PDIR) using RePU networks. We establish non-asymptotic excess risk bounds for DSME and PDIR under the assumption that the target functions belong to a class of $C^s$ smooth functions. We also show that PDIR achieves the minimax optimal convergence rate and has a robustness property in the sense it is consistent with vanishing penalty parameters even when the monotonicity assumption is not satisfied. Furthermore, if the data distribution is supported on an approximate low-dimensional manifold, we show that DSME and PDIR can mitigate the curse of dimensionality.

Related papers

Flexible Automatic Identification and Removal (FAIR)-Pruner: An Efficient Neural Network Pruning Method [11.575879702610914]
This paper proposes the Flexible Automatic Identification and Removal (FAIR)-Pruner, a novel method for neural network structured pruning.<n>A major advantage of FAIR-Pruner lies in its capacity to automatically determine the layer-wise pruning rates, which yields a more efficient subnetwork structure.<n>With utilization scores and reconstruction errors, users can flexibly obtain pruned models under different pruning ratios.
arXiv Detail & Related papers (2025-08-04T10:59:07Z)
The Larger the Merrier? Efficient Large AI Model Inference in Wireless Edge Networks [56.37880529653111]
The demand for large computation model (LAIM) services is driving a paradigm shift from traditional cloud-based inference to edge-based inference for low-latency, privacy-preserving applications.<n>In this paper, we investigate the LAIM-inference scheme, where a pre-trained LAIM is pruned and partitioned into on-device and on-server sub-models for deployment.
arXiv Detail & Related papers (2025-05-14T08:18:55Z)
Deep learning with missing data [3.829599191332801]
We propose Pattern Embedded Neural Networks (PENNs), which can be applied in conjunction with any existing imputation technique. In addition to a neural network trained on the imputed data, PENNs pass the vectors of observation indicators through a second neural network to provide a compact representation. The outputs are then combined in a third neural network to produce final predictions.
arXiv Detail & Related papers (2025-04-21T18:57:36Z)
Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces [0.0]
We consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks. We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function.
arXiv Detail & Related papers (2024-05-10T14:31:58Z)
A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks. We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks. Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
Adaptive Multilevel Neural Networks for Parametric PDEs with Error Estimation [0.0]
A neural network architecture is presented to solve high-dimensional parameter-dependent partial differential equations (pPDEs) It is constructed to map parameters of the model data to corresponding finite element solutions. It outputs a coarse grid solution and a series of corrections as produced in an adaptive finite element method (AFEM)
arXiv Detail & Related papers (2024-03-19T11:34:40Z)
Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs. We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
A Fair Loss Function for Network Pruning [70.35230425589592]
We introduce the performance weighted loss function, a simple modified cross-entropy loss function that can be used to limit the introduction of biases during pruning. Experiments using the CelebA, Fitzpatrick17k and CIFAR-10 datasets demonstrate that the proposed method is a simple and effective tool.
arXiv Detail & Related papers (2022-11-18T15:17:28Z)
Estimation of Non-Crossing Quantile Regression Process with Deep ReQU Neural Networks [5.5272015676880795]
We propose a penalized nonparametric approach to estimating the quantile regression process (QRP) in a nonseparable model using quadratic unit (ReQU) activated deep neural networks. We establish the non-asymptotic excess risk bounds for the estimated QRP and derive the mean integrated squared error for the estimated QRP under mild smoothness and regularity conditions.
arXiv Detail & Related papers (2022-07-21T12:26:45Z)
Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks [71.95722100511627]
We consider the off-policy evaluation problem of reinforcement learning using deep neural networks. We show that, by choosing network size appropriately, one can leverage the low-dimensional manifold structure in the Markov decision process.
arXiv Detail & Related papers (2022-06-06T20:25:20Z)
How do noise tails impact on deep ReLU networks? [2.5889847253961418]
We show how the optimal rate of convergence depends on p, the degree of smoothness and the intrinsic dimension in a class of nonparametric regression functions. We also contribute some new results on the approximation theory of deep ReLU neural networks.
arXiv Detail & Related papers (2022-03-20T00:27:32Z)
Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent. We show that SGD is biased towards a simple solution. We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z)
Probabilistic partition of unity networks: clustering based deep approximation [0.0]
Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs. We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based generalizations of a maximum likelihood loss. We provide benchmarks quantifying performance in high/low-dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space.
arXiv Detail & Related papers (2021-07-07T08:02:00Z)
Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning [97.28695683236981]
More gradient updates decrease the expressivity of the current value network. We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings.
arXiv Detail & Related papers (2020-10-27T17:55:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.