Robust Deep Network Learning of Nonlinear Regression Tasks by Parametric Leaky Exponential Linear Units (LELUs) and a Diffusion Metric
- URL: http://arxiv.org/abs/2507.06765v1
- Date: Wed, 09 Jul 2025 11:49:15 GMT
- Title: Robust Deep Network Learning of Nonlinear Regression Tasks by Parametric Leaky Exponential Linear Units (LELUs) and a Diffusion Metric
- Authors: Enda D. V. Bigarella,
- Abstract summary: nonlinear ac.f.'s are required for learning nonlinear datasets.<n>Smooth but vanishing-gradient ac.f.'s such as ELU or SiLU have limited performance.<n>Non-smooth ac.f.'s such as RELU and Leaky-RELU impart discontinuity in the trained model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This document proposes a parametric activation function (ac.f.) aimed at improving multidimensional nonlinear data regression. It is a established knowledge that nonlinear ac.f.'s are required for learning nonlinear datasets. This work shows that smoothness and gradient properties of the ac.f. further impact the performance of large neural networks in terms of overfitting and sensitivity to model parameters. Smooth but vanishing-gradient ac.f.'s such as ELU or SiLU have limited performance and non-smooth ac.f.'s such as RELU and Leaky-RELU further impart discontinuity in the trained model. Improved performance is demonstrated with a smooth "Leaky Exponential Linear Unit", with non-zero gradient that can be trained. A novel diffusion-loss metric is also proposed to gauge the performance of the trained models in terms of overfitting.
Related papers
- Pessimistic Nonlinear Least-Squares Value Iteration for Offline Reinforcement Learning [53.97335841137496]
We propose an oracle-efficient algorithm, dubbed Pessimistic Least-Square Value Iteration (PNLSVI) for offline RL with non-linear function approximation.
Our algorithm enjoys a regret bound that has a tight dependency on the function class complexity and achieves minimax optimal instance-dependent regret when specialized to linear function approximation.
arXiv Detail & Related papers (2023-10-02T17:42:01Z) - Optimal Nonlinearities Improve Generalization Performance of Random
Features [0.9790236766474201]
Random feature model with a nonlinear activation function has been shown to performally equivalent to a Gaussian model in terms of training and generalization errors.
We show that acquired parameters from the Gaussian model enable us to define a set of optimal nonlinearities.
Our numerical results validate that the optimized nonlinearities achieve better generalization performance than widely-used nonlinear functions such as ReLU.
arXiv Detail & Related papers (2023-09-28T20:55:21Z) - Data-driven Nonlinear Parametric Model Order Reduction Framework using
Deep Hierarchical Variational Autoencoder [5.521324490427243]
Data-driven parametric model order reduction (MOR) method using a deep artificial neural network is proposed.
LSH-VAE is capable of performing nonlinear MOR for the parametric of a nonlinear dynamic system with a significant number of degrees of freedom.
arXiv Detail & Related papers (2023-07-10T02:44:53Z) - Robust Learning with Progressive Data Expansion Against Spurious
Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Linear Stability Hypothesis and Rank Stratification for Nonlinear Models [3.0041514772139166]
We propose a rank stratification for general nonlinear models to uncover a model rank as an "effective size of parameters"
By these results, model rank of a target function predicts a minimal training data size for its successful recovery.
arXiv Detail & Related papers (2022-11-21T16:27:25Z) - Near-optimal Offline Reinforcement Learning with Linear Representation:
Leveraging Variance Information with Pessimism [65.46524775457928]
offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
arXiv Detail & Related papers (2022-03-11T09:00:12Z) - Non-linear manifold ROM with Convolutional Autoencoders and Reduced
Over-Collocation method [0.0]
Non-affine parametric dependencies, nonlinearities and advection-dominated regimes of the model of interest can result in a slow Kolmogorov n-width decay.
We implement the non-linear manifold method introduced by Carlberg et al [37] with hyper-reduction achieved through reduced over-collocation and teacher-student training of a reduced decoder.
We test the methodology on a 2d non-linear conservation law and a 2d shallow water models, and compare the results obtained with a purely data-driven method for which the dynamics is evolved in time with a long-short term memory network
arXiv Detail & Related papers (2022-03-01T11:16:50Z) - LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z) - Optimization-driven Machine Learning for Intelligent Reflecting Surfaces
Assisted Wireless Networks [82.33619654835348]
Intelligent surface (IRS) has been employed to reshape the wireless channels by controlling individual scattering elements' phase shifts.
Due to the large size of scattering elements, the passive beamforming is typically challenged by the high computational complexity.
In this article, we focus on machine learning (ML) approaches for performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.