DL101 Neural Network Outputs and Loss Functions
- URL: http://arxiv.org/abs/2511.05131v1
- Date: Fri, 07 Nov 2025 10:20:45 GMT
- Title: DL101 Neural Network Outputs and Loss Functions
- Authors: Fernando Berzal
- Abstract summary: The loss function used to train a neural network is strongly connected to its output layer from a statistical point of view. The report analyzes common activation functions for a neural network output layer, such as linear, sigmoid, ReLU, and softmax.
- Score: 51.77969450792284
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The loss function used to train a neural network is strongly connected to its output layer from a statistical point of view. This technical report analyzes common activation functions for a neural network output layer, such as linear, sigmoid, ReLU, and softmax, detailing their mathematical properties and their appropriate use cases. A strong statistical justification exists for the selection of a suitable loss function for training a deep learning model. This report connects common loss functions such as Mean Squared Error (MSE), Mean Absolute Error (MAE), and various Cross-Entropy losses to the statistical principle of Maximum Likelihood Estimation (MLE). Choosing a specific loss function is equivalent to assuming a specific probability distribution for the model output, highlighting the link between these functions and the Generalized Linear Models (GLMs) that underlie network output layers. Additional scenarios of practical interest are also considered, such as alternative output encodings, constrained outputs, and distributions with heavy tails.
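To make the MLE correspondence concrete, here is a minimal NumPy sketch (data and variable names are illustrative, not from the report) checking numerically that MSE coincides, up to a constant factor, with the unit-variance Gaussian negative log-likelihood, and that binary cross-entropy is exactly the Bernoulli negative log-likelihood:

```python
import numpy as np

# Illustrative check of the MLE view: minimizing MSE is (up to constants)
# maximizing a Gaussian likelihood, and binary cross-entropy is the
# Bernoulli negative log-likelihood. All data below are synthetic.
rng = np.random.default_rng(0)

y_true = rng.normal(size=5)                  # regression targets
y_pred = y_true + 0.1 * rng.normal(size=5)   # noisy predictions

mse = np.mean((y_true - y_pred) ** 2)
# Gaussian NLL with unit variance, dropping the constant log(2*pi)/2 term:
gauss_nll = np.mean(0.5 * (y_true - y_pred) ** 2)
assert np.isclose(gauss_nll, 0.5 * mse)      # same minimizer

p_true = rng.integers(0, 2, size=5)          # binary labels
logits = rng.normal(size=5)
p_pred = 1.0 / (1.0 + np.exp(-logits))       # sigmoid output layer

bce = -np.mean(p_true * np.log(p_pred) + (1 - p_true) * np.log(1 - p_pred))
# Bernoulli NLL: negative log-probability of the observed label
bernoulli_nll = -np.mean(np.log(np.where(p_true == 1, p_pred, 1 - p_pred)))
assert np.isclose(bce, bernoulli_nll)
```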
Related papers
- Uncertainty propagation in feed-forward neural network models [3.987067170467799]
We develop new uncertainty propagation methods for feed-forward neural network architectures. We derive analytical expressions for the probability density function (PDF) of the neural network output. A key finding is that an appropriate linearization of the leaky ReLU activation function yields accurate statistical results.
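As a rough illustration of the linearization idea (not the paper's exact derivation; the input distribution and the slope rule below are assumptions), a first-order expansion of leaky ReLU around the input mean propagates Gaussian moments in closed form, which can be checked against Monte Carlo:

```python
import numpy as np

# Hedged sketch: moment propagation through a linearized leaky ReLU,
# compared against a Monte Carlo reference. Illustrative only.
rng = np.random.default_rng(1)
alpha = 0.01                       # leaky ReLU negative slope (assumed)
mu_in, var_in = 0.5, 0.2 ** 2      # Gaussian input moments (assumed)

def leaky_relu(x):
    return np.where(x >= 0, x, alpha * x)

# Linearization around the mean: f(x) ~ f(mu) + f'(mu) * (x - mu)
slope = 1.0 if mu_in >= 0 else alpha
mu_lin = leaky_relu(mu_in)
var_lin = (slope ** 2) * var_in

# Monte Carlo reference
samples = leaky_relu(rng.normal(mu_in, np.sqrt(var_in), size=1_000_000))
print(f"linearized:  mean={mu_lin:.4f}, var={var_lin:.6f}")
print(f"Monte Carlo: mean={samples.mean():.4f}, var={samples.var():.6f}")
```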
arXiv Detail & Related papers (2025-03-27T00:16:36Z) - A Versatile Influence Function for Data Attribution with Non-Decomposable Loss [3.1615846013409925]
We propose a Versatile Influence Function (VIF) that can be straightforwardly applied to machine learning models trained with any non-decomposable loss. VIF represents a significant advancement in data attribution, enabling efficient influence-function-based attribution across a wide range of machine learning paradigms.
arXiv Detail & Related papers (2024-12-02T09:59:01Z) - Accelerated Neural Network Training with Rooted Logistic Objectives [13.400503928962756]
We derive a novel sequence of strictly convex functions that are at least as strict as the logistic loss.
Our results illustrate that training with the rooted loss function converges faster and yields performance improvements.
arXiv Detail & Related papers (2023-10-05T20:49:48Z) - Modeling Uncertain Feature Representation for Domain Generalization [49.129544670700525]
We show that our method consistently improves the network generalization ability on multiple vision tasks.
Our method is simple yet effective and can be readily integrated into networks without additional trainable parameters or loss constraints.
arXiv Detail & Related papers (2023-01-16T14:25:02Z) - On the Forward Invariance of Neural ODEs [92.07281135902922]
We propose a new method to ensure neural ordinary differential equations (ODEs) satisfy output specifications.
Our approach uses a class of control barrier functions to transform output specifications into constraints on the parameters and inputs of the learning system.
arXiv Detail & Related papers (2022-10-10T15:18:28Z) - Uncertainty Modeling for Out-of-Distribution Generalization [56.957731893992495]
We argue that the feature statistics can be properly manipulated to improve the generalization ability of deep learning models.
Common methods often consider the feature statistics as deterministic values measured from the learned features.
We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training.
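A hedged PyTorch sketch of the general idea, treating per-channel feature statistics as Gaussian random variables whose spread is estimated across the batch, then re-normalizing with sampled statistics; the function name and details are illustrative, and the paper's implementation may differ:

```python
import torch

def uncertain_stats_perturbation(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Sketch only: treat feature statistics as distributions and
    re-normalize with sampled statistics. x: (batch, channels, H, W)."""
    mu = x.mean(dim=(2, 3), keepdim=True)                 # instance channel mean
    sig = (x.var(dim=(2, 3), keepdim=True) + eps).sqrt()  # instance channel std

    # Uncertainty of the statistics themselves, estimated across the batch
    sig_mu = (mu.var(dim=0, keepdim=True) + eps).sqrt()
    sig_sig = (sig.var(dim=0, keepdim=True) + eps).sqrt()

    # Sample synthesized statistics and re-normalize the features
    beta = mu + torch.randn_like(mu) * sig_mu
    gamma = sig + torch.randn_like(sig) * sig_sig
    return gamma * (x - mu) / sig + beta

x = torch.randn(8, 16, 4, 4)
print(uncertain_stats_perturbation(x).shape)  # torch.Size([8, 16, 4, 4])
```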
arXiv Detail & Related papers (2022-02-08T16:09:12Z) - Concise Logarithmic Loss Function for Robust Training of Anomaly Detection Model [0.0]
To train an artificial neural network more stably, an appropriate network structure or loss function should be defined.
For training anomaly detection models, the mean squared error (MSE) function is widely adopted.
A novel loss function, logarithmic mean squared error (LMSE), is proposed in this paper to train the neural network more stably.
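The exact LMSE definition is given in the paper; purely to illustrate the robustness intuition, one natural logarithmic variant, log(1 + MSE), compresses the contribution of extreme errors. This form is an assumption, not the paper's formula:

```python
import numpy as np

# Illustrative-only sketch: a logarithmic variant of MSE that damps the
# influence of a few extreme errors during training. The LMSE in the
# paper may be defined differently; consult the original for details.
def mse(y_true, y_pred):
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def log_mse(y_true, y_pred):
    return np.log1p(mse(y_true, y_pred))   # log(1 + MSE), assumed form

y = np.zeros(4)
# One outlier dominates MSE but is strongly compressed by log(1 + MSE):
print(mse(y, [0.1, 0.1, 0.1, 10.0]), log_mse(y, [0.1, 0.1, 0.1, 10.0]))
```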
arXiv Detail & Related papers (2022-01-15T03:22:15Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - Shaping Deep Feature Space towards Gaussian Mixture for Visual Classification [74.48695037007306]
We propose a Gaussian mixture (GM) loss function for deep neural networks in visual classification.
With a classification margin and a likelihood regularization, the GM loss facilitates both high classification performance and accurate modeling of the feature distribution.
The proposed model can be implemented easily and efficiently without using extra trainable parameters.
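Below is a hedged sketch of a Gaussian-mixture-style classification loss under simplifying assumptions (identity covariances, no classification margin); `gm_loss` and its parameters are illustrative names, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def gm_loss(features, labels, means, lambda_lkd: float = 0.1):
    """Sketch: softmax over negative squared distances to per-class means
    (posterior under a unit-covariance Gaussian mixture) plus a likelihood
    regularizer. features: (N, D); means: (K, D); labels: (N,)."""
    # Squared distances to each class mean act as negative logits
    d2 = torch.cdist(features, means) ** 2            # (N, K)
    ce = F.cross_entropy(-0.5 * d2, labels)           # posterior-based CE
    # Likelihood regularization: pull features toward their class mean
    lkd = 0.5 * d2.gather(1, labels.unsqueeze(1)).mean()
    return ce + lambda_lkd * lkd

feats = torch.randn(8, 32)
means = torch.randn(5, 32, requires_grad=True)        # learnable class means
labels = torch.randint(0, 5, (8,))
print(gm_loss(feats, labels, means))
```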
arXiv Detail & Related papers (2020-11-18T03:32:27Z) - Empirical Strategy for Stretching Probability Distribution in Neural-network-based Regression [5.35308390309106]
In regression analysis with artificial neural networks, prediction performance depends on determining appropriate weights between layers.
We propose weighted empirical stretching (WES), a novel loss function designed to increase the overlap area of the two distributions.
The improved RMSE in the extreme domain is expected to be useful for predicting abnormal events in non-linear complex systems.
arXiv Detail & Related papers (2020-09-08T06:08:14Z) - Estimating Structural Target Functions using Machine Learning and Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z)