Mitigating Neural Network Overconfidence with Logit Normalization
- URL: http://arxiv.org/abs/2205.09310v1
- Date: Thu, 19 May 2022 03:45:18 GMT
- Title: Mitigating Neural Network Overconfidence with Logit Normalization
- Authors: Hongxin Wei, Renchunzi Xie, Hao Cheng, Lei Feng, Bo An, Yixuan Li
- Abstract summary: Neural networks produce abnormally high confidence for both in- and out-of-distribution inputs.
We show that this issue can be mitigated through Logit Normalization (LogitNorm).
Our method is motivated by the analysis that the norm of the logits keeps increasing during training, leading to overconfident outputs.
- Score: 37.106755943446515
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Detecting out-of-distribution inputs is critical for safe deployment of
machine learning models in the real world. However, neural networks are known
to suffer from the overconfidence issue, where they produce abnormally high
confidence for both in- and out-of-distribution inputs. In this work, we show
that this issue can be mitigated through Logit Normalization (LogitNorm) -- a
simple fix to the cross-entropy loss -- by enforcing a constant vector norm on
the logits in training. Our method is motivated by the analysis that the norm
of the logits keeps increasing during training, leading to overconfident
outputs. The key idea behind LogitNorm is thus to decouple the influence of the
output's norm during network optimization. Trained with LogitNorm, neural networks
produce highly distinguishable confidence scores between in- and
out-of-distribution data. Extensive experiments demonstrate the superiority of
LogitNorm, reducing the average FPR95 by up to 42.30% on common benchmarks.
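For intuition, the "simple fix" amounts to normalizing the logit vector to a constant norm (scaled by a temperature) before applying cross-entropy. A minimal PyTorch sketch; the temperature value and the epsilon here are illustrative choices, not the paper's tuned settings:

```python
import torch
import torch.nn.functional as F

def logitnorm_loss(logits, labels, tau=0.04, eps=1e-7):
    """Cross-entropy on L2-normalized logits.

    Fixing the logit norm removes the easy way to drive the training
    loss down by inflating logit magnitude, which is what produces
    overconfident outputs. tau is a temperature hyperparameter.
    """
    norms = torch.norm(logits, p=2, dim=-1, keepdim=True) + eps
    return F.cross_entropy(logits / (norms * tau), labels)

# Drop-in usage: loss = logitnorm_loss(model(x), y)
```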
Related papers
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
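If predictions really do drift toward a constant value on OOD inputs, one practical reading is to treat closeness to that constant as an abstention signal. A hedged sketch, assuming the constant can be approximated by the marginal label distribution of the training set (an assumption, not the paper's exact recipe):

```python
import torch

def distance_to_constant(probs, constant_probs):
    """Distance between softmax outputs and a fixed 'constant' prediction,
    e.g. the marginal training-label distribution. A small distance
    suggests the input is far from the training data, so a risk-sensitive
    system might abstain or fall back to a default action."""
    return torch.norm(probs - constant_probs, dim=-1)  # low => likely OOD
```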
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
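For context, interval bound propagation, the baseline compared against here, pushes an elementwise input interval through the network layer by layer. A generic sketch for a linear + ReLU block (not the paper's INN-specific reachability analysis):

```python
import torch

def ibp_linear(lower, upper, weight, bias):
    """Propagate the box [lower, upper] through y = x @ weight.T + bias.
    Standard interval arithmetic: split the weight into its positive
    and negative parts, which each act monotonically on the bounds."""
    w_pos, w_neg = weight.clamp(min=0), weight.clamp(max=0)
    new_lower = lower @ w_pos.T + upper @ w_neg.T + bias
    new_upper = upper @ w_pos.T + lower @ w_neg.T + bias
    return new_lower, new_upper

def ibp_relu(lower, upper):
    """ReLU is monotone, so it maps the interval elementwise."""
    return lower.clamp(min=0), upper.clamp(min=0)
```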
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods increase vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
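As a rough illustration of adapting a layer's activation distribution, the sketch below re-standardizes activations to match per-channel statistics recorded on clean data. The paper's method is non-parametric, so this moment-matching version is a deliberate simplification:

```python
import torch

def correct_activations(acts, clean_mean, clean_std, eps=1e-5):
    """Shift and rescale a layer's activations (shape [batch, channels])
    so their mean/std match statistics recorded on clean inputs,
    counteracting the distribution shift caused by corruptions."""
    mean = acts.mean(dim=0, keepdim=True)
    std = acts.std(dim=0, keepdim=True) + eps
    return (acts - mean) / std * clean_std + clean_mean
```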
arXiv Detail & Related papers (2021-10-05T11:36:25Z)
- The Compact Support Neural Network [6.47243430672461]
We present a neuron generalization that has the standard dot-product-based neuron and the RBF neuron as two extreme cases of a shape parameter.
We show how to avoid difficulties in training a neural network with such neurons, by starting with a trained standard neural network and gradually increasing the shape parameter to the desired value.
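To make the two extreme cases concrete, the sketch below blends a dot-product response (alpha = 0) with an RBF-style response based on squared distance (alpha = 1); the exact parameterization in the paper may differ, so treat this only as an illustration of the annealing idea:

```python
import torch

def shape_interpolated_neuron(x, w, alpha):
    """Pre-activation that interpolates between a standard neuron and an
    RBF-like one. Training can start from a trained standard network
    (alpha = 0) and gradually increase alpha toward the desired value."""
    dot = x @ w                        # standard dot-product response
    sq_dist = ((x - w) ** 2).sum(-1)   # RBF-style: largest when x is near w
    return (1 - alpha) * dot - alpha * sq_dist
```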
arXiv Detail & Related papers (2021-04-01T06:08:09Z)
- Performance Bounds for Neural Network Estimators: Applications in Fault Detection [2.388501293246858]
We exploit recent results in quantifying the robustness of neural networks to construct and tune a model-based anomaly detector.
In tuning, we specifically provide upper bounds on the rate of false alarms expected under normal operation.
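A generic way to connect a detection threshold to an expected false-alarm rate is to calibrate on residuals collected under normal operation; the quantile-based sketch below is only a stand-in for the paper's analytic upper bounds:

```python
import numpy as np

def tune_threshold(residuals_normal, alpha=0.01):
    """Choose a threshold so that roughly a fraction alpha of residuals
    observed during normal operation would trigger an alarm."""
    return float(np.quantile(residuals_normal, 1.0 - alpha))

# alarm if abs(y_measured - y_predicted) > tune_threshold(residuals, 0.01)
```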
arXiv Detail & Related papers (2021-03-22T19:23:08Z)
- Non-Singular Adversarial Robustness of Neural Networks [58.731070632586594]
Adversarial robustness has become an emerging challenge for neural networks owing to their over-sensitivity to small input perturbations.
We formalize the notion of non-singular adversarial robustness for neural networks through the lens of joint perturbations to data inputs as well as model weights.
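One simple way to probe such joint perturbations is a single sign-gradient ascent step on both the input and the weights; a generic sketch of that probe, not the paper's formalization:

```python
import torch

def joint_sign_perturbation(model, loss_fn, x, y, eps_x=0.01, eps_w=0.001):
    """Perturb the input and the model weights (in place) by one
    sign-gradient step each, to test sensitivity under joint
    input/weight perturbations."""
    x = x.clone().requires_grad_(True)
    loss = loss_fn(model(x), y)
    grads_w = torch.autograd.grad(loss, list(model.parameters()),
                                  retain_graph=True)
    grad_x, = torch.autograd.grad(loss, x)
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads_w):
            p.add_(eps_w * g.sign())              # weight perturbation
    return (x + eps_x * grad_x.sign()).detach()   # input perturbation
```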
arXiv Detail & Related papers (2021-02-23T20:59:30Z)
- Input Hessian Regularization of Neural Networks [31.941188983286207]
We propose an efficient algorithm to train deep neural networks with Hessian operator-norm regularization.
We show that the new regularizer can, indeed, be feasible and, furthermore, that it increases the robustness of neural networks over input gradient regularization.
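The operator (spectral) norm of the input Hessian can be estimated by power iteration over Hessian-vector products, i.e. double backprop; a minimal PyTorch sketch, with the iteration count as an arbitrary choice:

```python
import torch

def input_hessian_norm(loss, x, iters=5):
    """Estimate the spectral norm of d^2 loss / dx^2 via power iteration.
    `x` must be a leaf tensor with requires_grad=True, and `loss` must be
    computed from it. The returned scalar is differentiable, so it can be
    added to the training loss as a regularizer."""
    grad_x, = torch.autograd.grad(loss, x, create_graph=True)
    v = torch.randn_like(x)
    v = v / v.norm()
    for _ in range(iters):
        hv, = torch.autograd.grad(grad_x, x, grad_outputs=v,
                                  retain_graph=True)       # Hv product
        v = (hv / (hv.norm() + 1e-12)).detach()
    hv, = torch.autograd.grad(grad_x, x, grad_outputs=v, create_graph=True)
    return hv.norm()
```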
arXiv Detail & Related papers (2020-09-14T16:58:16Z)