Constrained Optimization for Training Deep Neural Networks Under Class
Imbalance
- URL: http://arxiv.org/abs/2102.12894v1
- Date: Sun, 21 Feb 2021 09:49:36 GMT
- Title: Constrained Optimization for Training Deep Neural Networks Under Class
Imbalance
- Authors: Sara Sangalli, Ertunc Erdil, Andreas Hoetker, Olivio Donati, Ender
Konukoglu
- Abstract summary: We introduce a novel constraint that can be used with existing loss functions to enforce maximal area under the ROC curve.
We present experimental results for image-based classification applications using the CIFAR10 dataset and an in-house medical imaging dataset.
- Score: 9.557146081524008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are notorious for making more mistakes for the
classes that have substantially fewer samples than the others during training.
Such class imbalance is ubiquitous in clinical applications and crucial to
handle because the classes with fewer samples most often correspond to critical
cases (e.g., cancer) where misclassifications can have severe consequences. To
avoid missing such cases, binary classifiers need to be operated at high True
Positive Rates (TPR) by setting a higher threshold, but this comes at the cost
of very high False Positive Rates (FPR) for problems with class imbalance.
Existing methods for learning under class imbalance most often do not take this
into account. We argue that prediction accuracy should be improved by
emphasizing the reduction of FPRs at high TPRs for problems where
misclassification of the positive samples is associated with higher cost. To this end, we pose the
training of a DNN for binary classification as a constrained optimization
problem and introduce a novel constraint that can be used with existing loss
functions to enforce maximal area under the ROC curve (AUC). We solve the
resulting constrained optimization problem using an Augmented Lagrangian method
(ALM), where the constraint emphasizes reduction of FPR at high TPR. We present
experimental results for image-based classification applications using the
CIFAR10 dataset and an in-house medical imaging dataset. Our results demonstrate that
the proposed method almost always improves the loss functions it is used with
by attaining lower FPR at high TPR and higher or equal AUC.
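To make the approach concrete, below is a minimal PyTorch sketch of training under an Augmented Lagrangian scheme with an AUC-style constraint added to an ordinary loss. The pairwise hinge surrogate `pairwise_auc_violation`, the tolerance, the toy data, and all hyper-parameters are illustrative assumptions, not the paper's exact constraint or implementation; only the overall ALM structure (penalized constraint plus dual update) is shown.

```python
# Illustrative sketch only: ALM training with a hypothetical pairwise AUC-style
# constraint; this is NOT the authors' exact formulation.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def pairwise_auc_violation(scores, labels, margin=1.0, tol=0.05):
    """Surrogate constraint c(theta): mean hinge over (positive, negative) score
    pairs, minus a tolerance. c <= 0 roughly corresponds to a near-maximal AUC."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    if pos.numel() == 0 or neg.numel() == 0:
        return scores.sum() * 0.0                # no usable pairs in this batch
    diffs = pos.unsqueeze(1) - neg.unsqueeze(0)  # s_i - s_j for every (pos, neg) pair
    return torch.relu(margin - diffs).mean() - tol

# Toy imbalanced data (about 5% positives) so the sketch runs end to end.
X = torch.randn(2000, 32)
y = (torch.rand(2000) < 0.05).long()
loader = DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
lam, mu = 0.0, 1.0                               # Lagrange multiplier, penalty weight

for epoch in range(20):
    for xb, yb in loader:
        scores = model(xb).squeeze(-1)
        c = pairwise_auc_violation(scores, yb)
        # Augmented Lagrangian term for the inequality constraint c(theta) <= 0.
        aug = (mu / 2.0) * torch.relu(c + lam / mu) ** 2 - lam ** 2 / (2.0 * mu)
        loss = bce(scores, yb.float()) + aug
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Dual update once per epoch; lambda stays non-negative, and mu could be
    # increased gradually if the constraint keeps being violated.
    lam = max(0.0, lam + mu * float(c.detach()))
```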
Related papers
- Minimizing Chebyshev Prototype Risk Magically Mitigates the Perils of Overfitting [1.6574413179773757]
We develop multicomponent loss functions that reduce intra-class feature correlation and maximize inter-class feature distance.
We implement the terms of the Chebyshev Prototype Risk (CPR) bound into our Explicit CPR loss function.
Our training algorithm reduces overfitting and improves upon previous approaches in many settings.
arXiv Detail & Related papers (2024-04-10T15:16:04Z)
- Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes [7.433327915285969]
We prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss.
We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence.
arXiv Detail & Related papers (2024-01-08T23:54:46Z)
- Is Last Layer Re-Training Truly Sufficient for Robustness to Spurious Correlations? [2.7985765111086254]
Models trained with empirical risk minimization (ERM) learn to rely on spurious features, i.e., their prediction is based on undesired auxiliary features.
The recently proposed Deep Feature Reweighting (DFR) method improves the accuracy on the worst-performing groups.
In this work, we examine the applicability of DFR to realistic data in the medical domain.
arXiv Detail & Related papers (2023-08-01T11:54:34Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Test-Time Amendment with a Coarse Classifier for Fine-Grained Classification [10.719054378755981]
We present a novel approach for Post-Hoc Correction called Hierarchical Ensembles (HiE).
HiE utilizes label hierarchy to improve the performance of fine-grained classification at test-time using the coarse-grained predictions.
Our approach brings notable gains in top-1 accuracy while significantly decreasing the severity of mistakes as training data decreases for the fine-grained classes.
arXiv Detail & Related papers (2023-02-01T10:55:27Z)
- Optimizing Two-way Partial AUC with an End-to-end Framework [154.47590401735323]
Area Under the ROC Curve (AUC) is a crucial metric for machine learning.
Recent work shows that the Two-way Partial AUC (TPAUC) is essentially inconsistent with existing Partial AUC metrics.
In this paper, we present the first attempt to optimize this new metric.
arXiv Detail & Related papers (2022-06-23T12:21:30Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on a recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum of some ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we present an early attempt at learning multiclass scoring functions by optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more than the unbiasedness of the risk estimator in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
- Adaptive Low-Rank Factorization to regularize shallow and deep neural networks [9.607123078804959]
We use Low-Rank matrix Factorization (LRF) to drop out some parameters of the learning model during training.
The best results of AdaptiveLRF on the SVHN and CIFAR-10 datasets are 98% and 94.1% F-measure, and 97.9% and 94% accuracy, respectively.
arXiv Detail & Related papers (2020-05-05T08:13:30Z)
- TRP: Trained Rank Pruning for Efficient Deep Neural Networks [69.06699632822514]
We propose Trained Rank Pruning (TRP), which alternates between low rank approximation and training.
A nuclear-norm regularization optimized by sub-gradient descent is used to further promote low rank in TRP.
The TRP trained network inherently has a low-rank structure, and is approximated with negligible performance loss.
arXiv Detail & Related papers (2020-04-30T03:37:36Z)
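As a rough, self-contained illustration of the alternating scheme described in the TRP entry above (and loosely related to the AdaptiveLRF entry), the sketch below interleaves ordinary training steps, a nuclear-norm regularizer on the weight matrices, and periodic truncation of each weight to low rank via SVD. The rank, regularization weight, schedule, and toy data are assumptions made for illustration, not the published methods.

```python
# Illustrative sketch of alternating training and low-rank approximation with a
# nuclear-norm regularizer; hyper-parameters and schedule are assumptions.
import torch
import torch.nn as nn

def nuclear_norm(model):
    """Sum of singular values of all 2-D weight matrices (sub-gradients flow through SVD)."""
    return sum(torch.linalg.svdvals(p).sum()
               for p in model.parameters() if p.dim() == 2)

@torch.no_grad()
def truncate_to_rank(model, rank):
    """Replace every 2-D weight matrix by its best rank-`rank` approximation."""
    for p in model.parameters():
        if p.dim() == 2:
            U, S, Vh = torch.linalg.svd(p, full_matrices=False)
            k = min(rank, S.numel())
            p.copy_(U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :])

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
ce = nn.CrossEntropyLoss()
X, y = torch.randn(1024, 32), torch.randint(0, 10, (1024,))   # toy data

for step in range(200):
    idx = torch.randint(0, X.size(0), (64,))
    loss = ce(model(X[idx]), y[idx]) + 1e-3 * nuclear_norm(model)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if (step + 1) % 50 == 0:          # alternate: project weights back to low rank
        truncate_to_rank(model, rank=8)
```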
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.