Constrained Optimization for Training Deep Neural Networks Under Class
Imbalance
- URL: http://arxiv.org/abs/2102.12894v1
- Date: Sun, 21 Feb 2021 09:49:36 GMT
- Title: Constrained Optimization for Training Deep Neural Networks Under Class
Imbalance
- Authors: Sara Sangalli, Ertunc Erdil, Andreas Hoetker, Olivio Donati, Ender
Konukoglu
- Abstract summary: We introduce a novel constraint that can be used with existing loss functions to enforce maximal area under the ROC curve.
We present experimental results for image-based classification applications using the CIFAR10 dataset and an in-house medical imaging dataset.
- Score: 9.557146081524008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are notorious for making more mistakes for the
classes that have substantially fewer samples than the others during training.
Such class imbalance is ubiquitous in clinical applications and crucial to
handle because the classes with fewer samples most often correspond to critical
cases (e.g., cancer) where misclassifications can have severe consequences. To
avoid missing such cases, binary classifiers need to be operated at high True
Positive Rates (TPR) by setting a higher threshold, but this comes at the cost
of very high False Positive Rates (FPR) for problems with class imbalance.
Existing methods for learning under class imbalance most often do not take this
into account. We argue that prediction accuracy should be improved by
emphasizing the reduction of FPRs at high TPRs for problems where
misclassification of the positive samples is associated with higher cost. To this end, we pose the
training of a DNN for binary classification as a constrained optimization
problem and introduce a novel constraint that can be used with existing loss
functions to enforce maximal area under the ROC curve (AUC). We solve the
resulting constrained optimization problem using an Augmented Lagrangian method
(ALM), where the constraint emphasizes reduction of FPR at high TPR. We present
experimental results for image-based classification applications using the
CIFAR10 dataset and an in-house medical imaging dataset. Our results demonstrate that
the proposed method almost always improves the loss functions it is used with
by attaining lower FPR at high TPR and higher or equal AUC.
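To make the approach concrete, below is a minimal PyTorch sketch of training under an Augmented Lagrangian scheme with an AUC-style constraint added to an ordinary loss. The pairwise hinge surrogate `pairwise_auc_violation`, the tolerance, the toy data, and all hyper-parameters are illustrative assumptions, not the paper's exact constraint or implementation; only the overall ALM structure (penalized constraint plus dual update) is shown.

```python
# Illustrative sketch only: ALM training with a hypothetical pairwise AUC-style
# constraint; this is NOT the authors' exact formulation.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def pairwise_auc_violation(scores, labels, margin=1.0, tol=0.05):
    """Surrogate constraint c(theta): mean hinge over (positive, negative) score
    pairs, minus a tolerance. c <= 0 roughly corresponds to a near-maximal AUC."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    if pos.numel() == 0 or neg.numel() == 0:
        return scores.sum() * 0.0                # no usable pairs in this batch
    diffs = pos.unsqueeze(1) - neg.unsqueeze(0)  # s_i - s_j for every (pos, neg) pair
    return torch.relu(margin - diffs).mean() - tol

# Toy imbalanced data (about 5% positives) so the sketch runs end to end.
X = torch.randn(2000, 32)
y = (torch.rand(2000) < 0.05).long()
loader = DataLoader(TensorDataset(X, y), batch_size=128, shuffle=True)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
lam, mu = 0.0, 1.0                               # Lagrange multiplier, penalty weight

for epoch in range(20):
    for xb, yb in loader:
        scores = model(xb).squeeze(-1)
        c = pairwise_auc_violation(scores, yb)
        # Augmented Lagrangian term for the inequality constraint c(theta) <= 0.
        aug = (mu / 2.0) * torch.relu(c + lam / mu) ** 2 - lam ** 2 / (2.0 * mu)
        loss = bce(scores, yb.float()) + aug
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Dual update once per epoch; lambda stays non-negative, and mu could be
    # increased gradually if the constraint keeps being violated.
    lam = max(0.0, lam + mu * float(c.detach()))
```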
Related papers
- Minimizing Chebyshev Prototype Risk Magically Mitigates the Perils of Overfitting [1.6574413179773757]
We develop multicomponent loss functions that reduce intra-class feature correlation and maximize inter-class feature distance.
We implement the terms of the Chebyshev Prototype Risk (CPR) bound into our Explicit CPR loss function.
Our training algorithm reduces overfitting and improves upon previous approaches in many settings.
arXiv Detail & Related papers (2024-04-10T15:16:04Z)
- Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes [7.433327915285969]
We prove the universal consistency of wide and deep ReLU neural network classifiers trained on the logistic loss.
We also give sufficient conditions for a class of probability measures for which classifiers based on neural networks achieve minimax optimal rates of convergence.
arXiv Detail & Related papers (2024-01-08T23:54:46Z)
- Is Last Layer Re-Training Truly Sufficient for Robustness to Spurious Correlations? [2.7985765111086254]
Models trained with empirical risk minimization (ERM) learn to rely on spurious features, i.e., their prediction is based on undesired auxiliary features.
The recently proposed Deep Feature Reweighting (DFR) method improves the accuracy on the worst-performing groups.
In this work, we examine the applicability of DFR to realistic data in the medical domain.
arXiv Detail & Related papers (2023-08-01T11:54:34Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Test-Time Amendment with a Coarse Classifier for Fine-Grained Classification [10.719054378755981]
We present a novel approach for Post-Hoc Correction called Hierarchical Ensembles (HiE).
HiE utilizes label hierarchy to improve the performance of fine-grained classification at test-time using the coarse-grained predictions.
Our approach brings notable gains in top-1 accuracy while significantly decreasing the severity of mistakes as training data decreases for the fine-grained classes.
arXiv Detail & Related papers (2023-02-01T10:55:27Z)
- Optimizing Two-way Partial AUC with an End-to-end Framework [154.47590401735323]
Area Under the ROC Curve (AUC) is a crucial metric for machine learning.
Recent work shows that the Two-way Partial AUC (TPAUC) is essentially inconsistent with existing Partial AUC metrics.
In this paper, we present the first attempt to optimize this new metric.
arXiv Detail & Related papers (2022-06-23T12:21:30Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on a recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum of some ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we present an early attempt at learning multiclass scoring functions by optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z)
- Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more than the unbiasedness of the risk estimator in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias for reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
- Adaptive Low-Rank Factorization to regularize shallow and deep neural networks [9.607123078804959]
We use Low-Rank matrix Factorization (LRF) to drop out some parameters of the learning model during training.
The best results of AdaptiveLRF on the SVHN and CIFAR-10 datasets are 98% and 94.1% F-measure, and 97.9% and 94% accuracy, respectively.
arXiv Detail & Related papers (2020-05-05T08:13:30Z)
- TRP: Trained Rank Pruning for Efficient Deep Neural Networks [69.06699632822514]
We propose Trained Rank Pruning (TRP), which alternates between low rank approximation and training.
A nuclear-norm regularization optimized by sub-gradient descent is used to further promote low rank in TRP.
The TRP trained network inherently has a low-rank structure, and is approximated with negligible performance loss.
arXiv Detail & Related papers (2020-04-30T03:37:36Z)
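As a rough, self-contained illustration of the alternating scheme described in the TRP entry above (and loosely related to the AdaptiveLRF entry), the sketch below interleaves ordinary training steps, a nuclear-norm regularizer on the weight matrices, and periodic truncation of each weight to low rank via SVD. The rank, regularization weight, schedule, and toy data are assumptions made for illustration, not the published methods.

```python
# Illustrative sketch of alternating training and low-rank approximation with a
# nuclear-norm regularizer; hyper-parameters and schedule are assumptions.
import torch
import torch.nn as nn

def nuclear_norm(model):
    """Sum of singular values of all 2-D weight matrices (sub-gradients flow through SVD)."""
    return sum(torch.linalg.svdvals(p).sum()
               for p in model.parameters() if p.dim() == 2)

@torch.no_grad()
def truncate_to_rank(model, rank):
    """Replace every 2-D weight matrix by its best rank-`rank` approximation."""
    for p in model.parameters():
        if p.dim() == 2:
            U, S, Vh = torch.linalg.svd(p, full_matrices=False)
            k = min(rank, S.numel())
            p.copy_(U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :])

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
ce = nn.CrossEntropyLoss()
X, y = torch.randn(1024, 32), torch.randint(0, 10, (1024,))   # toy data

for step in range(200):
    idx = torch.randint(0, X.size(0), (64,))
    loss = ce(model(X[idx]), y[idx]) + 1e-3 * nuclear_norm(model)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if (step + 1) % 50 == 0:          # alternate: project weights back to low rank
        truncate_to_rank(model, rank=8)
```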
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.