Bridging the Gap: Unifying the Training and Evaluation of Neural Network
Binary Classifiers
- URL: http://arxiv.org/abs/2009.01367v3
- Date: Thu, 2 Jun 2022 00:22:00 GMT
- Title: Bridging the Gap: Unifying the Training and Evaluation of Neural Network
Binary Classifiers
- Authors: Nathan Tsoi, Kate Candon, Deyuan Li, Yofti Milkessa, Marynel Vázquez
- Abstract summary: We propose a unifying approach to training neural network binary classifiers that combines a differentiable approximation of the Heaviside function with a probabilistic view of the typical confusion matrix values using soft sets.
Our theoretical analysis shows the benefit of using our method to optimize for a given evaluation metric, such as $F_1$-Score, with soft sets.
- Score: 0.4893345190925178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While neural network binary classifiers are often evaluated on metrics such
as Accuracy and $F_1$-Score, they are commonly trained with a cross-entropy
objective. How can this training-evaluation gap be addressed? While specific
techniques have been adopted to optimize certain confusion matrix based
metrics, it is challenging or impossible in some cases to generalize the
techniques to other metrics. Adversarial learning approaches have also been
proposed to optimize networks via confusion matrix based metrics, but they tend
to be much slower than common training methods. In this work, we propose a
unifying approach to training neural network binary classifiers that combines a
differentiable approximation of the Heaviside function with a probabilistic
view of the typical confusion matrix values using soft sets. Our theoretical
analysis shows the benefit of using our method to optimize for a given
evaluation metric, such as $F_1$-Score, with soft sets, and our extensive
experiments show the effectiveness of our approach in several domains.
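As a rough sketch of this idea, and not the authors' exact formulation, the snippet below uses a sigmoid as the differentiable stand-in for the Heaviside step at a decision threshold, accumulates soft-set confusion-matrix counts, and minimizes 1 minus the resulting soft F1. The threshold `tau`, steepness `k`, and the small epsilon are illustrative choices.

```python
import torch

def soft_f1_loss(logits, targets, tau=0.5, k=10.0):
    """Differentiable surrogate for F1-Score.

    A sigmoid approximates the Heaviside step H(p - tau), and soft-set
    confusion-matrix counts are accumulated from the resulting membership
    values. `tau` and `k` are illustrative hyperparameters.
    """
    p = torch.sigmoid(logits)              # predicted probabilities
    s = torch.sigmoid(k * (p - tau))       # soft "predicted positive" membership
    y = targets.float()

    tp = (s * y).sum()                     # soft true positives
    fp = (s * (1.0 - y)).sum()             # soft false positives
    fn = ((1.0 - s) * y).sum()             # soft false negatives

    f1 = 2.0 * tp / (2.0 * tp + fp + fn + 1e-8)
    return 1.0 - f1                        # minimize 1 - soft-F1

# Example: plug into an ordinary gradient-based training step.
logits = torch.randn(16, requires_grad=True)
targets = torch.randint(0, 2, (16,))
loss = soft_f1_loss(logits, targets)
loss.backward()
```

Because every operation is differentiable, the surrogate can be minimized with standard gradient descent while closely tracking the evaluation metric it approximates.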
Related papers
- Aligning Multiclass Neural Network Classifier Criterion with Task Performance via $F_β$-Score [2.8583357090792703]
Multiclass neural network classifiers are typically trained using cross-entropy loss.
It is questionable whether the use of cross-entropy will yield a classifier that aligns with the intended application-specific performance criteria.
We present a theoretical analysis that shows that our method can be used to optimize for a soft-set-based approximation of Macro-$F_\beta$.
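For context, the standard $F_\beta$ and its macro-averaged multiclass form, written here with soft confusion-matrix counts (the tilde marks soft-set quantities; this is a generic statement of the target metric, not the paper's derivation):

$$
F_\beta = \frac{(1+\beta^2)\,\mathrm{TP}}{(1+\beta^2)\,\mathrm{TP} + \beta^2\,\mathrm{FN} + \mathrm{FP}},
\qquad
\text{Macro-}F_\beta = \frac{1}{C}\sum_{c=1}^{C}
\frac{(1+\beta^2)\,\widetilde{\mathrm{TP}}_c}{(1+\beta^2)\,\widetilde{\mathrm{TP}}_c + \beta^2\,\widetilde{\mathrm{FN}}_c + \widetilde{\mathrm{FP}}_c}
$$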
arXiv Detail & Related papers (2024-05-31T15:54:01Z) - AnyLoss: Transforming Classification Metrics into Loss Functions [21.34290540936501]
Evaluation metrics can be used to assess the performance of models in binary classification tasks.
Most metrics are derived from a confusion matrix in a non-differentiable form, making it difficult to generate a differentiable loss function that could directly optimize them.
We propose a general-purpose approach that transforms any confusion matrix-based metric into a loss function, AnyLoss, that can be used directly in optimization.
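To illustrate the general recipe (a generic sketch, not AnyLoss's specific construction), the example below builds a soft confusion matrix and turns a different metric, balanced accuracy, into a loss; the helper names and relaxation parameters are illustrative.

```python
import torch

def soft_confusion_counts(probs, targets, k=10.0, tau=0.5):
    """Soft TP/FP/TN/FN from predicted probabilities (illustrative relaxation)."""
    s = torch.sigmoid(k * (probs - tau))   # soft "predicted positive"
    y = targets.float()
    tp = (s * y).sum()
    fp = (s * (1 - y)).sum()
    tn = ((1 - s) * (1 - y)).sum()
    fn = ((1 - s) * y).sum()
    return tp, fp, tn, fn

def balanced_accuracy_loss(probs, targets):
    """Any confusion-matrix metric can be negated into a loss; here, balanced accuracy."""
    tp, fp, tn, fn = soft_confusion_counts(probs, targets)
    tpr = tp / (tp + fn + 1e-8)            # sensitivity
    tnr = tn / (tn + fp + 1e-8)            # specificity
    return 1.0 - 0.5 * (tpr + tnr)

# Usage with a toy batch of probabilities.
probs = torch.sigmoid(torch.randn(32, requires_grad=True))
targets = torch.randint(0, 2, (32,))
balanced_accuracy_loss(probs, targets).backward()
```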
arXiv Detail & Related papers (2024-05-23T16:14:16Z) - What to Do When Your Discrete Optimization Is the Size of a Neural
Network? [24.546550334179486]
Machine learning applications using neural networks involve solving discrete optimization problems.
Classical approaches used in discrete settings do not scale well to large neural networks.
We take continuation path (CP) methods and Monte Carlo (MC) methods as representatives of two contrasting families of approaches to this problem.
arXiv Detail & Related papers (2024-02-15T21:57:43Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has proven to be an effective approach.
We propose a large-batch adversarial training framework implemented over multiple machines.
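For orientation, here is a minimal single-device sketch of the PGD-based adversarial training loop that such frameworks scale up; the distributed, large-batch machinery is the paper's contribution and is not shown. The attack radius, step size, and step count are illustrative and assume inputs scaled to [0, 1].

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD attack (illustrative radius, step size, and step count)."""
    x = x.detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()       # ascend the loss
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)      # project back into the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0).detach()      # keep valid pixel range
    return x_adv

def adversarial_training_step(model, optimizer, x, y):
    """One AT step: fit the model on attacked inputs instead of clean ones."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Running this inner attack for every example is what makes AT expensive, which is why scaling it with large batches across machines matters.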
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Scalable computation of prediction intervals for neural networks via
matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z) - Compare learning: bi-attention network for few-shot learning [6.559037166322981]
Metric learning, one family of few-shot learning methods, addresses the challenge of learning from few examples by first learning a deep distance metric that determines whether a pair of images belongs to the same category.
In this paper, we propose a novel approach named Bi-attention network to compare the instances, which can measure the similarity between embeddings of instances precisely, globally and efficiently.
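As a generic illustration of the metric-learning idea, and not the Bi-attention network itself, the sketch below classifies each query by its most similar support embedding; the random vectors stand in for the outputs of a learned embedding network, and the cosine metric is an assumption.

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return a @ b.T

def classify_queries(query_emb, support_emb, support_labels):
    """Assign each query the label of its most similar support embedding."""
    sims = cosine_similarity(query_emb, support_emb)   # (n_query, n_support)
    return support_labels[np.argmax(sims, axis=1)]

# Embeddings would come from a learned network; random vectors used here.
support_emb = np.random.randn(5, 64)      # 5-way, 1-shot support set
support_labels = np.arange(5)
query_emb = np.random.randn(10, 64)
print(classify_queries(query_emb, support_emb, support_labels))
```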
arXiv Detail & Related papers (2022-03-25T07:39:10Z) - Meta-learning representations for clustering with infinite Gaussian
mixture models [39.56814839510978]
We propose a meta-learning method that trains neural networks to obtain representations that improve clustering performance.
The proposed method can cluster unseen unlabeled data using knowledge meta-learned with labeled data that are different from the unlabeled data.
arXiv Detail & Related papers (2021-03-01T02:05:31Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
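A toy sketch of the line case, assuming a single linear layer rather than the paper's architecture or regularizers: two endpoint weight vectors are trained jointly while sampling points on the segment between them, so any interpolation afterwards is a usable classifier.

```python
import torch

d = 20
w1 = torch.randn(d, requires_grad=True)   # one endpoint of the line of models
w2 = torch.randn(d, requires_grad=True)   # the other endpoint
opt = torch.optim.SGD([w1, w2], lr=0.1)
bce = torch.nn.BCEWithLogitsLoss()

x = torch.randn(256, d)
y = (x[:, 0] > 0).float()                 # toy labels

for step in range(200):
    alpha = torch.rand(())                # sample a point on the line
    w = alpha * w1 + (1 - alpha) * w2     # interpolated weights
    loss = bce(x @ w, y)
    opt.zero_grad()
    loss.backward()                       # gradients flow to both endpoints
    opt.step()

# After training, any alpha in [0, 1] gives a classifier:
# logits = x @ (alpha * w1 + (1 - alpha) * w2)
```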
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Local Critic Training for Model-Parallel Learning of Deep Neural
Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
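A generic sketch of the decoupling idea, assuming a small auxiliary "critic" that predicts the task loss from the first layer group's activations; the sizes, optimizers, and critic form are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

g1 = nn.Sequential(nn.Linear(20, 64), nn.ReLU())   # first layer group
g2 = nn.Linear(64, 2)                              # second layer group
critic = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g1 = torch.optim.SGD(g1.parameters(), lr=1e-2)
opt_g2 = torch.optim.SGD(g2.parameters(), lr=1e-2)
opt_c = torch.optim.SGD(critic.parameters(), lr=1e-2)
ce = nn.CrossEntropyLoss()

x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))

h = g1(x)
# Group 1 is updated from the critic's loss estimate, without waiting for g2.
est_loss = critic(h).mean()
opt_g1.zero_grad()
est_loss.backward()
opt_g1.step()

# Group 2 is updated from the true task loss on detached activations.
logits = g2(h.detach())
true_loss = ce(logits, y)
opt_g2.zero_grad()
true_loss.backward()
opt_g2.step()

# The critic learns to track the true loss.
critic_loss = (critic(h.detach()).mean() - true_loss.detach()) ** 2
opt_c.zero_grad()
critic_loss.backward()
opt_c.step()
```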
arXiv Detail & Related papers (2021-02-03T09:30:45Z) - Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z) - Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
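A minimal sketch of the perturbation idea: averaging one-hot argmax outputs over Gaussian perturbations of the input scores yields a smooth, never locally constant mapping, unlike the piecewise-constant argmax itself. The noise scale and sample count below are illustrative.

```python
import numpy as np

def perturbed_argmax(theta, sigma=1.0, n_samples=1000, rng=None):
    """Monte Carlo estimate of E[one_hot(argmax(theta + sigma * Z))], Z ~ N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    d = theta.shape[0]
    z = rng.standard_normal((n_samples, d))
    idx = np.argmax(theta[None, :] + sigma * z, axis=1)   # perturbed winners
    one_hot = np.zeros((n_samples, d))
    one_hot[np.arange(n_samples), idx] = 1.0
    return one_hot.mean(axis=0)                           # smooth in theta

theta = np.array([1.0, 1.1, -0.5])
print(perturbed_argmax(theta))   # soft distribution over argmax choices
```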
arXiv Detail & Related papers (2020-02-20T11:11:32Z)