Bridging the Gap: Unifying the Training and Evaluation of Neural Network
Binary Classifiers
- URL: http://arxiv.org/abs/2009.01367v3
- Date: Thu, 2 Jun 2022 00:22:00 GMT
- Title: Bridging the Gap: Unifying the Training and Evaluation of Neural Network
Binary Classifiers
- Authors: Nathan Tsoi, Kate Candon, Deyuan Li, Yofti Milkessa, Marynel Vázquez
- Abstract summary: We propose a unifying approach to training neural network binary classifiers that combines a differentiable approximation of the Heaviside function with a probabilistic view of the typical confusion matrix values using soft sets.
Our theoretical analysis shows the benefit of using our method to optimize for a given evaluation metric, such as $F_1$-Score, with soft sets.
- Score: 0.4893345190925178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While neural network binary classifiers are often evaluated on metrics such
as Accuracy and $F_1$-Score, they are commonly trained with a cross-entropy
objective. How can this training-evaluation gap be addressed? While specific
techniques have been adopted to optimize certain confusion matrix based
metrics, it is challenging or impossible in some cases to generalize the
techniques to other metrics. Adversarial learning approaches have also been
proposed to optimize networks via confusion matrix based metrics, but they tend
to be much slower than common training methods. In this work, we propose a
unifying approach to training neural network binary classifiers that combines a
differentiable approximation of the Heaviside function with a probabilistic
view of the typical confusion matrix values using soft sets. Our theoretical
analysis shows the benefit of using our method to optimize for a given
evaluation metric, such as $F_1$-Score, with soft sets, and our extensive
experiments show the effectiveness of our approach in several domains.
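As a rough sketch of this idea, and not the authors' exact formulation, the snippet below uses a sigmoid as the differentiable stand-in for the Heaviside step at a decision threshold, accumulates soft-set confusion-matrix counts, and minimizes 1 minus the resulting soft F1. The threshold `tau`, steepness `k`, and the small epsilon are illustrative choices.

```python
import torch

def soft_f1_loss(logits, targets, tau=0.5, k=10.0):
    """Differentiable surrogate for F1-Score.

    A sigmoid approximates the Heaviside step H(p - tau), and soft-set
    confusion-matrix counts are accumulated from the resulting membership
    values. `tau` and `k` are illustrative hyperparameters.
    """
    p = torch.sigmoid(logits)              # predicted probabilities
    s = torch.sigmoid(k * (p - tau))       # soft "predicted positive" membership
    y = targets.float()

    tp = (s * y).sum()                     # soft true positives
    fp = (s * (1.0 - y)).sum()             # soft false positives
    fn = ((1.0 - s) * y).sum()             # soft false negatives

    f1 = 2.0 * tp / (2.0 * tp + fp + fn + 1e-8)
    return 1.0 - f1                        # minimize 1 - soft-F1

# Example: plug into an ordinary gradient-based training step.
logits = torch.randn(16, requires_grad=True)
targets = torch.randint(0, 2, (16,))
loss = soft_f1_loss(logits, targets)
loss.backward()
```

Because every operation is differentiable, the surrogate can be minimized with standard gradient descent while closely tracking the evaluation metric it approximates.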
Related papers
- Aligning Multiclass Neural Network Classifier Criterion with Task Performance via $F_β$-Score [2.8583357090792703]
Multiclass neural network classifiers are typically trained using cross-entropy loss.
It is questionable whether the use of cross-entropy will yield a classifier that aligns with the intended application-specific performance criteria.
We present a theoretical analysis that shows that our method can be used to optimize for a soft-set-based approximation of Macro-$F_\beta$.
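For context, the standard $F_\beta$ and its macro-averaged multiclass form, written here with soft confusion-matrix counts (the tilde marks soft-set quantities; this is a generic statement of the target metric, not the paper's derivation):

$$
F_\beta = \frac{(1+\beta^2)\,\mathrm{TP}}{(1+\beta^2)\,\mathrm{TP} + \beta^2\,\mathrm{FN} + \mathrm{FP}},
\qquad
\text{Macro-}F_\beta = \frac{1}{C}\sum_{c=1}^{C}
\frac{(1+\beta^2)\,\widetilde{\mathrm{TP}}_c}{(1+\beta^2)\,\widetilde{\mathrm{TP}}_c + \beta^2\,\widetilde{\mathrm{FN}}_c + \widetilde{\mathrm{FP}}_c}
$$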
arXiv Detail & Related papers (2024-05-31T15:54:01Z) - AnyLoss: Transforming Classification Metrics into Loss Functions [21.34290540936501]
Evaluation metrics can be used to assess the performance of models in binary classification tasks.
Most metrics are derived from a confusion matrix in a non-differentiable form, making it difficult to generate a differentiable loss function that could directly optimize them.
We propose a general-purpose approach that transforms any confusion matrix-based metric into a loss function, AnyLoss, that can be used directly in optimization.
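To illustrate the general recipe (a generic sketch, not AnyLoss's specific construction), the example below builds a soft confusion matrix and turns a different metric, balanced accuracy, into a loss; the helper names and relaxation parameters are illustrative.

```python
import torch

def soft_confusion_counts(probs, targets, k=10.0, tau=0.5):
    """Soft TP/FP/TN/FN from predicted probabilities (illustrative relaxation)."""
    s = torch.sigmoid(k * (probs - tau))   # soft "predicted positive"
    y = targets.float()
    tp = (s * y).sum()
    fp = (s * (1 - y)).sum()
    tn = ((1 - s) * (1 - y)).sum()
    fn = ((1 - s) * y).sum()
    return tp, fp, tn, fn

def balanced_accuracy_loss(probs, targets):
    """Any confusion-matrix metric can be negated into a loss; here, balanced accuracy."""
    tp, fp, tn, fn = soft_confusion_counts(probs, targets)
    tpr = tp / (tp + fn + 1e-8)            # sensitivity
    tnr = tn / (tn + fp + 1e-8)            # specificity
    return 1.0 - 0.5 * (tpr + tnr)

# Usage with a toy batch of probabilities.
probs = torch.sigmoid(torch.randn(32, requires_grad=True))
targets = torch.randint(0, 2, (32,))
balanced_accuracy_loss(probs, targets).backward()
```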
arXiv Detail & Related papers (2024-05-23T16:14:16Z) - What to Do When Your Discrete Optimization Is the Size of a Neural
Network? [24.546550334179486]
Machine learning applications using neural networks involve solving discrete optimization problems.
Classical approaches used in discrete settings do not scale well to large neural networks.
We take continuation path (CP) methods and Monte Carlo (MC) methods as representatives of two contrasting families of approaches to this problem.
arXiv Detail & Related papers (2024-02-15T21:57:43Z) - Distributed Adversarial Training to Robustify Deep Neural Networks at
Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, adversarial training (AT) has proven to be an effective approach.
We propose a large-batch adversarial training framework implemented over multiple machines.
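For orientation, here is a minimal single-device sketch of the PGD-based adversarial training loop that such frameworks scale up; the distributed, large-batch machinery is the paper's contribution and is not shown. The attack radius, step size, and step count are illustrative and assume inputs scaled to [0, 1].

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-infinity PGD attack (illustrative radius, step size, and step count)."""
    x = x.detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()       # ascend the loss
        x_adv = x + torch.clamp(x_adv - x, -eps, eps)      # project back into the eps-ball
        x_adv = torch.clamp(x_adv, 0.0, 1.0).detach()      # keep valid pixel range
    return x_adv

def adversarial_training_step(model, optimizer, x, y):
    """One AT step: fit the model on attacked inputs instead of clean ones."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Running this inner attack for every example is what makes AT expensive, which is why scaling it with large batches across machines matters.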
arXiv Detail & Related papers (2022-06-13T15:39:43Z) - Scalable computation of prediction intervals for neural networks via
matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z) - Compare learning: bi-attention network for few-shot learning [6.559037166322981]
Metric learning, one family of few-shot learning methods, addresses the challenge of learning from few examples by first learning a deep distance metric that determines whether a pair of images belongs to the same category.
In this paper, we propose a novel approach named Bi-attention network to compare the instances, which can measure the similarity between embeddings of instances precisely, globally and efficiently.
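As a generic illustration of the metric-learning idea, and not the Bi-attention network itself, the sketch below classifies each query by its most similar support embedding; the random vectors stand in for the outputs of a learned embedding network, and the cosine metric is an assumption.

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return a @ b.T

def classify_queries(query_emb, support_emb, support_labels):
    """Assign each query the label of its most similar support embedding."""
    sims = cosine_similarity(query_emb, support_emb)   # (n_query, n_support)
    return support_labels[np.argmax(sims, axis=1)]

# Embeddings would come from a learned network; random vectors used here.
support_emb = np.random.randn(5, 64)      # 5-way, 1-shot support set
support_labels = np.arange(5)
query_emb = np.random.randn(10, 64)
print(classify_queries(query_emb, support_emb, support_labels))
```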
arXiv Detail & Related papers (2022-03-25T07:39:10Z) - Meta-learning representations for clustering with infinite Gaussian
mixture models [39.56814839510978]
We propose a meta-learning method that trains neural networks to obtain representations that improve clustering performance.
The proposed method can cluster unseen unlabeled data using knowledge meta-learned with labeled data that are different from the unlabeled data.
arXiv Detail & Related papers (2021-03-01T02:05:31Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
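A toy sketch of the line case, assuming a single linear layer rather than the paper's architecture or regularizers: two endpoint weight vectors are trained jointly while sampling points on the segment between them, so any interpolation afterwards is a usable classifier.

```python
import torch

d = 20
w1 = torch.randn(d, requires_grad=True)   # one endpoint of the line of models
w2 = torch.randn(d, requires_grad=True)   # the other endpoint
opt = torch.optim.SGD([w1, w2], lr=0.1)
bce = torch.nn.BCEWithLogitsLoss()

x = torch.randn(256, d)
y = (x[:, 0] > 0).float()                 # toy labels

for step in range(200):
    alpha = torch.rand(())                # sample a point on the line
    w = alpha * w1 + (1 - alpha) * w2     # interpolated weights
    loss = bce(x @ w, y)
    opt.zero_grad()
    loss.backward()                       # gradients flow to both endpoints
    opt.step()

# After training, any alpha in [0, 1] gives a classifier:
# logits = x @ (alpha * w1 + (1 - alpha) * w2)
```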
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Local Critic Training for Model-Parallel Learning of Deep Neural
Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that trained networks by the proposed method can be used for structural optimization.
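A generic sketch of the decoupling idea, assuming a small auxiliary "critic" that predicts the task loss from the first layer group's activations; the sizes, optimizers, and critic form are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

g1 = nn.Sequential(nn.Linear(20, 64), nn.ReLU())   # first layer group
g2 = nn.Linear(64, 2)                              # second layer group
critic = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g1 = torch.optim.SGD(g1.parameters(), lr=1e-2)
opt_g2 = torch.optim.SGD(g2.parameters(), lr=1e-2)
opt_c = torch.optim.SGD(critic.parameters(), lr=1e-2)
ce = nn.CrossEntropyLoss()

x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))

h = g1(x)
# Group 1 is updated from the critic's loss estimate, without waiting for g2.
est_loss = critic(h).mean()
opt_g1.zero_grad()
est_loss.backward()
opt_g1.step()

# Group 2 is updated from the true task loss on detached activations.
logits = g2(h.detach())
true_loss = ce(logits, y)
opt_g2.zero_grad()
true_loss.backward()
opt_g2.step()

# The critic learns to track the true loss.
critic_loss = (critic(h.detach()).mean() - true_loss.detach()) ** 2
opt_c.zero_grad()
critic_loss.backward()
opt_c.step()
```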
arXiv Detail & Related papers (2021-02-03T09:30:45Z) - Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z) - Learning with Differentiable Perturbed Optimizers [54.351317101356614]
We propose a systematic method to transform optimizers into operations that are differentiable and never locally constant.
Our approach relies on stochastically perturbed optimizers, and can be used readily together with existing solvers.
We show how this framework can be connected to a family of losses developed in structured prediction, and give theoretical guarantees for their use in learning tasks.
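A minimal sketch of the perturbation idea: averaging one-hot argmax outputs over Gaussian perturbations of the input scores yields a smooth, never locally constant mapping, unlike the piecewise-constant argmax itself. The noise scale and sample count below are illustrative.

```python
import numpy as np

def perturbed_argmax(theta, sigma=1.0, n_samples=1000, rng=None):
    """Monte Carlo estimate of E[one_hot(argmax(theta + sigma * Z))], Z ~ N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    d = theta.shape[0]
    z = rng.standard_normal((n_samples, d))
    idx = np.argmax(theta[None, :] + sigma * z, axis=1)   # perturbed winners
    one_hot = np.zeros((n_samples, d))
    one_hot[np.arange(n_samples), idx] = 1.0
    return one_hot.mean(axis=0)                           # smooth in theta

theta = np.array([1.0, 1.1, -0.5])
print(perturbed_argmax(theta))   # soft distribution over argmax choices
```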
arXiv Detail & Related papers (2020-02-20T11:11:32Z)