Related papers: Aligning Multiclass Neural Network Classifier Criterion with Task Performance via $F

Aligning Multiclass Neural Network Classifier Criterion with Task Performance via $F_β$-Score

URL: http://arxiv.org/abs/2405.20954v1
Date: Fri, 31 May 2024 15:54:01 GMT
Title: Aligning Multiclass Neural Network Classifier Criterion with Task Performance via $F_β$-Score
Authors: Nathan Tsoi, Deyuan Li, Taesoo Daniel Lee, Marynel Vázquez
Abstract summary: Multiclass neural network classifiers are typically trained using cross-entropy loss. It is questionable whether the use of cross-entropy will yield a classifier that aligns with the intended application-specific performance criteria. We present a theoretical analysis that shows that our method can be used to optimize for a soft-set based approximation of Macro-$F_\beta$.
Score: 2.8583357090792703
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Multiclass neural network classifiers are typically trained using cross-entropy loss. Following training, the performance of this same neural network is evaluated using an application-specific metric based on the multiclass confusion matrix, such as the Macro $F_\beta$-Score. It is questionable whether the use of cross-entropy will yield a classifier that aligns with the intended application-specific performance criteria, particularly in scenarios where there is a need to emphasize one aspect of classifier performance. For example, if greater precision is preferred over recall, the $\beta$ value in the $F_\beta$ evaluation metric can be adjusted accordingly, but the cross-entropy objective remains unaware of this preference during training. We propose a method that addresses this training-evaluation gap for multiclass neural network classifiers such that users can train these models informed by the desired final $F_\beta$-Score. Following prior work in binary classification, we utilize the concepts of the soft-set confusion matrices and a piecewise-linear approximation of the Heaviside step function. Our method extends the $2 \times 2$ binary soft-set confusion matrix to a multiclass $d \times d$ confusion matrix and proposes dynamic adaptation of the threshold value $\tau$, which parameterizes the piecewise-linear Heaviside approximation during run-time. We present a theoretical analysis that shows that our method can be used to optimize for a soft-set based approximation of Macro-$F_\beta$ that is a consistent estimator of Macro-$F_\beta$, and our extensive experiments show the practical effectiveness of our approach.
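The forward computation described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the three-segment shape of the Heaviside approximation, the `delta` parameter, and the way false positives and false negatives are read off the $d \times d$ soft confusion matrix are all assumptions made for this example. The threshold $\tau$ is held fixed here, whereas the paper proposes adapting it at run-time, and a real training loop would need an autodiff framework rather than plain NumPy.

```python
import numpy as np


def linear_heaviside(p, tau=0.5, delta=0.1):
    """Piecewise-linear stand-in for the step function H(p - tau) on [0, 1].

    Assumed shape: three linear segments through the points
    (0, 0), (tau - tau_m/2, delta), (tau + tau_m/2, 1 - delta), (1, 1),
    so that the value at p = tau is exactly 0.5. Requires 0 < tau < 1.
    """
    p = np.asarray(p, dtype=float)
    tau_m = min(tau, 1.0 - tau)
    knots_x = [0.0, tau - tau_m / 2.0, tau + tau_m / 2.0, 1.0]
    knots_y = [0.0, delta, 1.0 - delta, 1.0]
    return np.interp(p.ravel(), knots_x, knots_y).reshape(p.shape)


def soft_macro_f_beta(probs, labels, beta=1.0, tau=0.5, delta=0.1, eps=1e-12):
    """Soft-set approximation of Macro-F_beta (illustrative sketch).

    probs:  (n, d) array of class probabilities, e.g. softmax outputs.
    labels: (n,) array of integer class indices in [0, d).
    """
    d = probs.shape[1]
    one_hot = np.eye(d)[labels]                 # (n, d) true-class indicators
    soft = linear_heaviside(probs, tau, delta)  # (n, d) soft set memberships
    conf = one_hot.T @ soft                     # (d, d): rows true, columns predicted
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp                  # predicted as class k, true class differs
    fn = conf.sum(axis=1) - tp                  # true class k, predicted as another class
    b2 = beta ** 2
    per_class = (1.0 + b2) * tp / ((1.0 + b2) * tp + b2 * fn + fp + eps)
    return per_class.mean()                     # macro average over the d classes


if __name__ == "__main__":
    probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.2, 0.2, 0.6]])
    labels = np.array([0, 1, 2])
    # beta < 1 weights precision more heavily; beta > 1 weights recall.
    print(soft_macro_f_beta(probs, labels, beta=0.5))
```

Because every step is a sum, product, or piecewise-linear map of the predicted probabilities, the same computation written in an autodiff framework would yield a loss such as `1 - soft_macro_f_beta(...)` that carries the chosen $\beta$ into training.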

Related papers

Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL). We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$. The presented value-based RL methods include, among others, Q-learning, StochDQN, and StochDDQN, all of which integrate this approach for both value-function updates and action selection.
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
Hyperspherical Classification with Dynamic Label-to-Prototype Assignment [5.978350039412277]
We present a simple yet effective method to optimize the category assigned to each prototype during training. We solve this optimization using a sequential combination of gradient descent and bipartite matching. Our method outperforms its competitors by 1.22% accuracy on CIFAR-100, and 2.15% on ImageNet-200 using a metric space dimension half of the size of its competitors.
arXiv Detail & Related papers (2024-03-25T17:01:34Z)
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC). We show they are exact for deep linear networks with weight-sharing in their respective setting. We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
arXiv Detail & Related papers (2023-11-01T16:37:00Z)
Efficient Maximal Coding Rate Reduction by Variational Forms [25.1177997721719]
We reformulate the principle of Maximal Coding Rate Reduction to a form that can scale significantly without compromising training accuracy. Experiments in image classification demonstrate that our proposed formulation results in a significant speed up over optimizing the original MCR$^2$ objective.
arXiv Detail & Related papers (2022-03-31T20:39:53Z)
Linear Speedup in Personalized Collaborative Learning [69.45124829480106]
Personalization in federated learning can improve the accuracy of a model for a user by trading off the model's bias. We formalize the personalized collaborative learning problem as optimization of a user's objective. We explore conditions under which we can optimally trade-off their bias for a reduction in variance.
arXiv Detail & Related papers (2021-11-10T22:12:52Z)
A Minimax Probability Machine for Non-Decomposable Performance Measures [15.288802707471792]
Imbalanced classification tasks are widespread in many real-world applications. The minimax probability machine is a popular method for binary classification problems. This paper develops a new minimax probability machine for the $F_\beta$ measure, called MPMF, which can be used to deal with imbalanced classification tasks.
arXiv Detail & Related papers (2021-02-28T04:58:46Z)
Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification. Our strategy enables important aspects of the base learner objective to be learned during meta-training. We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
Bridging the Gap: Unifying the Training and Evaluation of Neural Network Binary Classifiers [0.4893345190925178]
We propose a unifying approach to training neural network binary classifiers that combines a differentiable approximation of the Heaviside function with a probabilistic view of the typical confusion matrix values using soft sets. Our theoretical analysis shows the benefit of using our method to optimize for a given evaluation metric, such as $F_\beta$-Score, with soft sets.
arXiv Detail & Related papers (2020-09-02T22:13:26Z)
Structure Learning in Inverse Ising Problems Using $\ell_2$-Regularized Linear Estimator [8.89493507314525]
We show that despite the model mismatch, one can perfectly identify the network structure using naive linear regression without regularization. We propose a two-stage estimator: In the first stage, the ridge regression is used and the estimates are pruned by a relatively small threshold. This estimator with the appropriate regularization coefficient and thresholds is shown to achieve the perfect identification of the network structure even in $0<M/N<1$.
arXiv Detail & Related papers (2020-08-19T09:11:33Z)
Stochastic Flows and Geometric Optimization on the Orthogonal Group [52.50121190744979]
We present a new class of geometrically-driven optimization algorithms on the orthogonal group $O(d)$. We show that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, flows and metric learning.
arXiv Detail & Related papers (2020-03-30T15:37:50Z)
Supervised Quantile Normalization for Low-rank Matrix Approximation [50.445371939523305]
We learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself. We demonstrate the applicability of these techniques on synthetic and genomics datasets.
arXiv Detail & Related papers (2020-02-08T21:06:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.