Differentiable Top-k Classification Learning
- URL: http://arxiv.org/abs/2206.07290v1
- Date: Wed, 15 Jun 2022 04:13:59 GMT
- Title: Differentiable Top-k Classification Learning
- Authors: Felix Petersen, Hilde Kuehne, Christian Borgelt, Oliver Deussen
- Abstract summary: We optimize the model for multiple k simultaneously instead of using a single k.
We find that relaxing k not only produces better top-5 accuracies, but also leads to top-1 accuracy improvements.
- Score: 29.75063301688965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The top-k classification accuracy is one of the core metrics in machine
learning. Here, k is conventionally a positive integer, such as 1 or 5, leading
to top-1 or top-5 training objectives. In this work, we relax this assumption
and optimize the model for multiple k simultaneously instead of using a single
k. Leveraging recent advances in differentiable sorting and ranking, we propose
a differentiable top-k cross-entropy classification loss. This allows training
the network while not only considering the top-1 prediction, but also, e.g.,
the top-2 and top-5 predictions. We evaluate the proposed loss function for
fine-tuning on state-of-the-art architectures, as well as for training from
scratch. We find that relaxing k not only produces better top-5 accuracies,
but also leads to top-1 accuracy improvements. When fine-tuning publicly
available ImageNet models, we achieve a new state-of-the-art for these models.
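As an illustration of the idea, here is a minimal sketch in PyTorch of a cross-entropy-style loss relaxed over several k simultaneously. It uses a simple sigmoid-based smooth indicator for "the true class is among the top-k" in place of the differentiable sorting and ranking operators the paper builds on, so it is a stand-in under stated assumptions rather than the authors' implementation; the function name relaxed_topk_loss and the parameters ks, weights, and tau are illustrative.

```python
import torch

def relaxed_topk_loss(logits, targets, ks=(1, 2, 5), weights=(0.6, 0.2, 0.2), tau=1.0):
    """Weighted sum over several k of a smooth 'true class is in the top-k' loss.

    logits:  (batch, num_classes) raw scores
    targets: (batch,) integer class labels
    ks/weights: the k values considered simultaneously and their weights
    tau: temperature of the sigmoid relaxation (smaller = closer to a hard top-k)
    """
    true_scores = logits.gather(1, targets.unsqueeze(1))             # (batch, 1)
    # Scores of the competing classes only (true class masked out).
    others = logits.scatter(1, targets.unsqueeze(1), float("-inf"))  # (batch, C)
    loss = logits.new_zeros(())
    for k, w in zip(ks, weights):
        # The true class is "in the top-k" when it beats the k-th largest
        # competing score; replace that hard comparison with a sigmoid.
        kth_other = others.topk(k, dim=1).values[:, -1:]             # (batch, 1)
        p_in_topk = torch.sigmoid((true_scores - kth_other) / tau)
        loss = loss - w * torch.log(p_in_topk + 1e-12).mean()
    return loss

# Usage: gradients flow through all considered k values at once.
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 10, (4,))
relaxed_topk_loss(logits, targets).backward()
```

Here tau only controls how sharply the sigmoid approximates the hard in-top-k indicator; the paper instead obtains these probabilities from differentiable sorting and ranking.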
Related papers
- Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models [75.9543301303586]
Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data.
Fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks.
However, we argue that prior work has overlooked the inherent biases in foundation models.
arXiv Detail & Related papers (2023-10-12T08:01:11Z) - Improving Zero-shot Generalization and Robustness of Multi-modal Models [70.14692320804178]
Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks.
We investigate the reasons for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts.
We propose a simple and efficient way to improve accuracy on such uncertain images by making use of the WordNet hierarchy.
arXiv Detail & Related papers (2022-12-04T07:26:24Z) - Are Deep Sequence Classifiers Good at Non-Trivial Generalization? [4.941630596191806]
We study binary sequence classification problems and we look at model calibration from a different perspective.
We focus on sparse sequence classification, that is, problems in which the target class is rare, and compare three deep-learning sequence classification models.
Our results suggest that in this binary setting the deep-learning models are indeed able to learn the underlying class distribution in a non-trivial manner.
arXiv Detail & Related papers (2022-10-24T10:01:06Z) - RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses [39.67403439576671]
We propose two T5-based ranking model structures, an encoder-decoder and an encoder-only one, so that they can directly output ranking scores for each query-document pair.
Our experiments show that the proposed models with ranking losses can achieve substantial ranking performance gains on different public text ranking data sets.
arXiv Detail & Related papers (2022-10-12T20:51:49Z) - A Case Study on the Classification of Lost Circulation Events During Drilling using Machine Learning Techniques on an Imbalanced Large Dataset [0.0]
We utilize a dataset of more than 65,000 records with a class-imbalance problem from Azadegan oilfield formations in Iran.
Eleven of the dataset's seventeen parameters are chosen for classifying five lost circulation events.
To generate classification models, we use six basic machine learning algorithms and four ensemble learning methods.
arXiv Detail & Related papers (2022-09-04T12:28:40Z) - Optimizing Partial Area Under the Top-k Curve: Theory and Practice [151.5072746015253]
We develop a novel metric named partial Area Under the top-k Curve (AUTKC).
AUTKC has a better discrimination ability, and its Bayes optimal score function could give a correct top-K ranking with respect to the conditional probability.
We present an empirical surrogate risk minimization framework to optimize the proposed metric.
arXiv Detail & Related papers (2022-09-03T11:09:13Z) - KNN-BERT: Fine-Tuning Pre-Trained Models with KNN Classifier [61.063988689601416]
Pre-trained models are widely used for fine-tuning on downstream tasks with linear classifiers optimized by the cross-entropy loss.
Such training can be improved by learning representations that focus on similarities within the same class and on contrasts between classes when making predictions.
In this paper, we introduce a K-Nearest Neighbors classifier into pre-trained model fine-tuning tasks.
arXiv Detail & Related papers (2021-10-06T06:17:05Z) - Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z) - Trade-offs in Top-k Classification Accuracies on Losses for Deep Learning [0.0]
Cross entropy (CE) is not guaranteed to optimize top-k prediction without infinite training data and model complexity.
Our novel loss is essentially CE modified by grouping the current top-k classes into a single class.
Our loss has been found to provide better top-k accuracies than CE at k larger than 10 (a minimal top-k accuracy helper is sketched after this list).
arXiv Detail & Related papers (2020-07-30T10:18:57Z) - Don't Wait, Just Weight: Improving Unsupervised Representations by Learning Goal-Driven Instance Weights [92.16372657233394]
Self-supervised learning techniques can boost performance by learning useful representations from unlabelled data.
We show that by learning Bayesian instance weights for the unlabelled data, we can improve the downstream classification accuracy.
Our method, BetaDataWeighter, is evaluated using the popular self-supervised rotation prediction task on STL-10 and Visual Decathlon.
arXiv Detail & Related papers (2020-06-22T15:59:32Z)
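As referenced above, here is a small self-contained helper for the top-k accuracy metric that runs through this page (top-1/top-5 in the main paper, AUTKC, and the trade-offs paper); the function name topk_accuracy and the example k values are illustrative, not taken from any of the listed papers.

```python
import torch

def topk_accuracy(logits, targets, ks=(1, 5)):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    max_k = max(ks)
    top_pred = logits.topk(max_k, dim=1).indices   # (batch, max_k) predicted class indices
    correct = top_pred.eq(targets.unsqueeze(1))    # (batch, max_k) boolean hits
    return {k: correct[:, :k].any(dim=1).float().mean().item() for k in ks}

# Example with random scores for 8 samples over 10 classes.
logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
print(topk_accuracy(logits, targets))  # e.g. {1: 0.125, 5: 0.5}
```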