Differentiable Top-k Classification Learning
- URL: http://arxiv.org/abs/2206.07290v1
- Date: Wed, 15 Jun 2022 04:13:59 GMT
- Title: Differentiable Top-k Classification Learning
- Authors: Felix Petersen, Hilde Kuehne, Christian Borgelt, Oliver Deussen
- Abstract summary: We optimize the model for multiple k simultaneously instead of using a single k.
We find that relaxing k not only produces better top-5 accuracies, but also leads to top-1 accuracy improvements.
- Score: 29.75063301688965
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The top-k classification accuracy is one of the core metrics in machine
learning. Here, k is conventionally a positive integer, such as 1 or 5, leading
to top-1 or top-5 training objectives. In this work, we relax this assumption
and optimize the model for multiple k simultaneously instead of using a single
k. Leveraging recent advances in differentiable sorting and ranking, we propose
a differentiable top-k cross-entropy classification loss. This allows training
the network while not only considering the top-1 prediction, but also, e.g.,
the top-2 and top-5 predictions. We evaluate the proposed loss function for
fine-tuning on state-of-the-art architectures, as well as for training from
scratch. We find that relaxing k not only produces better top-5 accuracies,
but also leads to top-1 accuracy improvements. When fine-tuning publicly
available ImageNet models, we achieve a new state-of-the-art for these models.
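As an illustration of the idea, here is a minimal sketch in PyTorch of a cross-entropy-style loss relaxed over several k simultaneously. It uses a simple sigmoid-based smooth indicator for "the true class is among the top-k" in place of the differentiable sorting and ranking operators the paper builds on, so it is a stand-in under stated assumptions rather than the authors' implementation; the function name relaxed_topk_loss and the parameters ks, weights, and tau are illustrative.

```python
import torch

def relaxed_topk_loss(logits, targets, ks=(1, 2, 5), weights=(0.6, 0.2, 0.2), tau=1.0):
    """Weighted sum over several k of a smooth 'true class is in the top-k' loss.

    logits:  (batch, num_classes) raw scores
    targets: (batch,) integer class labels
    ks/weights: the k values considered simultaneously and their weights
    tau: temperature of the sigmoid relaxation (smaller = closer to a hard top-k)
    """
    true_scores = logits.gather(1, targets.unsqueeze(1))             # (batch, 1)
    # Scores of the competing classes only (true class masked out).
    others = logits.scatter(1, targets.unsqueeze(1), float("-inf"))  # (batch, C)
    loss = logits.new_zeros(())
    for k, w in zip(ks, weights):
        # The true class is "in the top-k" when it beats the k-th largest
        # competing score; replace that hard comparison with a sigmoid.
        kth_other = others.topk(k, dim=1).values[:, -1:]             # (batch, 1)
        p_in_topk = torch.sigmoid((true_scores - kth_other) / tau)
        loss = loss - w * torch.log(p_in_topk + 1e-12).mean()
    return loss

# Usage: gradients flow through all considered k values at once.
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 10, (4,))
relaxed_topk_loss(logits, targets).backward()
```

Here tau only controls how sharply the sigmoid approximates the hard in-top-k indicator; the paper instead obtains these probabilities from differentiable sorting and ranking.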
Related papers
- Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models [75.9543301303586]
Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data.
Fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks.
However, we argue that prior work has overlooked the inherent biases in foundation models.
arXiv Detail & Related papers (2023-10-12T08:01:11Z) - Improving Zero-shot Generalization and Robustness of Multi-modal Models [70.14692320804178]
Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks.
We investigate the reasons for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts.
We propose a simple and efficient way to improve accuracy on such uncertain images by making use of the WordNet hierarchy.
arXiv Detail & Related papers (2022-12-04T07:26:24Z) - Are Deep Sequence Classifiers Good at Non-Trivial Generalization? [4.941630596191806]
We study binary sequence classification problems and we look at model calibration from a different perspective.
We focus on sparse sequence classification, that is, problems in which the target class is rare, and compare three deep-learning sequence classification models.
Our results suggest that in this binary setting the deep-learning models are indeed able to learn the underlying class distribution in a non-trivial manner.
arXiv Detail & Related papers (2022-10-24T10:01:06Z) - RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses [39.67403439576671]
We propose two T5-based ranking model structures, an encoder-decoder and an encoder-only one, so that they can directly output ranking scores for each query-document pair.
Our experiments show that the proposed models with ranking losses can achieve substantial ranking performance gains on different public text ranking data sets.
arXiv Detail & Related papers (2022-10-12T20:51:49Z) - A Case Study on the Classification of Lost Circulation Events During Drilling using Machine Learning Techniques on an Imbalanced Large Dataset [0.0]
We utilize a dataset of more than 65,000 records with a class-imbalance problem from Azadegan oilfield formations in Iran.
Eleven of the dataset's seventeen parameters are chosen for classifying five lost circulation events.
To generate classification models, we use six basic machine learning algorithms and four ensemble learning methods.
arXiv Detail & Related papers (2022-09-04T12:28:40Z) - Optimizing Partial Area Under the Top-k Curve: Theory and Practice [151.5072746015253]
We develop a novel metric named partial Area Under the top-k Curve (AUTKC).
AUTKC has a better discrimination ability, and its Bayes optimal score function could give a correct top-K ranking with respect to the conditional probability.
We present an empirical surrogate risk minimization framework to optimize the proposed metric.
arXiv Detail & Related papers (2022-09-03T11:09:13Z) - KNN-BERT: Fine-Tuning Pre-Trained Models with KNN Classifier [61.063988689601416]
Pre-trained models are widely used for fine-tuning on downstream tasks with linear classifiers optimized by the cross-entropy loss.
Such training can be improved by learning representations that focus on similarities within the same class and on contrasts between classes when making predictions.
In this paper, we introduce a K-Nearest Neighbors classifier into pre-trained model fine-tuning tasks.
arXiv Detail & Related papers (2021-10-06T06:17:05Z) - Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z) - Trade-offs in Top-k Classification Accuracies on Losses for Deep Learning [0.0]
Cross entropy (CE) is not guaranteed to optimize top-k prediction without infinite training data and model complexity.
Our novel loss is essentially CE modified by grouping the current top-k classes into a single class.
Our loss has been found to provide better top-k accuracies than CE at k larger than 10 (a minimal top-k accuracy helper is sketched after this list).
arXiv Detail & Related papers (2020-07-30T10:18:57Z) - Don't Wait, Just Weight: Improving Unsupervised Representations by Learning Goal-Driven Instance Weights [92.16372657233394]
Self-supervised learning techniques can boost performance by learning useful representations from unlabelled data.
We show that by learning Bayesian instance weights for the unlabelled data, we can improve the downstream classification accuracy.
Our method, BetaDataWeighter, is evaluated using the popular self-supervised rotation prediction task on STL-10 and Visual Decathlon.
arXiv Detail & Related papers (2020-06-22T15:59:32Z)
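As referenced above, here is a small self-contained helper for the top-k accuracy metric that runs through this page (top-1/top-5 in the main paper, AUTKC, and the trade-offs paper); the function name topk_accuracy and the example k values are illustrative, not taken from any of the listed papers.

```python
import torch

def topk_accuracy(logits, targets, ks=(1, 5)):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    max_k = max(ks)
    top_pred = logits.topk(max_k, dim=1).indices   # (batch, max_k) predicted class indices
    correct = top_pred.eq(targets.unsqueeze(1))    # (batch, max_k) boolean hits
    return {k: correct[:, :k].any(dim=1).float().mean().item() for k in ks}

# Example with random scores for 8 samples over 10 classes.
logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
print(topk_accuracy(logits, targets))  # e.g. {1: 0.125, 5: 0.5}
```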