Taming the Sigmoid Bottleneck: Provably Argmaxable Sparse Multi-Label Classification
- URL: http://arxiv.org/abs/2310.10443v2
- Date: Mon, 29 Jan 2024 17:14:01 GMT
- Title: Taming the Sigmoid Bottleneck: Provably Argmaxable Sparse Multi-Label Classification
- Authors: Andreas Grivas and Antonio Vergari and Adam Lopez
- Abstract summary: Sigmoid output layers are widely used in multi-label classification (MLC) tasks.
In many practical MLC tasks, the number of possible labels is in the thousands, exceeding the number of input features.
We show that such a low-rank output layer is a bottleneck that can result in unargmaxable classes.
- Score: 13.845115961850434
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sigmoid output layers are widely used in multi-label classification (MLC)
tasks, in which multiple labels can be assigned to any input. In many practical
MLC tasks, the number of possible labels is in the thousands, often exceeding
the number of input features and resulting in a low-rank output layer. In
multi-class classification, it is known that such a low-rank output layer is a
bottleneck that can result in unargmaxable classes: classes which cannot be
predicted for any input. In this paper, we show that for MLC tasks, the
analogous sigmoid bottleneck results in exponentially many unargmaxable label
combinations. We explain how to detect these unargmaxable outputs and
demonstrate their presence in three widely used MLC datasets. We then show that
they can be prevented in practice by introducing a Discrete Fourier Transform
(DFT) output layer, which guarantees that all sparse label combinations with up
to $k$ active labels are argmaxable. Our DFT layer trains faster and is more
parameter efficient, matching the F1@k score of a sigmoid layer while using up
to 50% fewer trainable parameters. Our code is publicly available at
https://github.com/andreasgrv/sigmoid-bottleneck.
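To make the two core claims above concrete, here is a minimal sketch, not the authors' implementation (their code is at the repository linked above): the LP feasibility test illustrates how a label combination can be checked for argmaxability, and fourier_output_matrix is an illustrative stand-in for a fixed DFT-style output matrix; the function names, the margin parameter, and the exact Fourier construction are assumptions of this sketch.

```python
# Minimal sketch, not the authors' implementation (see
# https://github.com/andreasgrv/sigmoid-bottleneck for the real code).
import numpy as np
from scipy.optimize import linprog

def is_argmaxable(W, b, y, margin=1e-6):
    """A sign pattern y in {-1,+1}^n is argmaxable for output layer (W, b)
    iff some feature vector x satisfies y_i * (W[i] @ x + b[i]) > 0 for
    every label i -- a linear-programming feasibility problem."""
    # y_i (W_i x + b_i) >= margin  <=>  -(y_i W_i) x <= y_i b_i - margin
    A_ub = -(y[:, None] * W)
    b_ub = y * b - margin
    res = linprog(c=np.zeros(W.shape[1]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * W.shape[1], method="highs")
    return res.status == 0  # feasible => argmaxable (up to the margin)

def fourier_output_matrix(n_labels, k):
    """Fixed output matrix whose rows are degree-k real Fourier features of
    n equally spaced points on the circle, so the feature dim is 2k + 1.
    An illustrative stand-in for the paper's DFT layer."""
    t = np.arange(n_labels) / n_labels
    cols = [np.ones(n_labels)]
    for f in range(1, k + 1):
        cols += [np.cos(2 * np.pi * f * t), np.sin(2 * np.pi * f * t)]
    return np.stack(cols, axis=1)

# Toy check: 10 labels, k = 2, feature dimension 2k + 1 = 5. Any pattern
# with at most k active labels needs at most 2k sign changes around the
# circle, which a degree-k trigonometric polynomial can realise.
W = fourier_output_matrix(10, k=2)
b = np.zeros(10)
y = -np.ones(10)
y[[3, 7]] = 1.0  # two active labels
print(is_argmaxable(W, b, y))  # expected: True
```

The LP view also explains the exponential count of unargmaxable combinations: n hyperplanes in a d-dimensional feature space carve out only O(n^d) sign regions, so when d is much smaller than n, only polynomially many of the 2^n label combinations are realisable and exponentially many are unargmaxable.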
Related papers
- Multi-Head Encoding for Extreme Label Classification [15.815842882043734]
eXtreme Label Classification (XLC) has been established to distinguish among massive numbers of labels.
As the number of categories increases, the number of parameters and nonlinear operations in the classifier also rises.
This results in a Classifier Computational Overload Problem (CCOP).
arXiv Detail & Related papers (2024-12-13T14:53:47Z)
- UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification [42.36546066941635]
Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space.
This work proposes UniDEC, a novel end-to-end trainable framework which trains the dual encoder and classifier together.
arXiv Detail & Related papers (2024-05-04T17:27:51Z)
- Adopting the Multi-answer Questioning Task with an Auxiliary Metric for Extreme Multi-label Text Classification Utilizing the Label Hierarchy [10.87653109398961]
This paper adopts the multi-answer questioning task for extreme multi-label classification.
The study applies the proposed method and evaluation metric to the legal domain.
arXiv Detail & Related papers (2023-03-02T08:40:31Z)
- Complementary to Multiple Labels: A Correlation-Aware Correction Approach [65.59584909436259]
We show theoretically how the estimated transition matrix in multi-class complementary-label learning (CLL) could be distorted in multi-labeled cases.
We propose a two-step method to estimate the transition matrix from candidate labels.
arXiv Detail & Related papers (2023-02-25T04:48:48Z)
- Multi-Instance Partial-Label Learning: Towards Exploiting Dual Inexact Supervision [53.530957567507365]
In some real-world tasks, each training sample is associated with a candidate label set that contains one ground-truth label and some false positive labels.
In this paper, we formalize such problems as multi-instance partial-label learning (MIPL).
Existing multi-instance learning algorithms and partial-label learning algorithms are suboptimal for solving MIPL problems.
arXiv Detail & Related papers (2022-12-18T03:28:51Z)
- MultiGuard: Provably Robust Multi-label Classification against Adversarial Examples [67.0982378001551]
MultiGuard is the first provably robust defense against adversarial examples for multi-label classification.
Our major theoretical contribution is showing that a certain number of an input's ground-truth labels are provably contained in the set of labels predicted by MultiGuard.
arXiv Detail & Related papers (2022-10-03T17:50:57Z)
- Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification [0.0]
We revisit two popular approaches to multilabel classification: transformer-based heads and graph-based branches that process label-relation information.
Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with the proper training strategy, graph-based methods show only a small drop in accuracy.
arXiv Detail & Related papers (2022-09-14T12:06:47Z)
- Large Loss Matters in Weakly Supervised Multi-Label Classification [50.262533546999045]
We first regard unobserved labels as negative labels, casting the weakly supervised multi-label (WSML) task as noisy multi-label classification.
We propose novel methods for WSML that reject or correct large-loss samples to prevent the model from memorizing noisy labels (a minimal sketch follows this entry).
Our methodology works well, validating that handling large losses properly matters in weakly supervised multi-label classification.
arXiv Detail & Related papers (2022-06-08T08:30:24Z)
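As referenced in the summary above, here is a minimal sketch of the large-loss-rejection idea, an illustration under my own assumptions: the wsml_loss name, the fixed reject_rate, and the masking details are not from the paper.

```python
# A minimal sketch of large-loss rejection for weakly supervised multi-label
# learning; function name, fixed reject_rate, and masking details are my
# assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def wsml_loss(logits, observed_pos, reject_rate=0.05):
    """Treat unobserved labels as negatives, then drop the largest-loss
    negatives, which are the most likely to be missing positives."""
    targets = observed_pos.float()
    per_label = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    neg_mask = targets == 0
    k = int(reject_rate * neg_mask.sum().item())
    keep = torch.ones_like(per_label, dtype=torch.bool)
    if k > 0:
        # Rank only the assumed-negative entries by their loss.
        neg_losses = per_label.masked_fill(~neg_mask, float("-inf")).flatten()
        drop = torch.topk(neg_losses, k).indices
        keep.view(-1)[drop] = False
    return per_label[keep].mean()

# Usage: a batch of 4 inputs over 6 labels with two observed positives.
logits = torch.randn(4, 6)
observed = torch.zeros(4, 6)
observed[0, 2] = observed[3, 5] = 1
print(wsml_loss(logits, observed, reject_rate=0.1))
```

The paper's methods presumably schedule how aggressively samples are rejected or corrected over training; a fixed rate keeps this sketch short.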
- Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive amounts of accurately annotated examples.
A typical alternative is learning from multiple noisy annotators.
This paper proposes a data-efficient approach, called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
- Label Mask for Multi-Label Text Classification [6.742627397194543]
We propose a Label Mask multi-label text classification model (LM-MTC), inspired by the cloze questions used in masked language modeling (a generic sketch follows this entry).
On this basis, we assign a different token to each potential label and randomly mask these tokens with a certain probability to build a label-based Masked Language Model (MLM).
arXiv Detail & Related papers (2021-06-18T11:54:33Z)
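As referenced in the LM-MTC summary above, a generic sketch of label-token masking follows; the label set and the special tokens ([LABEL_i], [ABSENT], [MASK]) are hypothetical, not LM-MTC's actual vocabulary.

```python
# A generic sketch of label-token masking; the label set and the special
# tokens ([LABEL_i], [ABSENT], [MASK]) are hypothetical, not LM-MTC's
# actual vocabulary.
import random

LABELS = ["sports", "politics", "tech"]  # hypothetical label set
LABEL_TOKENS = {l: f"[LABEL_{i}]" for i, l in enumerate(LABELS)}

def build_mlm_example(text_tokens, gold_labels, mask_prob=0.3):
    """Append one dedicated token per potential label; each is replaced by
    [MASK] with probability mask_prob, and the cloze target records whether
    the label is present."""
    tokens = list(text_tokens)
    targets = []
    for label in LABELS:
        gold = LABEL_TOKENS[label] if label in gold_labels else "[ABSENT]"
        if random.random() < mask_prob:
            tokens.append("[MASK]")
            targets.append(gold)   # the MLM must recover this token
        else:
            tokens.append(gold)
            targets.append(None)   # visible, so nothing to predict
    return tokens, targets

print(build_mlm_example(["budget", "vote", "passed"], {"politics"}))
```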
- An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels [49.036212158261215]
Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications.
Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs).
We show that hierarchical methods based on Probabilistic Label Trees (PLTs) outperform LWANs.
We propose a new state-of-the-art method which combines BERT with LWANs.
arXiv Detail & Related papers (2020-10-04T18:55:47Z)
- Identifying noisy labels with a transductive semi-supervised leave-one-out filter [2.4366811507669124]
We introduce the LGC_LVOF, a leave-one-out filtering approach based on the Local and Global Consistency (LGC) algorithm (a rough sketch follows this entry).
Our approach is best suited to datasets with a large amount of unlabeled data but not many labels.
arXiv Detail & Related papers (2020-09-24T16:50:06Z)
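As referenced in the LGC_LVOF summary above, here is a rough sketch of leave-one-out filtering built on the standard closed-form LGC propagation; lgc_propagate follows Zhou et al.'s formula, while loo_filter and its interface are my reconstruction, not the paper's code.

```python
# A rough sketch of leave-one-out noisy-label filtering on top of the
# standard closed-form LGC propagation (Zhou et al.); loo_filter and its
# interface are my reconstruction, not the LGC_LVOF code.
import numpy as np

def lgc_propagate(S, Y, alpha=0.9):
    """Local and Global Consistency: F = (1 - alpha)(I - alpha S)^-1 Y,
    where S is the symmetrically normalised affinity matrix."""
    n = S.shape[0]
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)

def loo_filter(S, labels, n_classes, alpha=0.9):
    """Flag labels[i] as noisy when propagation from all *other* labeled
    points disagrees with it. labels[i] == -1 marks unlabeled points."""
    labeled = np.where(labels >= 0)[0]
    noisy = []
    for i in labeled:
        Y = np.zeros((len(labels), n_classes))
        for j in labeled:
            if j != i:  # leave point i's own label out
                Y[j, labels[j]] = 1.0
        F = lgc_propagate(S, Y, alpha)
        if F[i].argmax() != labels[i]:
            noisy.append(i)
    return noisy
```

With plenty of unlabeled data the propagation has many paths to reach each held-out point, which matches the summary's note that the approach suits datasets with much unlabeled data but few labels.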
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.