Related papers: Multi-Head Encoding for Extreme Label Classification

URL: http://arxiv.org/abs/2412.10182v1
Date: Fri, 13 Dec 2024 14:53:47 GMT
Title: Multi-Head Encoding for Extreme Label Classification
Authors: Daojun Liang, Haixia Zhang, Dongfeng Yuan, Minggao Zhang
Abstract summary: eXtreme Label Classification (XLC) has been established to distinguish massive labels. As the number of categories increases, the number of parameters and nonlinear operations in the classifier also rises. This results in a Classifier Computational Overload Problem (CCOP).
Score: 15.815842882043734
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The number of categories of instances in the real world is normally huge, and each instance may contain multiple labels. To distinguish these massive labels utilizing machine learning, eXtreme Label Classification (XLC) has been established. However, as the number of categories increases, the number of parameters and nonlinear operations in the classifier also rises. This results in a Classifier Computational Overload Problem (CCOP). To address this, we propose a Multi-Head Encoding (MHE) mechanism, which replaces the vanilla classifier with a multi-head classifier. During the training process, MHE decomposes extreme labels into the product of multiple short local labels, with each head trained on these local labels. During testing, the predicted labels can be directly calculated from the local predictions of each head. This reduces the computational load geometrically. Then, according to the characteristics of different XLC tasks, e.g., single-label, multi-label, and model pretraining tasks, three MHE-based implementations, i.e., Multi-Head Product, Multi-Head Cascade, and Multi-Head Sampling, are proposed to more effectively cope with CCOP. Moreover, we theoretically demonstrate that MHE can achieve performance approximately equivalent to that of the vanilla classifier by generalizing the low-rank approximation problem from Frobenius-norm to Cross-Entropy. Experimental results show that the proposed methods achieve state-of-the-art performance while significantly streamlining the training and inference processes of XLC tasks. The source code has been made public at https://github.com/Anoise/MHE.
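The label decomposition described in the abstract can be illustrated with a minimal sketch. It assumes that a global label index is factorized as a mixed-radix number, with one digit per head; this is one plausible reading of "the product of multiple short local labels", not code taken from the authors' repository. The names `encode_label`, `decode_label`, `predict`, and `head_sizes` are hypothetical.

```python
from math import prod


def encode_label(label, head_sizes):
    """Split a global label index into one local label per head.

    Assumption: mixed-radix factorization, most significant head first.
    """
    if not 0 <= label < prod(head_sizes):
        raise ValueError("label out of range for these head sizes")
    local = []
    for size in reversed(head_sizes):
        label, digit = divmod(label, size)
        local.append(digit)
    return local[::-1]


def decode_label(local, head_sizes):
    """Recombine per-head local labels into the global label index."""
    label = 0
    for digit, size in zip(local, head_sizes):
        label = label * size + digit
    return label


def predict(head_logits, head_sizes):
    """Take the argmax of each head, then decode the global label."""
    local = [max(range(len(logits)), key=logits.__getitem__)
             for logits in head_logits]
    return decode_label(local, head_sizes)


# Three heads of 100 outputs each address 100 * 100 * 100 = 1,000,000 labels
# while the classifier needs only 100 + 100 + 100 = 300 output units.
head_sizes = [100, 100, 100]
local_labels = encode_label(123456, head_sizes)
print(local_labels)                            # [12, 34, 56]
print(decode_label(local_labels, head_sizes))  # 123456
```

Under this assumption each head is trained as an ordinary small classifier on its own local label, and the number of output units grows with the sum of the head sizes rather than their product.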

Related papers

Label Cluster Chains for Multi-Label Classification [2.072831155509228]
Multi-label classification is a type of supervised machine learning that can simultaneously assign multiple labels to an instance. We propose a method to chain disjoint correlated label clusters obtained by applying a partition method in the label space. Our proposal shows that learning and chaining disjoint correlated label clusters can better explore and learn label correlations.
arXiv Detail & Related papers (2024-11-01T11:16:37Z)
LC-Protonets: Multi-Label Few-Shot Learning for World Music Audio Tagging [65.72891334156706]
We introduce Label-Combination Prototypical Networks (LC-Protonets) to address the problem of multi-label few-shot classification. LC-Protonets generate one prototype per label combination, derived from the power set of labels present in the limited training items. Our method is applied to automatic audio tagging across diverse music datasets, covering various cultures and including both modern and traditional music.
arXiv Detail & Related papers (2024-09-17T15:13:07Z)
UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification [42.36546066941635]
Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space. This work proposes UniDEC, a novel end-to-end trainable framework which trains the dual encoder and classifier together.
arXiv Detail & Related papers (2024-05-04T17:27:51Z)
Taming the Sigmoid Bottleneck: Provably Argmaxable Sparse Multi-Label Classification [13.845115961850434]
Sigmoid output layers are widely used in multi-label classification (MLC) tasks. In many practical MLC tasks, the number of possible labels is in the thousands, exceeding the number of input features. We show that such a low-rank output layer is a bottleneck that can result in unargmaxable classes.
arXiv Detail & Related papers (2023-10-16T14:25:50Z)
Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data. This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
Complementary to Multiple Labels: A Correlation-Aware Correction Approach [65.59584909436259]
We show theoretically how the estimated transition matrix in multi-class CLL could be distorted in multi-labeled cases. We propose a two-step method to estimate the transition matrix from candidate labels.
arXiv Detail & Related papers (2023-02-25T04:48:48Z)
Combining Metric Learning and Attention Heads For Accurate and Efficient Multilabel Image Classification [0.0]
We revisit two popular approaches to multilabel classification: transformer-based heads and labels relations information graph processing branches. Although transformer-based heads are considered to achieve better results than graph-based branches, we argue that with the proper training strategy graph-based methods can demonstrate just a small accuracy drop.
arXiv Detail & Related papers (2022-09-14T12:06:47Z)
Trustable Co-label Learning from Multiple Noisy Annotators [68.59187658490804]
Supervised deep learning depends on massive accurately annotated examples. A typical alternative is learning from multiple noisy annotators. This paper proposes a data-efficient approach, called Trustable Co-label Learning (TCL).
arXiv Detail & Related papers (2022-03-08T16:57:00Z)
Gated recurrent units and temporal convolutional network for multilabel classification [122.84638446560663]
This work proposes a new ensemble method for managing multilabel classification. The core of the proposed approach combines a set of gated recurrent units and temporal convolutional neural networks trained with variants of the Adam gradients optimization approach.
arXiv Detail & Related papers (2021-10-09T00:00:16Z)
Label Disentanglement in Partition-based Extreme Multilabel Classification [111.25321342479491]
We show that the label assignment problem in partition-based XMC can be formulated as an optimization problem. We show that our method can successfully disentangle multi-modal labels, leading to state-of-the-art (SOTA) results on four XMC benchmarks.
arXiv Detail & Related papers (2021-06-24T03:24:18Z)
Probabilistic Label Trees for Extreme Multi-label Classification [8.347190888362194]
Problems of extreme multi-label classification (XMLC) are efficiently handled by organizing labels as a tree. PLTs can be treated as a generalization of hierarchical softmax for multi-label problems. We introduce the model and discuss training and inference procedures and their computational costs. We prove a specific equivalence between the fully online algorithm and an algorithm with a tree structure given in advance.
arXiv Detail & Related papers (2020-09-23T15:30:00Z)
GPU-based Self-Organizing Maps for Post-Labeled Few-Shot Unsupervised Learning [2.922007656878633]
Few-shot classification is a challenge in machine learning where the goal is to train a classifier using a very limited number of labeled examples. We consider the problem of post-labeled few-shot unsupervised learning, a classification task where representations are learned in an unsupervised fashion, to be later labeled using very few annotated examples.
arXiv Detail & Related papers (2020-09-04T13:22:28Z)
Unsupervised Person Re-identification via Multi-label Classification [55.65870468861157]
This paper formulates unsupervised person ReID as a multi-label classification task to progressively seek true labels. Our method starts by assigning each person image a single-class label, then evolves to multi-label classification by leveraging the updated ReID model for label prediction. To boost the ReID model training efficiency in multi-label classification, we propose the memory-based multi-label classification loss (MMCL).
arXiv Detail & Related papers (2020-04-20T12:13:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.