Extreme Zero-Shot Learning for Extreme Text Classification
- URL: http://arxiv.org/abs/2112.08652v1
- Date: Thu, 16 Dec 2021 06:06:42 GMT
- Title: Extreme Zero-Shot Learning for Extreme Text Classification
- Authors: Yuanhao Xiong, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Inderjit
Dhillon
- Abstract summary: Extreme Zero-Shot XMC (EZ-XMC) and Few-Shot XMC (FS-XMC) are investigated.
We propose to pre-train Transformer-based encoders with self-supervised contrastive losses.
We develop a pre-training method MACLR, which thoroughly leverages the raw text with techniques including Multi-scale Adaptive Clustering, Label Regularization, and self-training with pseudo positive pairs.
- Score: 80.95271050744624
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The eXtreme Multi-label text Classification (XMC) problem concerns finding
the most relevant labels for an input text instance from a large label set.
However, the XMC setup faces two challenges: (1) it is not generalizable to
predict unseen labels in dynamic environments, and (2) it requires a large
amount of supervised (instance, label) pairs, which can be difficult to obtain
for emerging domains. Recently, the generalized zero-shot XMC (GZ-XMC) setup
has been studied, and ZestXML was proposed to handle unseen labels; however,
it still requires a large number of annotated (instance, label) pairs. In this
paper, we consider a more practical scenario called Extreme Zero-Shot XMC
(EZ-XMC), in which no supervision is needed and only the raw text of instances
and labels is accessible. Few-Shot XMC (FS-XMC), an extension of
EZ-XMC with limited supervision, is also investigated. To learn the semantic
embeddings of instances and labels with raw text, we propose to pre-train
Transformer-based encoders with self-supervised contrastive losses.
Specifically, we develop a pre-training method MACLR, which thoroughly
leverages the raw text with techniques including Multi-scale Adaptive
Clustering, Label Regularization, and self-training with pseudo positive pairs.
Experimental results on four public EZ-XMC datasets demonstrate that MACLR
outperforms all leading baseline methods, with approximately 5-10% average
improvement in precision and recall. Moreover, we show that our pre-trained
encoder can be further
improved on FS-XMC when there are a limited number of ground-truth positive
pairs in training. By fine-tuning the encoder on such a few-shot subset, MACLR
still outperforms other extreme classifiers significantly.
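The abstract does not spell out the contrastive objective, so the following is a minimal, hedged sketch of the general idea: pre-training a bi-encoder with a self-supervised InfoNCE-style loss over pseudo (instance, label) pairs using in-batch negatives. The function name, the temperature value, and the use of in-batch negatives are illustrative assumptions, not MACLR's actual implementation.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(inst_emb, label_emb, temperature=0.05):
    # Row i of each tensor is a pseudo positive (instance, label) pair;
    # the other rows in the batch act as negatives.
    inst = F.normalize(inst_emb, dim=-1)
    lbl = F.normalize(label_emb, dim=-1)
    logits = inst @ lbl.t() / temperature          # (B, B) cosine similarities
    targets = torch.arange(inst.size(0), device=inst.device)
    # Symmetric loss: instances-vs-labels and labels-vs-instances.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Toy usage: random tensors stand in for Transformer encoder outputs.
loss = info_nce_loss(torch.randn(8, 128), torch.randn(8, 128))
```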
Related papers
- Prototypical Extreme Multi-label Classification with a Dynamic Margin Loss [6.244642999033755]
Extreme Multi-label Classification (XMC) methods predict relevant labels for a given query in an extremely large label space.
Recent works in XMC address this problem using deep encoders that project text descriptions to an embedding space suitable for recovering the closest labels.
We propose PRIME, an XMC method that employs a novel prototypical contrastive learning technique to reconcile efficiency and performance, surpassing brute-force approaches.
arXiv Detail & Related papers (2024-10-27T10:24:23Z)
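PRIME's dynamic margin loss is not detailed in the summary above. Purely as a hedged illustration, a generic margin loss over label prototypes might look like the sketch below, with a fixed `margin` standing in for the paper's dynamic margin.

```python
import torch
import torch.nn.functional as F

def prototype_margin_loss(query, pos_proto, neg_proto, margin=0.2):
    # Push the query's cosine similarity to its positive label prototype
    # above its similarity to a negative prototype by at least `margin`.
    q = F.normalize(query, dim=-1)
    sim_pos = (q * F.normalize(pos_proto, dim=-1)).sum(-1)
    sim_neg = (q * F.normalize(neg_proto, dim=-1)).sum(-1)
    return F.relu(margin + sim_neg - sim_pos).mean()
```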
- Zero-Shot Learning Over Large Output Spaces: Utilizing Indirect Knowledge Extraction from Large Language Models [3.908992369351976]
Extreme Zero-shot XMC (EZ-XMC) is a special setting of XMC wherein no supervision is provided.
Traditional state-of-the-art methods extract pseudo labels from the document title or segments.
We propose a framework to train a small bi-encoder model via feedback from a large language model (LLM).
arXiv Detail & Related papers (2024-06-13T16:26:37Z)
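The summary above gives no implementation details; below is a hedged sketch of the general pattern of mining pseudo positives from LLM feedback to train a small bi-encoder. `candidate_labels` and `llm_relevance_score` are hypothetical placeholders, not APIs from the paper.

```python
def build_pseudo_pairs(docs, candidate_labels, llm_relevance_score, threshold=0.8):
    # Keep (document, label) candidates the LLM judges relevant; these pseudo
    # positives can then train a small bi-encoder with a contrastive loss.
    pairs = []
    for doc in docs:
        for lbl in candidate_labels(doc):        # e.g. nearest labels by BM25
            if llm_relevance_score(doc, lbl) >= threshold:
                pairs.append((doc, lbl))
    return pairs
```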
- Learning label-label correlations in Extreme Multi-label Classification via Label Features [44.00852282861121]
Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign an input a subset of the most relevant labels from millions of label choices.
Short-text XMC with label features has found numerous applications in areas such as query-to-ad-phrase matching in search ads, title-based product recommendation, and prediction of related searches.
We propose Gandalf, a novel approach which makes use of a label co-occurrence graph to leverage label features as additional data points to supplement the training distribution.
arXiv Detail & Related papers (2024-05-03T21:18:43Z)
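As a hedged sketch of the co-occurrence-graph idea in the Gandalf entry above: from a binary instance-label matrix one can build a label-label graph whose normalized rows could serve as soft targets when label features are added as extra training points. The construction below is an illustration, not the paper's exact procedure.

```python
import numpy as np
import scipy.sparse as sp

# Y: binary instance-label matrix (toy random example here).
Y = sp.random(1000, 500, density=0.01, format="csr", dtype=np.float32)
Y.data[:] = 1.0

# Label-label co-occurrence counts: G[i, j] = #instances tagged with both.
G = (Y.T @ Y).tocsr()
G.setdiag(0)
G.eliminate_zeros()

# Row-normalize so each label's co-occurring neighbors form a soft target
# distribution when that label's own features are used as a data point.
row_sums = np.maximum(G.sum(axis=1).A.ravel(), 1.0)
soft_targets = sp.diags(1.0 / row_sums) @ G
```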
- PINA: Leveraging Side Information in eXtreme Multi-label Classification via Predicted Instance Neighborhood Aggregation [105.52660004082766]
The eXtreme Multi-label Classification (XMC) problem seeks to find relevant labels from an exceptionally large label space.
We propose Predicted Instance Neighborhood Aggregation (PINA), a data enhancement method for the general XMC problem.
Unlike most existing XMC frameworks that treat labels and input instances as featureless indicators and independent entries, PINA extracts information from the label metadata and the correlations among training instances.
arXiv Detail & Related papers (2023-05-21T05:00:40Z)
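A hedged sketch of the neighborhood-aggregation idea named in the PINA entry above: enrich an instance's representation with an aggregate of its predicted neighbors' embeddings. The mean aggregation and the mixing weight `alpha` are assumptions for illustration, not the paper's method.

```python
import numpy as np

def pina_style_augment(inst_emb, neighbor_embs, alpha=0.5):
    # Mix the instance embedding with the mean embedding of neighbors
    # retrieved by a first-stage model (e.g. nearest training instances
    # or label metadata); `alpha` is an assumed mixing weight.
    if len(neighbor_embs) == 0:
        return inst_emb
    return (1 - alpha) * inst_emb + alpha * np.mean(neighbor_embs, axis=0)
```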
- Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification [54.26205045417422]
Extreme multi-label text classification (XMC) seeks to find relevant labels for a given text input from an extremely large label collection.
Transformer-based XMC methods, such as X-Transformer and LightXML, have shown significant improvement over other XMC methods.
arXiv Detail & Related papers (2021-10-01T23:43:29Z)
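The entry above mentions multi-resolution fine-tuning without detail. One common realization, sketched here purely as an assumption, is to cluster labels at coarse-to-fine resolutions and fine-tune the Transformer level by level.

```python
import numpy as np
from sklearn.cluster import KMeans

def label_resolutions(label_embs, sizes=(64, 512, 4096)):
    # Cluster label embeddings at increasingly fine granularities; a model
    # can be fine-tuned to predict coarse clusters first, then refined at
    # each finer level, warm-starting from the previous one.
    out = []
    for k in sizes:
        k = min(k, len(label_embs))
        out.append(KMeans(n_clusters=k, n_init=4).fit_predict(label_embs))
    return out
```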
- Label Disentanglement in Partition-based Extreme Multilabel Classification [111.25321342479491]
We show that the label assignment problem in partition-based XMC can be formulated as an optimization problem.
We show that our method can successfully disentangle multi-modal labels, leading to state-of-the-art (SOTA) results on four XMC benchmarks.
arXiv Detail & Related papers (2021-06-24T03:24:18Z)
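The disentanglement summary above is abstract; as a hedged illustration only, one way to split a multi-modal label is to cluster the embeddings of its positive instances and treat each cluster as a separate pseudo-label that can live in its own partition. The paper's optimization formulation is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_multimodal_label(pos_instance_embs, max_modes=2):
    # Each cluster of a label's positive instances becomes one "mode"
    # (pseudo-label) that a partition-based XMC model can place separately.
    k = min(max_modes, len(pos_instance_embs))
    return KMeans(n_clusters=k, n_init=4).fit_predict(pos_instance_embs)
```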
- An Empirical Study on Large-Scale Multi-Label Text Classification Including Few and Zero-Shot Labels [49.036212158261215]
Large-scale Multi-label Text Classification (LMTC) has a wide range of Natural Language Processing (NLP) applications.
Current state-of-the-art LMTC models employ Label-Wise Attention Networks (LWANs).
We show that hierarchical methods based on Probabilistic Label Trees (PLTs) outperform LWANs.
We propose a new state-of-the-art method which combines BERT with LWANs.
arXiv Detail & Related papers (2020-10-04T18:55:47Z)
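For readers unfamiliar with the Probabilistic Label Trees mentioned above, a minimal beam-search inference sketch follows; the node interface (`.children`, `.label`, `.prob(x)`) is assumed for illustration and is not from the paper.

```python
def plt_topk(root, x, beam=10):
    # Traverse a probabilistic label tree, keeping the `beam` most probable
    # nodes per level; a leaf's score is the product of the conditional
    # probabilities along its root-to-leaf path.
    frontier, leaves = [(root, 1.0)], []
    while frontier:
        frontier = sorted(
            ((c, p * c.prob(x)) for n, p in frontier for c in n.children),
            key=lambda t: -t[1])[:beam]
        leaves += [(n.label, p) for n, p in frontier if not n.children]
        frontier = [(n, p) for n, p in frontier if n.children]
    return sorted(leaves, key=lambda t: -t[1])
```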
- Extreme Multi-label Classification from Aggregated Labels [27.330826185375415]
Extreme multi-label classification (XMC) is the problem of finding the relevant labels for an input from a very large universe of possible labels.
We develop a new and scalable algorithm to impute individual-sample labels from the group labels.
This can be paired with any existing XMC method to solve the aggregated label problem.
arXiv Detail & Related papers (2020-04-01T02:13:09Z)
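A hedged sketch of the aggregated-label setting described in the last entry: given labels observed only for a group of instances, one naive imputation assigns each group label to the instance whose embedding matches it best. The paper's scalable algorithm is more involved; this only illustrates the problem shape.

```python
import numpy as np

def impute_group_labels(instance_embs, group_labels, label_embs):
    # instance_embs: (n, d) embeddings of the instances in one group;
    # group_labels: label ids observed for the group as a whole;
    # label_embs: (L, d) label embeddings.
    imputed = {i: set() for i in range(len(instance_embs))}
    for lbl in group_labels:
        sims = instance_embs @ label_embs[lbl]   # similarity to each instance
        imputed[int(np.argmax(sims))].add(lbl)
    return imputed
```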