Label-Wise Document Pre-Training for Multi-Label Text Classification
- URL: http://arxiv.org/abs/2008.06695v1
- Date: Sat, 15 Aug 2020 10:34:27 GMT
- Title: Label-Wise Document Pre-Training for Multi-Label Text Classification
- Authors: Han Liu, Caixia Yuan, and Xiaojie Wang
- Abstract summary: This paper develops a Label-Wise Pre-Training (LW-PT) method to obtain a document representation with label-aware information.
The basic idea is that a multi-label document can be represented as a combination of multiple label-wise representations, and that correlated labels always co-occur in the same or similar documents.
- Score: 14.439051753832032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major challenge of multi-label text classification (MLTC) is to
simultaneously exploit possible label differences and label correlations. In
this paper, we tackle this challenge by developing a Label-Wise Pre-Training
(LW-PT) method to obtain a document representation with label-aware information.
The basic idea is that a multi-label document can be represented as a
combination of multiple label-wise representations, and that correlated labels
always co-occur in the same or similar documents. LW-PT implements this idea by
constructing label-wise document classification tasks and training label-wise
document encoders. Finally, the pre-trained label-wise encoders are fine-tuned
on the downstream MLTC task. Extensive experimental results validate that the
proposed method has significant advantages over previous state-of-the-art
models and is able to discover reasonable label relationships. The code is
released to facilitate other researchers.
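The core representational idea in the abstract can be illustrated with a minimal sketch: encode one document vector through one projection per label, then combine the label-wise representations by concatenation. Everything below (the `LabelWiseEncoder` class, the random per-label projections, the toy dimensions) is a hypothetical stand-in, not the paper's actual architecture, which pre-trains label-wise document encoders on label-wise classification tasks.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM, NUM_LABELS = 100, 16, 4  # toy sizes, chosen arbitrarily

class LabelWiseEncoder:
    """One projection per label: each maps a document vector to a
    label-specific representation (a stand-in for LW-PT's pre-trained
    label-wise document encoders)."""
    def __init__(self):
        self.proj = [rng.normal(size=(VOCAB, DIM)) for _ in range(NUM_LABELS)]

    def encode(self, doc_vec):
        # A multi-label document is represented as the combination
        # (here: concatenation) of its label-wise representations.
        return np.concatenate([doc_vec @ W for W in self.proj])

encoder = LabelWiseEncoder()
doc = rng.random(VOCAB)        # toy bag-of-words document vector
rep = encoder.encode(doc)
print(rep.shape)               # (NUM_LABELS * DIM,)
```

In the paper's setting, a classifier head over such a representation would then be fine-tuned on the downstream MLTC task.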
Related papers
- Weakly-Supervised Scientific Document Classification via Retrieval-Augmented Multi-Stage Training [24.2734548438594]
We propose a weakly-supervised approach for scientific document classification using label names only.
In scientific domains, label names often include domain-specific concepts that may not appear in the document corpus.
We show that WANDER outperforms the best baseline by 11.9% on average.
arXiv Detail & Related papers (2023-06-12T15:50:13Z)
- Imprecise Label Learning: A Unified Framework for Learning with Various Imprecise Label Configurations [91.67511167969934]
Imprecise label learning (ILL) is a framework for unifying learning with various imprecise label configurations.
We demonstrate that ILL can seamlessly adapt to partial label learning, semi-supervised learning, noisy label learning, and, more importantly, a mixture of these settings.
arXiv Detail & Related papers (2023-05-22T04:50:28Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in vision-language models, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- Adopting the Multi-answer Questioning Task with an Auxiliary Metric for Extreme Multi-label Text Classification Utilizing the Label Hierarchy [10.87653109398961]
This paper adopts the multi-answer questioning task for extreme multi-label classification.
This study applies the proposed method and the evaluation metric to the legal domain.
arXiv Detail & Related papers (2023-03-02T08:40:31Z)
- Large Loss Matters in Weakly Supervised Multi-Label Classification [50.262533546999045]
We first regard unobserved labels as negative labels, casting the weakly supervised multi-label classification task into noisy multi-label classification.
We propose novel methods that reject or correct the large-loss samples to prevent the model from memorizing the noisy labels.
Our methodology works well in practice, validating that handling large losses properly matters in weakly supervised multi-label classification.
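The large-loss rejection idea summarized above can be sketched in a few lines: compute per-label losses against the assume-negative targets, then mask out the largest losses before they reach the gradient. The BCE formulation, the fixed `drop_rate`, and the toy numbers below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def bce_loss(prob, target):
    # Elementwise binary cross-entropy, clipped for numerical safety.
    eps = 1e-7
    p = np.clip(prob, eps, 1 - eps)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p))

def rejection_mask(losses, drop_rate):
    """Keep the (1 - drop_rate) fraction of label-wise losses with the
    smallest values; large losses are assumed to come from false
    negatives and are rejected from training."""
    k = int(np.ceil(losses.size * (1 - drop_rate)))
    threshold = np.sort(losses.ravel())[k - 1]
    return losses <= threshold

probs = np.array([0.9, 0.8, 0.1, 0.05])  # model predictions
targets = np.zeros(4)                    # unobserved labels treated as negative
losses = bce_loss(probs, targets)
mask = rejection_mask(losses, drop_rate=0.5)
print(mask)  # confident positives (large loss vs. the negative target) are rejected
```

A training loop would multiply the per-label losses by this mask (or, in the correction variant, flip the rejected targets to positive) before back-propagating.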
arXiv Detail & Related papers (2022-06-08T08:30:24Z)
- Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [86.17081952197788]
We propose to blend category-specific representation across different images to transfer information of known labels to complement unknown labels.
Experiments on the MS-COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors.
arXiv Detail & Related papers (2022-03-04T07:56:16Z)
- Structured Semantic Transfer for Multi-Label Recognition with Partial Labels [85.6967666661044]
We propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels.
The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations.
Experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms.
arXiv Detail & Related papers (2021-12-21T02:15:01Z)
- Enhancing Label Correlation Feedback in Multi-Label Text Classification via Multi-Task Learning [6.1538971100140145]
We introduce a novel approach with multi-task learning to enhance label correlation feedback.
We propose two auxiliary label co-occurrence prediction tasks to enhance label correlation learning.
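Auxiliary label co-occurrence prediction tasks of this kind typically supervise the model with pairwise co-occurrence statistics computed from the training labels. A minimal sketch of building such a target follows; the function name and the toy label sets are assumptions for illustration, not that paper's code.

```python
import numpy as np

def cooccurrence_matrix(label_sets, num_labels):
    """Count how often each ordered pair of distinct labels appears
    together in a document's label set; a typical supervision target
    for label co-occurrence prediction auxiliary tasks."""
    C = np.zeros((num_labels, num_labels), dtype=int)
    for labels in label_sets:
        for i in labels:
            for j in labels:
                if i != j:
                    C[i, j] += 1
    return C

docs = [{0, 1}, {0, 1, 2}, {2}]   # toy per-document label sets
C = cooccurrence_matrix(docs, 3)
print(C[0, 1], C[1, 2])  # labels 0 and 1 co-occur twice; 1 and 2 once
```

The matrix is symmetric with a zero diagonal; a multi-task setup would add a head that predicts these counts (or their normalized probabilities) alongside the main classification loss.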
arXiv Detail & Related papers (2021-06-06T12:26:14Z)
- A Study on the Autoregressive and non-Autoregressive Multi-label Learning [77.11075863067131]
We propose a self-attention-based variational encoder model to jointly extract label-label and label-feature dependencies.
Our model can therefore be used to predict all labels in parallel while still including both label-label and label-feature dependencies.
arXiv Detail & Related papers (2020-12-03T05:41:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.