Weakly-supervised Text Classification Based on Keyword Graph
- URL: http://arxiv.org/abs/2110.02591v1
- Date: Wed, 6 Oct 2021 08:58:02 GMT
- Title: Weakly-supervised Text Classification Based on Keyword Graph
- Authors: Lu Zhang, Jiandong Ding, Yi Xu, Yingyao Liu and Shuigeng Zhou
- Abstract summary: We propose a novel framework called ClassKG to explore keyword-keyword correlations on a keyword graph via a GNN.
Our framework is an iterative process. In each iteration, we first construct a keyword graph, so that the task of assigning pseudo-labels is transformed into annotating keyword subgraphs.
With the pseudo-labels generated by the subgraph annotator, we then train a text classifier to classify the unlabeled texts.
- Score: 30.57722085686241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly-supervised text classification has received much attention in recent
years because it can alleviate the heavy burden of annotating massive data. Among
these approaches, keyword-driven methods are the mainstream, where user-provided keywords
are exploited to generate pseudo-labels for unlabeled texts. However, existing
methods treat keywords independently and thus ignore the correlations among them,
which could be useful if properly exploited. In this paper, we propose a novel
framework called ClassKG to explore keyword-keyword correlations on a keyword
graph via a GNN. Our framework is an iterative process. In each iteration, we
first construct a keyword graph, so that the task of assigning pseudo-labels is
transformed into annotating keyword subgraphs. To improve the annotation quality,
we introduce a self-supervised task to pretrain a subgraph annotator, and then
finetune it. With the pseudo-labels generated by the subgraph annotator, we
then train a text classifier to classify the unlabeled texts. Finally, we
re-extract keywords from the classified texts. Extensive experiments on both
long-text and short-text datasets show that our method substantially
outperforms the existing ones.
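The iterative loop the abstract describes (build a keyword graph, assign pseudo-labels, train a classifier, re-extract keywords) can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the GNN subgraph annotator is replaced by a simple keyword-vote stand-in, and all function names and data are invented for the example.

```python
# Toy sketch of a ClassKG-style iteration. The paper's GNN subgraph annotator
# is replaced by a keyword-vote stand-in; all names here are illustrative.
from collections import Counter, defaultdict

def build_keyword_graph(texts, keywords):
    """Connect keywords that co-occur in the same text (edge weight = count)."""
    graph = defaultdict(Counter)
    for text in texts:
        present = [k for k in keywords if k in text.split()]
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                graph[a][b] += 1
                graph[b][a] += 1
    return graph

def pseudo_label(text, class_keywords):
    """Vote stand-in for the subgraph annotator: pick the class whose
    keywords appear most often in the text (None if nothing matches)."""
    tokens = text.split()
    scores = {c: sum(tokens.count(k) for k in kws)
              for c, kws in class_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def re_extract_keywords(texts, labels, top_n=2):
    """Re-extract the most frequent words per predicted class."""
    per_class = defaultdict(Counter)
    for text, label in zip(texts, labels):
        per_class[label].update(text.split())
    return {c: [w for w, _ in cnt.most_common(top_n)]
            for c, cnt in per_class.items()}

texts = ["goal match striker goal", "election vote senate", "match referee vote"]
class_keywords = {"sports": ["goal", "striker"], "politics": ["election", "senate"]}
labels = [pseudo_label(t, class_keywords) for t in texts]
```

In the real framework, the classifier trained on these pseudo-labels would then produce the texts from which keywords are re-extracted for the next iteration.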
Related papers
- Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets [51.74296438621836]
We introduce Scribbles for All, a label and training data generation algorithm for semantic segmentation trained on scribble labels.
The main limitation of scribbles as source for weak supervision is the lack of challenging datasets for scribble segmentation.
Scribbles for All provides scribble labels for several popular segmentation datasets and provides an algorithm to automatically generate scribble labels for any dataset with dense annotations.
arXiv Detail & Related papers (2024-08-22T15:29:08Z) - Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
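The core idea, generating text by copying segments from an existing collection rather than predicting tokens, can be illustrated with a toy greedy matcher. This is only a sketch of the concept under simplifying assumptions (exact token match, fixed segment length), not the paper's retrieval mechanism.

```python
# Toy sketch of copy-based generation: greedily extend the prefix by copying a
# short segment from the collection that follows a matching token. Illustrative
# only; the actual paper uses learned phrase retrieval, not exact matching.
def copy_generate(prompt, collection, seg_len=2, max_words=8):
    out = prompt.split()
    while len(out) < max_words:
        last = out[-1]
        segment = None
        for doc in collection:
            toks = doc.split()
            for i, t in enumerate(toks):
                if t == last and i + 1 < len(toks):
                    # copy the short segment that follows the matched token
                    segment = toks[i + 1:i + 1 + seg_len]
                    break
            if segment:
                break
        if not segment:
            break
        out.extend(segment)
    return " ".join(out)

gen = copy_generate("the", ["the cat sat on the mat"], seg_len=2, max_words=5)
```

Scaling up the text collection directly enlarges the space of copyable segments, which is why larger collections can improve generation quality without retraining.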
arXiv Detail & Related papers (2023-07-13T05:03:26Z) - Exploring Structured Semantic Prior for Multi-Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in vision-language models, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z) - FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification [14.918600168973564]
This paper proposes FastClass, an efficient weakly-supervised classification approach.
It uses dense text representation to retrieve class-relevant documents from external unlabeled corpus.
Experiments show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
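The retrieval step the summary describes, using dense text representations to pull class-relevant documents from an unlabeled corpus, can be sketched as follows. Toy normalized bag-of-words vectors stand in for real sentence embeddings, and all names are illustrative rather than FastClass's actual components.

```python
# Toy sketch of FastClass-style retrieval: embed a class description and the
# unlabeled documents densely, then take the nearest documents as
# class-relevant pseudo-training data. Bag-of-words vectors stand in for
# real dense embeddings; names and data are illustrative.
import math
from collections import Counter

def embed(text, vocab):
    """Toy dense embedding: L2-normalized term-count vector over a fixed vocab."""
    counts = Counter(text.split())
    vec = [counts[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(class_desc, corpus, vocab, k=2):
    """Return the k corpus documents closest (cosine) to the class description."""
    q = embed(class_desc, vocab)
    scored = [(sum(a * b for a, b in zip(q, embed(doc, vocab))), doc)
              for doc in corpus]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]

vocab = ["goal", "match", "vote", "senate", "bill"]
corpus = ["goal in the match", "senate passed the bill", "vote on the bill"]
top = retrieve("match goal", corpus, vocab, k=1)
```

Because retrieval needs only one embedding pass over the corpus, this design avoids the per-keyword iteration of keyword-driven methods, which is the source of the claimed training-speed advantage.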
arXiv Detail & Related papers (2022-12-11T13:43:22Z) - LIME: Weakly-Supervised Text Classification Without Seeds [1.2691047660244335]
In weakly-supervised text classification, only label names act as sources of supervision.
We present LIME, a framework for weakly-supervised text classification.
We find that combining weakly-supervised classification and textual entailment mitigates shortcomings of both.
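Entailment-based classification with only label names as supervision can be sketched as follows: each label name becomes a hypothesis, and the label whose hypothesis is most entailed by the text wins. A token-overlap toy replaces the real NLI model, and the hypothesis template and names are assumptions for illustration, not LIME's actual design.

```python
# Toy sketch of entailment-driven classification from label names alone.
# A token-overlap score stands in for a real NLI model; the template and
# names are illustrative.
def entailment_score(premise, hypothesis):
    """Toy NLI stand-in: fraction of hypothesis content words found in premise."""
    p = set(premise.lower().split())
    content = [w for w in hypothesis.lower().split()
               if w not in {"this", "text", "is", "about"}]
    return sum(w in p for w in content) / (len(content) or 1)

def classify(text, label_names):
    """Label whose hypothesis 'This text is about <label>' scores highest."""
    hyps = {lbl: f"This text is about {lbl}" for lbl in label_names}
    return max(hyps, key=lambda lbl: entailment_score(text, hyps[lbl]))

pred = classify("the striker scored in football", ["football", "politics"])
```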
arXiv Detail & Related papers (2022-10-13T04:28:28Z) - GUDN A novel guide network for extreme multi-label text classification [12.975260278131078]
This paper constructs a novel guide network (GUDN) to help fine-tune the pre-trained model and guide the subsequent classification.
We also use the raw label semantics to effectively explore the latent space between texts and labels, which can further improve prediction accuracy.
arXiv Detail & Related papers (2022-01-10T07:33:36Z) - Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [60.233529926965836]
We propose a new method called SHINE, which is based on a graph neural network (GNN), for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z) - R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z) - TF-CR: Weighting Embeddings for Text Classification [6.531659195805749]
We introduce a novel weighting scheme, Term Frequency-Category Ratio (TF-CR), which assigns higher weights to high-frequency, category-exclusive words when computing word embeddings.
Experiments on 16 classification datasets show the effectiveness of TF-CR, leading to improved performance scores over existing weighting schemes.
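A plausible reading of such a weight, term frequency within a category multiplied by the fraction of the word's occurrences that fall in that category, can be sketched as below. The exact normalization in the paper may differ; this is an illustration of the idea, not the paper's definition.

```python
# Toy sketch of a TF-CR-style weight: for word w and category c, multiply w's
# term frequency within c by the ratio of w's occurrences that fall in c.
# High-frequency, category-exclusive words get the largest weights. The exact
# normalization in the paper may differ; this is illustrative.
from collections import Counter

def tf_cr(docs_by_category):
    """docs_by_category: {category: [doc, ...]} -> {category: {word: weight}}."""
    counts = {c: Counter(w for d in docs for w in d.split())
              for c, docs in docs_by_category.items()}
    total = Counter()
    for cnt in counts.values():
        total.update(cnt)
    weights = {}
    for c, cnt in counts.items():
        n_words = sum(cnt.values()) or 1
        # (term frequency in c) * (share of w's occurrences that are in c)
        weights[c] = {w: (cnt[w] / n_words) * (cnt[w] / total[w]) for w in cnt}
    return weights

docs = {"sports": ["goal goal match"], "politics": ["vote match"]}
w = tf_cr(docs)
```

In this toy data, "goal" (frequent and exclusive to sports) outweighs "match" (shared across categories), which is exactly the behavior the scheme is designed to produce.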
arXiv Detail & Related papers (2020-12-11T19:23:28Z) - Unsupervised Label Refinement Improves Dataless Text Classification [48.031421660674745]
Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description.
While promising, it crucially relies on accurate descriptions of the label set for each downstream task.
This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice.
arXiv Detail & Related papers (2020-12-08T03:37:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.