Weakly-supervised Text Classification Based on Keyword Graph
- URL: http://arxiv.org/abs/2110.02591v1
- Date: Wed, 6 Oct 2021 08:58:02 GMT
- Title: Weakly-supervised Text Classification Based on Keyword Graph
- Authors: Lu Zhang, Jiandong Ding, Yi Xu, Yingyao Liu and Shuigeng Zhou
- Abstract summary: We propose a novel framework called ClassKG to explore keyword-keyword correlations on a keyword graph via a GNN.
Our framework is an iterative process. In each iteration, we first construct a keyword graph, so that the task of assigning pseudo-labels is transformed into annotating keyword subgraphs.
With the pseudo-labels generated by the subgraph annotator, we then train a text classifier to classify the unlabeled texts.
- Score: 30.57722085686241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly-supervised text classification has received much attention in recent
years because it can alleviate the heavy burden of annotating massive data. Among
these approaches, keyword-driven methods are the mainstream, where user-provided keywords
are exploited to generate pseudo-labels for unlabeled texts. However, existing
methods treat keywords independently and thus ignore the correlations among them,
which could be useful if properly exploited. In this paper, we propose a novel
framework called ClassKG to explore keyword-keyword correlations on a keyword
graph via a GNN. Our framework is an iterative process. In each iteration, we
first construct a keyword graph, so that the task of assigning pseudo-labels is
transformed into annotating keyword subgraphs. To improve the annotation quality,
we introduce a self-supervised task to pretrain a subgraph annotator, and then
finetune it. With the pseudo-labels generated by the subgraph annotator, we
then train a text classifier to classify the unlabeled texts. Finally, we
re-extract keywords from the classified texts. Extensive experiments on both
long-text and short-text datasets show that our method substantially
outperforms the existing ones.
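The iterative loop the abstract describes (build a keyword graph, assign pseudo-labels, train a classifier, re-extract keywords) can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the GNN subgraph annotator is replaced by a simple keyword-vote stand-in, and all function names and data are invented for the example.

```python
# Toy sketch of a ClassKG-style iteration. The paper's GNN subgraph annotator
# is replaced by a keyword-vote stand-in; all names here are illustrative.
from collections import Counter, defaultdict

def build_keyword_graph(texts, keywords):
    """Connect keywords that co-occur in the same text (edge weight = count)."""
    graph = defaultdict(Counter)
    for text in texts:
        present = [k for k in keywords if k in text.split()]
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                graph[a][b] += 1
                graph[b][a] += 1
    return graph

def pseudo_label(text, class_keywords):
    """Vote stand-in for the subgraph annotator: pick the class whose
    keywords appear most often in the text (None if nothing matches)."""
    tokens = text.split()
    scores = {c: sum(tokens.count(k) for k in kws)
              for c, kws in class_keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def re_extract_keywords(texts, labels, top_n=2):
    """Re-extract the most frequent words per predicted class."""
    per_class = defaultdict(Counter)
    for text, label in zip(texts, labels):
        per_class[label].update(text.split())
    return {c: [w for w, _ in cnt.most_common(top_n)]
            for c, cnt in per_class.items()}

texts = ["goal match striker goal", "election vote senate", "match referee vote"]
class_keywords = {"sports": ["goal", "striker"], "politics": ["election", "senate"]}
labels = [pseudo_label(t, class_keywords) for t in texts]
```

In the real framework, the classifier trained on these pseudo-labels would then produce the texts from which keywords are re-extracted for the next iteration.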
Related papers
- Scribbles for All: Benchmarking Scribble Supervised Segmentation Across Datasets [51.74296438621836]
We introduce Scribbles for All, a label and training data generation algorithm for semantic segmentation trained on scribble labels.
The main limitation of scribbles as source for weak supervision is the lack of challenging datasets for scribble segmentation.
Scribbles for All provides scribble labels for several popular segmentation datasets and provides an algorithm to automatically generate scribble labels for any dataset with dense annotations.
arXiv Detail & Related papers (2024-08-22T15:29:08Z) - Copy Is All You Need [66.00852205068327]
We formulate text generation as progressively copying text segments from an existing text collection.
Our approach achieves better generation quality according to both automatic and human evaluations.
Our approach attains additional performance gains by simply scaling up to larger text collections.
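The core idea, generating text by copying segments from an existing collection rather than predicting tokens, can be illustrated with a toy greedy matcher. This is only a sketch of the concept under simplifying assumptions (exact token match, fixed segment length), not the paper's retrieval mechanism.

```python
# Toy sketch of copy-based generation: greedily extend the prefix by copying a
# short segment from the collection that follows a matching token. Illustrative
# only; the actual paper uses learned phrase retrieval, not exact matching.
def copy_generate(prompt, collection, seg_len=2, max_words=8):
    out = prompt.split()
    while len(out) < max_words:
        last = out[-1]
        segment = None
        for doc in collection:
            toks = doc.split()
            for i, t in enumerate(toks):
                if t == last and i + 1 < len(toks):
                    # copy the short segment that follows the matched token
                    segment = toks[i + 1:i + 1 + seg_len]
                    break
            if segment:
                break
        if not segment:
            break
        out.extend(segment)
    return " ".join(out)

gen = copy_generate("the", ["the cat sat on the mat"], seg_len=2, max_words=5)
```

Scaling up the text collection directly enlarges the space of copyable segments, which is why larger collections can improve generation quality without retraining.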
arXiv Detail & Related papers (2023-07-13T05:03:26Z) - Exploring Structured Semantic Prior for Multi-Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in vision-language models, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z) - FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification [14.918600168973564]
This paper proposes FastClass, an efficient weakly-supervised classification approach.
It uses dense text representation to retrieve class-relevant documents from external unlabeled corpus.
Experiments show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
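The retrieval step the summary describes, using dense text representations to pull class-relevant documents from an unlabeled corpus, can be sketched as follows. Toy normalized bag-of-words vectors stand in for real sentence embeddings, and all names are illustrative rather than FastClass's actual components.

```python
# Toy sketch of FastClass-style retrieval: embed a class description and the
# unlabeled documents densely, then take the nearest documents as
# class-relevant pseudo-training data. Bag-of-words vectors stand in for
# real dense embeddings; names and data are illustrative.
import math
from collections import Counter

def embed(text, vocab):
    """Toy dense embedding: L2-normalized term-count vector over a fixed vocab."""
    counts = Counter(text.split())
    vec = [counts[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def retrieve(class_desc, corpus, vocab, k=2):
    """Return the k corpus documents closest (cosine) to the class description."""
    q = embed(class_desc, vocab)
    scored = [(sum(a * b for a, b in zip(q, embed(doc, vocab))), doc)
              for doc in corpus]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:k]]

vocab = ["goal", "match", "vote", "senate", "bill"]
corpus = ["goal in the match", "senate passed the bill", "vote on the bill"]
top = retrieve("match goal", corpus, vocab, k=1)
```

Because retrieval needs only one embedding pass over the corpus, this design avoids the per-keyword iteration of keyword-driven methods, which is the source of the claimed training-speed advantage.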
arXiv Detail & Related papers (2022-12-11T13:43:22Z) - LIME: Weakly-Supervised Text Classification Without Seeds [1.2691047660244335]
In weakly-supervised text classification, only label names act as sources of supervision.
We present LIME, a framework for weakly-supervised text classification.
We find that combining weakly-supervised classification and textual entailment mitigates shortcomings of both.
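Entailment-based classification with only label names as supervision can be sketched as follows: each label name becomes a hypothesis, and the label whose hypothesis is most entailed by the text wins. A token-overlap toy replaces the real NLI model, and the hypothesis template and names are assumptions for illustration, not LIME's actual design.

```python
# Toy sketch of entailment-driven classification from label names alone.
# A token-overlap score stands in for a real NLI model; the template and
# names are illustrative.
def entailment_score(premise, hypothesis):
    """Toy NLI stand-in: fraction of hypothesis content words found in premise."""
    p = set(premise.lower().split())
    content = [w for w in hypothesis.lower().split()
               if w not in {"this", "text", "is", "about"}]
    return sum(w in p for w in content) / (len(content) or 1)

def classify(text, label_names):
    """Label whose hypothesis 'This text is about <label>' scores highest."""
    hyps = {lbl: f"This text is about {lbl}" for lbl in label_names}
    return max(hyps, key=lambda lbl: entailment_score(text, hyps[lbl]))

pred = classify("the striker scored in football", ["football", "politics"])
```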
arXiv Detail & Related papers (2022-10-13T04:28:28Z) - GUDN A novel guide network for extreme multi-label text classification [12.975260278131078]
This paper constructs a novel guide network (GUDN) to help fine-tune the pre-trained model and guide the subsequent classification.
We also use the raw label semantics to effectively explore the latent space between texts and labels, which can further improve prediction accuracy.
arXiv Detail & Related papers (2022-01-10T07:33:36Z) - Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [60.233529926965836]
We propose a new method called SHINE, which is based on a graph neural network (GNN), for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z) - R$^2$-Net: Relation of Relation Learning Network for Sentence Semantic Matching [58.72111690643359]
We propose a Relation of Relation Learning Network (R2-Net) for sentence semantic matching.
We first employ BERT to encode the input sentences from a global perspective.
Then a CNN-based encoder is designed to capture keywords and phrase information from a local perspective.
To fully leverage labels for better relation information extraction, we introduce a self-supervised relation of relation classification task.
arXiv Detail & Related papers (2020-12-16T13:11:30Z) - TF-CR: Weighting Embeddings for Text Classification [6.531659195805749]
We introduce a novel weighting scheme, Term Frequency-Category Ratio (TF-CR), which assigns higher weights to high-frequency, category-exclusive words when computing word embeddings.
Experiments on 16 classification datasets show the effectiveness of TF-CR, leading to improved performance scores over existing weighting schemes.
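A plausible reading of such a weight, term frequency within a category multiplied by the fraction of the word's occurrences that fall in that category, can be sketched as below. The exact normalization in the paper may differ; this is an illustration of the idea, not the paper's definition.

```python
# Toy sketch of a TF-CR-style weight: for word w and category c, multiply w's
# term frequency within c by the ratio of w's occurrences that fall in c.
# High-frequency, category-exclusive words get the largest weights. The exact
# normalization in the paper may differ; this is illustrative.
from collections import Counter

def tf_cr(docs_by_category):
    """docs_by_category: {category: [doc, ...]} -> {category: {word: weight}}."""
    counts = {c: Counter(w for d in docs for w in d.split())
              for c, docs in docs_by_category.items()}
    total = Counter()
    for cnt in counts.values():
        total.update(cnt)
    weights = {}
    for c, cnt in counts.items():
        n_words = sum(cnt.values()) or 1
        # (term frequency in c) * (share of w's occurrences that are in c)
        weights[c] = {w: (cnt[w] / n_words) * (cnt[w] / total[w]) for w in cnt}
    return weights

docs = {"sports": ["goal goal match"], "politics": ["vote match"]}
w = tf_cr(docs)
```

In this toy data, "goal" (frequent and exclusive to sports) outweighs "match" (shared across categories), which is exactly the behavior the scheme is designed to produce.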
arXiv Detail & Related papers (2020-12-11T19:23:28Z) - Unsupervised Label Refinement Improves Dataless Text Classification [48.031421660674745]
Dataless text classification is capable of classifying documents into previously unseen labels by assigning a score to any document paired with a label description.
While promising, it crucially relies on accurate descriptions of the label set for each downstream task.
This reliance causes dataless classifiers to be highly sensitive to the choice of label descriptions and hinders the broader application of dataless classification in practice.
arXiv Detail & Related papers (2020-12-08T03:37:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.