Related papers: TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision

URL: http://arxiv.org/abs/2403.00165v2
Date: Sun, 16 Jun 2024 19:10:39 GMT
Title: TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision
Authors: Yunyi Zhang, Ruozhen Yang, Xueqiang Xu, Rui Li, Jinfeng Xiao, Jiaming Shen, Jiawei Han
Abstract summary: Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy. Most earlier works focus on fully or semi-supervised methods that require a large amount of human-annotated data. We work on hierarchical text classification with the minimal amount of supervision: using the sole class name of each node as the only supervision.
Score: 41.05874642535256
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hierarchical text classification aims to categorize each document into a set of classes in a label taxonomy. Most earlier works focus on fully or semi-supervised methods that require a large amount of human-annotated data, which is costly and time-consuming to acquire. To reduce human effort, in this paper, we work on hierarchical text classification with the minimal amount of supervision: using the sole class name of each node as the only supervision. Recently, large language models (LLMs) have shown competitive performance on various tasks through zero-shot prompting, but this method performs poorly in the hierarchical setting, because it is ineffective to include the large and structured label space in a prompt. On the other hand, previous weakly-supervised hierarchical text classification methods only utilize the raw taxonomy skeleton and ignore the rich information hidden in the text corpus that can serve as additional class-indicative features. To tackle the above challenges, we propose TELEClass, Taxonomy Enrichment and LLM-Enhanced weakly-supervised hierarchical text Classification, which (1) automatically enriches the label taxonomy with class-indicative terms to facilitate classifier training and (2) utilizes LLMs for both data annotation and creation tailored for the hierarchical label space. Experiments show that TELEClass can outperform previous weakly-supervised methods and LLM-based zero-shot prompting methods on two public datasets.
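
Illustrative sketch: the abstract describes enriching a label taxonomy with class-indicative terms and classifying documents within the hierarchy. The toy Python example below is not the authors' implementation; it only illustrates the general idea by walking a small taxonomy top-down and picking, at each level, the child class whose indicative terms overlap most with the document. The taxonomy, the term sets, and the function name `classify_top_down` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy: each parent maps to its child classes.
TAXONOMY: Dict[str, List[str]] = {
    "root": ["sports", "science"],
    "sports": ["soccer", "tennis"],
    "science": ["physics", "biology"],
}

# Hypothetical class-indicative terms. In a real system these would be
# mined from a corpus; here they are written by hand for illustration.
ENRICHED_TERMS: Dict[str, Set[str]] = {
    "sports": {"sports", "match", "player", "team"},
    "science": {"science", "research", "experiment"},
    "soccer": {"soccer", "goal", "striker"},
    "tennis": {"tennis", "racket", "serve"},
    "physics": {"physics", "quantum", "particle"},
    "biology": {"biology", "cell", "gene"},
}


def classify_top_down(text: str) -> List[str]:
    """Return the path of classes chosen from the root downwards."""
    tokens = set(text.lower().split())
    path: List[str] = []
    node = "root"
    # Descend while the current node still has children.
    while node in TAXONOMY:
        # Score each child by the number of indicative terms in the text.
        scores = {c: len(tokens & ENRICHED_TERMS[c]) for c in TAXONOMY[node]}
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            break  # no evidence for any child: stop at the current level
        path.append(best)
        node = best
    return path


if __name__ == "__main__":
    print(classify_top_down("the striker scored a goal and the team won the match"))
```

Simple term overlap stands in here for the trained classifier and LLM-based annotation mentioned in the abstract; it is chosen only to keep the sketch self-contained.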

Related papers

Open-world Multi-label Text Classification with Extremely Weak Supervision [30.85235057480158]
We study open-world multi-label text classification under extremely weak supervision (XWS). We first utilize the user description to prompt a large language model (LLM) for dominant keyphrases of a subset of raw documents, and then construct a label space via clustering. We then apply a zero-shot multi-label classifier to locate the documents with small top predicted scores, so we can revisit their dominant keyphrases for more long-tail labels. X-MLClass exhibits a remarkable increase in ground-truth label space coverage on various datasets.
arXiv Detail & Related papers (2024-07-08T04:52:49Z)
RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules [30.239044569301534]
Weakly supervised text classification (WSTC) has attracted increasing attention due to its applicability in classifying a mass of texts. We propose a prompting PLM-based approach named RulePrompt for the WSTC task, consisting of a rule mining module and a rule-enhanced pseudo label generation module. Our approach yields interpretable category rules, proving its advantage in disambiguating easily-confused categories.
arXiv Detail & Related papers (2024-03-05T12:50:36Z)
HiGen: Hierarchy-Aware Sequence Generation for Hierarchical Text Classification [19.12354692458442]
Hierarchical text classification (HTC) is a complex subtask under multi-label text classification. We propose HiGen, a text-generation-based framework utilizing language models to encode dynamic text representations.
arXiv Detail & Related papers (2024-01-24T04:44:42Z)
XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak Supervision [6.406111099707549]
XAI-CLASS is a novel explanation-enhanced weakly-supervised text classification method. It incorporates word saliency prediction as an auxiliary task. XAI-CLASS outperforms other weakly-supervised text classification methods significantly.
arXiv Detail & Related papers (2023-10-31T23:24:22Z)
Instance-level Few-shot Learning with Class Hierarchy Mining [26.273796311012042]
We exploit hierarchical information to leverage discriminative and relevant features of base classes to effectively classify novel objects. These features are extracted from abundant data of base classes, which could be utilized to reasonably describe classes with scarce data. In order to effectively train the hierarchy-based-detector in FSIS, we apply the label refinement to further describe the associations between fine-grained classes.
arXiv Detail & Related papers (2023-04-15T02:55:08Z)
Many-Class Text Classification with Matching [65.74328417321738]
We formulate Text Classification as a Matching problem between the text and the labels, and propose a simple yet effective framework named TCM. Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels.
arXiv Detail & Related papers (2022-05-23T15:51:19Z)
Label Semantic Aware Pre-training for Few-shot Text Classification [53.80908620663974]
We propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems. LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains.
arXiv Detail & Related papers (2022-04-14T17:33:34Z)
MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information [47.44278057062421]
We study the problem of weakly supervised text classification, which aims to classify text documents into a set of pre-defined categories with category surface names only. To be specific, we model the relationships between documents and metadata via a heterogeneous information network. We propose a novel framework, named MotifClass, which selects category-indicative motif instances, retrieves and generates pseudo-labeled training samples based on category names and indicative motif instances.
arXiv Detail & Related papers (2021-11-07T07:39:10Z)
Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks [61.23408995934415]
We propose a novel framework for minimally supervised categorization by learning from the text-rich network. Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning. Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
arXiv Detail & Related papers (2021-02-23T04:14:34Z)
MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information. We propose different ways to regularize the parameters and output probability of each child label by its parents. Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
Exploring the Hierarchy in Relation Labels for Scene Graph Generation [75.88758055269948]
Experiments show that the proposed simple yet effective method can improve several state-of-the-art baselines by a large margin (up to 33% relative gain) in terms of Recall@50.
arXiv Detail & Related papers (2020-09-12T17:36:53Z)
Efficient strategies for hierarchical text classification: External knowledge and auxiliary tasks [3.5557219875516655]
We perform a sequence of inference steps to predict the category of a document from top to bottom of a given class taxonomy. With our efficient approaches, we outperform previous studies, using a drastically reduced number of parameters, on two well-known English datasets.
arXiv Detail & Related papers (2020-05-05T20:22:18Z)
Description Based Text Classification with Reinforcement Learning [34.18824470728299]
We propose a new framework for text classification, in which each category label is associated with a category description. We observe significant performance boosts over strong baselines on a wide range of text classification tasks.
arXiv Detail & Related papers (2020-02-08T02:14:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.