Many-Class Text Classification with Matching
- URL: http://arxiv.org/abs/2205.11409v1
- Date: Mon, 23 May 2022 15:51:19 GMT
- Title: Many-Class Text Classification with Matching
- Authors: Yi Song, Yuxian Gu, Minlie Huang
- Abstract summary: We formulate Text Classification as a Matching problem between the text and the labels, and propose a simple yet effective framework named TCM.
Compared with previous text classification approaches, TCM takes advantage of the fine-grained semantic information of the classification labels.
- Score: 65.74328417321738
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we formulate \textbf{T}ext \textbf{C}lassification as a
\textbf{M}atching problem between the text and the labels, and propose a simple
yet effective framework named TCM. Compared with previous text classification
approaches, TCM takes advantage of the fine-grained semantic information of the
classification labels, which helps distinguish each class better when the class
number is large, especially in low-resource scenarios. TCM is also easy to
implement and is compatible with various large pretrained language models. We
evaluate TCM on 4 text classification datasets (each with 20+ labels) in both
few-shot and full-data settings, and this model demonstrates significant
improvements over other text classification paradigms. We also conduct
extensive experiments with different variants of TCM and discuss the underlying
factors of its success. Our method and analyses offer a new perspective on text
classification.
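The matching formulation described in the abstract can be illustrated with a minimal sketch: encode the input text and a short natural-language description of each label with the same pretrained encoder, then assign the label whose description is most similar to the text. The encoder choice (sentence-transformers, all-MiniLM-L6-v2), the example labels, and the cosine-similarity scoring below are illustrative assumptions for this sketch, not the authors' TCM implementation.

```python
# Illustrative sketch of classification-as-matching: score each input text
# against label descriptions in a shared embedding space and pick the best match.
# Encoder, labels, and descriptions are placeholders, not the paper's TCM setup.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder could be used

# Label set with short natural-language descriptions (fine-grained label semantics).
labels = {
    "sports": "news about sports, athletes, matches, and tournaments",
    "politics": "news about governments, elections, and public policy",
    "technology": "news about software, hardware, and scientific advances",
}

label_names = list(labels.keys())
label_emb = encoder.encode(list(labels.values()), convert_to_tensor=True)

def classify(text: str) -> str:
    """Return the label whose description best matches the input text."""
    text_emb = encoder.encode([text], convert_to_tensor=True)
    scores = util.cos_sim(text_emb, label_emb)  # shape: (1, num_labels)
    return label_names[int(scores.argmax())]

print(classify("The midfielder scored twice in the cup final."))  # -> "sports"
```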
Related papers
- Adaptable and Reliable Text Classification using Large Language Models [7.962669028039958]
This paper introduces an adaptable and reliable text classification paradigm, which leverages Large Language Models (LLMs).
We evaluated the performance of several LLMs, machine learning algorithms, and neural network-based architectures on four diverse datasets.
It is shown that the system's performance can be further enhanced through few-shot or fine-tuning strategies.
arXiv Detail & Related papers (2024-05-17T04:05:05Z)
- Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery [50.564146730579424]
We propose a Text Embedding Synthesizer (TES) to generate pseudo text embeddings for unlabelled samples.
Our method unlocks the multi-modal potentials of CLIP and outperforms the baseline methods by a large margin on all GCD benchmarks.
arXiv Detail & Related papers (2024-03-15T02:40:13Z)
- BERT Goes Off-Topic: Investigating the Domain Transfer Challenge using Genre Classification [0.27195102129095]
We show that classification tasks still suffer from a performance gap when the underlying distribution of topics changes.
We quantify this phenomenon empirically with a large corpus and a large set of topics.
We suggest and successfully test a possible remedy: after augmenting the training dataset with topically-controlled synthetic texts, the F1 score improves by up to 50% for some topics.
arXiv Detail & Related papers (2023-11-27T18:53:31Z)
- Like a Good Nearest Neighbor: Practical Content Moderation and Text Classification [66.02091763340094]
Like a Good Nearest Neighbor (LaGoNN) is a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor.
LaGoNN is effective at flagging undesirable content and text classification, and improves the performance of SetFit.
arXiv Detail & Related papers (2023-02-17T15:43:29Z)
- Evaluating Unsupervised Text Classification: Zero-shot and Similarity-based Approaches [0.6767885381740952]
Similarity-based approaches attempt to classify instances based on similarities between text document representations and class description representations.
Zero-shot text classification approaches aim to generalize knowledge gained from a training task by assigning appropriate labels of unknown classes to text documents.
This paper conducts a systematic evaluation of different similarity-based and zero-shot approaches for text classification of unseen classes.
arXiv Detail & Related papers (2022-11-29T15:14:47Z)
- Selective Text Augmentation with Word Roles for Low-Resource Text Classification [3.4806267677524896]
Different words may play different roles in text classification, which inspires us to strategically select the proper roles for text augmentation.
In this work, we first identify the relationships between the words in a text and the text category from the perspectives of statistical correlation and semantic similarity.
We present a new augmentation technique called STA (Selective Text Augmentation) where different text-editing operations are selectively applied to words with specific roles.
arXiv Detail & Related papers (2022-09-04T08:13:11Z)
- Hierarchical Heterogeneous Graph Representation Learning for Short Text Classification [60.233529926965836]
We propose a new method called SHINE, which is based on graph neural networks (GNNs), for short text classification.
First, we model the short text dataset as a hierarchical heterogeneous graph consisting of word-level component graphs.
Then, we dynamically learn a short document graph that facilitates effective label propagation among similar short texts.
arXiv Detail & Related papers (2021-10-30T05:33:05Z)
- Minimally-Supervised Structure-Rich Text Categorization via Learning on Text-Rich Networks [61.23408995934415]
We propose a novel framework for minimally supervised categorization by learning from the text-rich network.
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
arXiv Detail & Related papers (2021-02-23T04:14:34Z)
- MATCH: Metadata-Aware Text Classification in A Large Hierarchy [60.59183151617578]
MATCH is an end-to-end framework that leverages both metadata and hierarchy information.
We propose different ways to regularize the parameters and output probability of each child label by its parents.
Experiments on two massive text datasets with large-scale label hierarchies demonstrate the effectiveness of MATCH.
arXiv Detail & Related papers (2021-02-15T05:23:08Z)
- MultiGBS: A multi-layer graph approach to biomedical summarization [6.11737116137921]
We propose a domain-specific method that models a document as a multi-layer graph to enable multiple features of the text to be processed at the same time.
The unsupervised method selects sentences from the multi-layer graph based on the MultiRank algorithm and the number of concepts.
The proposed MultiGBS algorithm employs UMLS and extracts the concepts and relationships using different tools such as SemRep, MetaMap, and OGER.
arXiv Detail & Related papers (2020-08-27T04:22:37Z)