KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation
- URL: http://arxiv.org/abs/2307.01878v2
- Date: Sun, 11 Feb 2024 07:08:09 GMT
- Title: KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation
- Authors: Weijie Xu, Xiaoyu Jiang, Jay Desai, Bin Han, Fuqin Yan and Francis
Iannacci
- Abstract summary: In text classification tasks, fine tuning pretrained language models like BERT and GPT-3 yields competitive accuracy.
General topic modeling methods possess the advantage of analyzing documents to extract meaningful patterns of words without the need of pretraining.
We develop the Knowledge Distillation Semi-supervised Topic Modeling (KDSTM) to leverage topic modeling's unsupervised insights extraction on text classification tasks.
- Score: 5.688430564294212
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In text classification tasks, fine tuning pretrained language models like
BERT and GPT-3 yields competitive accuracy; however, both methods require
pretraining on large text datasets. In contrast, general topic modeling methods
possess the advantage of analyzing documents to extract meaningful patterns of
words without the need of pretraining. To leverage topic modeling's
unsupervised insights extraction on text classification tasks, we develop the
Knowledge Distillation Semi-supervised Topic Modeling (KDSTM). KDSTM requires
no pretrained embeddings, few labeled documents and is efficient to train,
making it ideal under resource constrained settings. Across a variety of
datasets, our method outperforms existing supervised topic modeling methods in
classification accuracy, robustness and efficiency and achieves similar
performance compare to state of the art weakly supervised text classification
methods.
Related papers
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z) - CELA: Cost-Efficient Language Model Alignment for CTR Prediction [71.85120354973073]
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems.
Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs)
We propose textbfCost-textbfEfficient textbfLanguage Model textbfAlignment (textbfCELA) for CTR prediction.
arXiv Detail & Related papers (2024-05-17T07:43:25Z) - Test-Time Training on Graphs with Large Language Models (LLMs) [68.375487369596]
Test-Time Training (TTT) has been proposed as a promising approach to train Graph Neural Networks (GNNs)
Inspired by the great annotation ability of Large Language Models (LLMs) on Text-Attributed Graphs (TAGs), we propose to enhance the test-time training on graphs with LLMs as annotators.
A two-stage training strategy is designed to tailor the test-time model with the limited and noisy labels.
arXiv Detail & Related papers (2024-04-21T08:20:02Z) - Self-Supervised Representation Learning for Online Handwriting Text
Classification [0.8594140167290099]
We propose the novel Part of Stroke Masking (POSM) as a pretext task for pretraining models to extract informative representations from the online handwriting of individuals in English and Chinese languages.
To evaluate the quality of the extracted representations, we use both intrinsic and extrinsic evaluation methods.
The pretrained models are fine-tuned to achieve state-of-the-art results in tasks such as writer identification, gender classification, and handedness classification.
arXiv Detail & Related papers (2023-10-10T14:07:49Z) - Attention is Not Always What You Need: Towards Efficient Classification
of Domain-Specific Text [1.1508304497344637]
For large-scale IT corpora with hundreds of classes organized in a hierarchy, the task of accurate classification of classes at the higher level in the hierarchies is crucial.
In the business world, an efficient and explainable ML model is preferred over an expensive black-box model, especially if the performance increase is marginal.
Despite the widespread use of PLMs, there is a lack of a clear and well-justified need to as why these models are being employed for domain-specific text classification.
arXiv Detail & Related papers (2023-03-31T03:17:23Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP)
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains under explored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of
Pre-trained Models' Transferability [74.11825654535895]
We investigate whether the power of the models pre-trained on text data, such as BERT, can be transferred to general token sequence classification applications.
We find that even on non-text data, the models pre-trained on text converge faster than the randomly models.
arXiv Detail & Related papers (2021-03-12T09:19:14Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Learning Variational Word Masks to Improve the Interpretability of
Neural Text Classifiers [21.594361495948316]
A new line of work on improving model interpretability has just started, and many existing methods require either prior information or human annotations as additional inputs in training.
We propose the variational word mask (VMASK) method to automatically learn task-specific important words and reduce irrelevant information on classification, which ultimately improves the interpretability of model predictions.
arXiv Detail & Related papers (2020-10-01T20:02:43Z) - Reinforced Curriculum Learning on Pre-trained Neural Machine Translation
Models [20.976165305749777]
We learn a curriculum for improving a pre-trained NMT model by re-selecting influential data samples from the original training set.
We propose a data selection framework based on Deterministic Actor-Critic, in which a critic network predicts the expected change of model performance.
arXiv Detail & Related papers (2020-04-13T03:40:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.