Beyond prompting: Making Pre-trained Language Models Better Zero-shot
Learners by Clustering Representations
- URL: http://arxiv.org/abs/2210.16637v1
- Date: Sat, 29 Oct 2022 16:01:51 GMT
- Title: Beyond prompting: Making Pre-trained Language Models Better Zero-shot
Learners by Clustering Representations
- Authors: Yu Fei, Ping Nie, Zhao Meng, Roger Wattenhofer, Mrinmaya Sachan
- Abstract summary: We show that zero-shot text classification can be improved simply by clustering texts in the embedding spaces of pre-trained language models.
Our approach achieves an average of 20% absolute improvement over prompt-based zero-shot learning.
- Score: 24.3378487252621
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has demonstrated that pre-trained language models (PLMs) are
zero-shot learners. However, most existing zero-shot methods involve heavy
human engineering or complicated self-training pipelines, hindering their
application to new situations. In this work, we show that zero-shot text
classification can be improved simply by clustering texts in the embedding
spaces of PLMs. Specifically, we fit the unlabeled texts with a Bayesian
Gaussian Mixture Model after initializing cluster positions and shapes using
class names. Despite its simplicity, this approach achieves superior or
comparable performance on both topic and sentiment classification datasets and
outperforms prior works significantly on unbalanced datasets. We further
explore the applicability of our clustering approach by evaluating it on 14
datasets with more diverse topics, text lengths, and numbers of classes. Our
approach achieves an average of 20% absolute improvement over prompt-based
zero-shot learning. Finally, we compare different PLM embedding spaces and find
that texts are well-clustered by topics even if the PLM is not explicitly
pre-trained to generate meaningful sentence embeddings. This work indicates
that PLM embeddings can categorize texts without task-specific fine-tuning,
thus providing a new way to analyze and utilize their knowledge and zero-shot
learning ability.
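As a rough illustration of the core idea, the sketch below clusters unlabeled texts in a PLM embedding space with a Gaussian mixture whose component means are initialized from encoded class names. This is a minimal sketch under stated assumptions, not the authors' implementation: the sentence-transformers encoder all-MiniLM-L6-v2 and scikit-learn's GaussianMixture with means_init are stand-ins for the paper's choice of PLM embedding space and its Bayesian Gaussian Mixture Model initialized from class names.

```python
# Minimal sketch: zero-shot classification by clustering PLM embeddings.
# Assumptions (not from the paper): sentence-transformers "all-MiniLM-L6-v2"
# as the encoder, and scikit-learn's GaussianMixture with class-name means
# as a simplified stand-in for the paper's Bayesian GMM initialization.
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture

def zero_shot_cluster_classify(texts, class_names, model_name="all-MiniLM-L6-v2"):
    encoder = SentenceTransformer(model_name)

    # Embed the unlabeled texts and the class names in the same space.
    text_emb = encoder.encode(texts, normalize_embeddings=True)
    class_emb = encoder.encode(class_names, normalize_embeddings=True)

    # One mixture component per class; each component's mean starts at the
    # corresponding class-name embedding, so component k starts at class k.
    gmm = GaussianMixture(
        n_components=len(class_names),
        covariance_type="diag",   # full covariances are costly in high dimensions
        means_init=class_emb,
        random_state=0,
    )
    gmm.fit(text_emb)             # unsupervised fit on the unlabeled texts

    # The component index serves as the predicted class index.
    return gmm.predict(text_emb)

# Example usage (hypothetical inputs)
texts = ["The team won the championship game.",
         "New smartphone chips cut power consumption in half."]
labels = zero_shot_cluster_classify(texts, ["sports", "technology"])
print(labels)  # e.g. [0 1]
```

Note that scikit-learn's BayesianGaussianMixture does not expose per-component mean initialization, which is why this sketch falls back to a plain Gaussian mixture; the paper's Bayesian variant additionally initializes the cluster shapes from the class names.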
Related papers
- Text Clustering as Classification with LLMs [6.030435811868953]
This study presents a novel framework for text clustering that effectively leverages the in-context learning capacity of Large Language Models (LLMs).
Instead of fine-tuning embedders, we propose to transform text clustering into a classification task via an LLM.
Our framework has been experimentally proven to achieve comparable or superior performance to state-of-the-art clustering methods.
arXiv Detail & Related papers (2024-09-30T16:57:34Z) - Learning to Prompt with Text Only Supervision for Vision-Language Models [107.282881515667]
One branch of methods adapts CLIP by learning prompts using visual information.
An alternative approach resorts to training-free methods by generating class descriptions from large language models.
We propose to combine the strengths of both streams by learning prompts using only text data.
arXiv Detail & Related papers (2024-01-04T18:59:49Z) - Large Language Models Are Zero-Shot Text Classifiers [3.617781755808837]
Large language models (LLMs) have become extensively used across various sub-disciplines of natural language processing (NLP).
In NLP, text classification problems have garnered considerable focus, but they still face limitations related to high computational cost, time consumption, and robustness to unseen classes.
With the introduction of chain-of-thought (CoT) prompting, LLMs can perform zero-shot learning (ZSL) via step-by-step reasoning prompts.
arXiv Detail & Related papers (2023-12-02T06:33:23Z) - BYOC: Personalized Few-Shot Classification with Co-Authored Class
Descriptions [2.076173115539025]
We propose a novel approach to few-shot text classification using an LLM.
Rather than few-shot examples, the LLM is prompted with descriptions of the salient features of each class.
Examples, questions, and answers are summarized to form the classification prompt.
arXiv Detail & Related papers (2023-10-09T19:37:38Z) - Towards Realistic Zero-Shot Classification via Self Structural Semantic
Alignment [53.2701026843921]
Large-scale pre-trained Vision Language Models (VLMs) have proven effective for zero-shot classification.
In this paper, we aim at a more challenging setting, Realistic Zero-Shot Classification, which assumes no annotation but instead a broad vocabulary.
We propose the Self Structural Semantic Alignment (S3A) framework, which extracts structural semantic information from unlabeled data while simultaneously self-learning.
arXiv Detail & Related papers (2023-08-24T17:56:46Z) - Zero-Shot Text Classification with Self-Training [8.68603153534916]
We show that fine-tuning the zero-shot classifier on its most confident predictions leads to significant performance gains across a wide range of text classification tasks.
Self-training adapts the zero-shot model to the task at hand.
arXiv Detail & Related papers (2022-10-31T17:55:00Z) - ZeroGen$^+$: Self-Guided High-Quality Data Generation in Efficient
Zero-Shot Learning [97.2907428983142]
ZeroGen attempts to use a PLM purely to generate data and train a tiny model, without relying on task-specific annotation.
We propose a noise-robust bi-level re-weighting framework which is able to learn the per-sample weights measuring the data quality without requiring any gold data.
arXiv Detail & Related papers (2022-05-25T11:38:48Z) - Progressive Class Semantic Matching for Semi-supervised Text
Classification [26.794533973357403]
We investigate the marriage between semi-supervised learning and a pre-trained language model.
Through extensive experiments, we show that our method brings remarkable improvements over the baselines.
arXiv Detail & Related papers (2022-05-20T13:59:03Z) - ZeroBERTo -- Leveraging Zero-Shot Text Classification by Topic Modeling [57.80052276304937]
This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task.
We show that ZeroBERTo performs better on long inputs with a shorter execution time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset.
arXiv Detail & Related papers (2022-01-04T20:08:17Z) - Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z) - CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action
Recognition [52.66360172784038]
We propose a clustering-based model, which considers all training samples at once, instead of optimizing for each instance individually.
We call the proposed method CLASTER and observe that it consistently improves over the state-of-the-art in all standard datasets.
arXiv Detail & Related papers (2021-01-18T12:46:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.