Mask-guided BERT for Few Shot Text Classification
- URL: http://arxiv.org/abs/2302.10447v1
- Date: Tue, 21 Feb 2023 05:24:00 GMT
- Title: Mask-guided BERT for Few Shot Text Classification
- Authors: Wenxiong Liao, Zhengliang Liu, Haixing Dai, Zihao Wu, Yiyang Zhang,
Xiaoke Huang, Yuzhong Chen, Xi Jiang, Dajiang Zhu, Tianming Liu, Sheng Li,
Xiang Li, Hongmin Cai
- Abstract summary: Mask-BERT is a simple and modular framework to help BERT-based architectures tackle few-shot learning.
The core idea is to selectively apply masks on text inputs and filter out irrelevant information, which guides the model to focus on discriminative tokens.
Experimental results on public-domain benchmark datasets demonstrate the effectiveness of Mask-BERT.
- Score: 12.361032727044547
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based language models have achieved significant success in
various domains. However, the data-intensive nature of the transformer
architecture requires large amounts of labeled data, which is challenging in low-resource
scenarios (i.e., few-shot learning (FSL)). The main challenge of FSL is the
difficulty of training robust models on a small number of samples, which
frequently leads to overfitting. Here we present Mask-BERT, a simple and
modular framework to help BERT-based architectures tackle FSL. The proposed
approach fundamentally differs from existing FSL strategies such as prompt
tuning and meta-learning. The core idea is to selectively apply masks on text
inputs and filter out irrelevant information, which guides the model to focus
on discriminative tokens that influence prediction results. In addition, to
make the text representations from different categories more separable and the
text representations from the same category more compact, we introduce a
contrastive learning loss function. Experimental results on public-domain
benchmark datasets demonstrate the effectiveness of Mask-BERT.
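Below is a minimal PyTorch-style sketch of the two ideas the abstract describes: pruning non-discriminative input tokens through the attention mask, and a supervised contrastive loss that pulls same-class text representations together while pushing different-class representations apart. The per-token salience scores, the keep ratio, the temperature, and all function names are illustrative assumptions, not the authors' implementation; random tensors stand in for tokenizer and encoder outputs.

```python
# Sketch of (1) selective input masking and (2) a supervised contrastive loss,
# assuming per-token salience scores are available (e.g., from attention weights
# or gradient attribution). This is NOT the Mask-BERT reference implementation.
import torch
import torch.nn.functional as F


def mask_uninformative_tokens(attention_mask: torch.Tensor,
                              salience: torch.Tensor,
                              keep_ratio: float = 0.7) -> torch.Tensor:
    """Keep only the most salient tokens of each sequence in the attention mask.

    attention_mask: (batch, seq_len) 0/1 padding mask from the tokenizer.
    salience:       (batch, seq_len) per-token relevance scores (assumed given).
    """
    masked = attention_mask.clone()
    for i in range(attention_mask.size(0)):
        valid = attention_mask[i].nonzero(as_tuple=True)[0]     # non-padding positions
        k = max(1, int(keep_ratio * valid.numel()))
        top = valid[salience[i, valid].topk(k).indices]          # top-k salient tokens
        new_row = torch.zeros_like(masked[i])
        new_row[top] = 1                                         # mask out the rest
        masked[i] = new_row
    return masked


def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Pull same-class embeddings together, push different-class embeddings apart."""
    z = F.normalize(embeddings, dim=-1)                          # (batch, dim)
    sim = z @ z.t() / temperature                                # pairwise similarities
    n = labels.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))              # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    mean_pos = (log_prob * pos.float()).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return -(mean_pos[pos.any(dim=1)]).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    attn = torch.ones(4, 16)                  # toy batch: 4 sequences of 16 tokens
    salience = torch.rand(4, 16)              # stand-in salience scores
    pruned = mask_uninformative_tokens(attn, salience, keep_ratio=0.5)
    emb = torch.randn(4, 32)                  # stand-in [CLS] representations
    labels = torch.tensor([0, 0, 1, 1])
    print(pruned.sum(dim=1))                  # roughly half the tokens kept per example
    print(supervised_contrastive_loss(emb, labels))
```

In a full few-shot pipeline, the pruned attention mask would be passed to the BERT encoder together with the input ids, and a loss of this kind would typically be combined with the standard classification objective.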
Related papers
- FLIP: Towards Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models (FLIP) for click-through rate (CTR) prediction.
Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment.
Experiments on three real-world datasets demonstrate that FLIP outperforms SOTA baselines, and is highly compatible for various ID-based models and PLMs.
arXiv Detail & Related papers (2023-10-30T11:25:03Z) - Masked and Permuted Implicit Context Learning for Scene Text Recognition [8.742571493814326]
Scene Text Recognition (STR) is difficult because of variations in text styles, shapes, and backgrounds.
We propose a masked and permuted implicit context learning network for STR, within a single decoder.
arXiv Detail & Related papers (2023-05-25T15:31:02Z) - Distinguishability Calibration to In-Context Learning [31.375797763897104]
We propose a method to map a PLM-encoded embedding into a new metric space to guarantee the distinguishability of the resulting embeddings.
We also take advantage of hyperbolic embeddings to capture the hierarchical relations among fine-grained class-associated token embeddings.
arXiv Detail & Related papers (2023-02-13T09:15:00Z) - Exploring Efficient Few-shot Adaptation for Vision Transformers [70.91692521825405]
We propose a novel efficient Transformer Tuning (eTT) method that facilitates finetuning ViTs in few-shot learning tasks.
Key novelties come from the newly presented Attentive Prefix Tuning (APT) and Domain Residual Adapter (DRA).
We conduct extensive experiments to show the efficacy of our model.
arXiv Detail & Related papers (2023-01-06T08:42:05Z) - Masked Autoencoding for Scalable and Generalizable Decision Making [93.84855114717062]
MaskDP is a simple and scalable self-supervised pretraining method for reinforcement learning and behavioral cloning.
We find that a MaskDP model gains the capability of zero-shot transfer to new BC tasks, such as single and multiple goal reaching.
arXiv Detail & Related papers (2022-11-23T07:04:41Z) - MGIMN: Multi-Grained Interactive Matching Network for Few-shot Text
Classification [9.9875634964736]
Text classification struggles to generalize to unseen classes with very few labeled text instances per class.
We propose a meta-learning based method, MGIMN, which performs instance-wise comparison followed by aggregation to generate class-wise matching vectors.
arXiv Detail & Related papers (2022-04-11T08:58:55Z) - Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z) - Data Efficient Masked Language Modeling for Vision and Language [16.95631509102115]
Masked language modeling (MLM) is one of the key sub-tasks in vision-language training.
In the cross-modal setting, tokens in the sentence are masked at random, and the model predicts the masked tokens given the image and the text.
We investigate a range of alternative masking strategies specific to the cross-modal setting that address these shortcomings.
arXiv Detail & Related papers (2021-09-05T11:27:53Z) - The Role of Global Labels in Few-Shot Classification and How to Infer
Them [55.64429518100676]
Few-shot learning is a central problem in meta-learning, where learners must quickly adapt to new tasks.
We propose Meta Label Learning (MeLa), a novel algorithm that infers global labels and obtains robust few-shot models via standard classification.
arXiv Detail & Related papers (2021-08-09T14:07:46Z) - TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot
classification [50.358839666165764]
We show that Task-Adaptive Feature Sub-Space Learning (TAFSSL) can significantly boost performance in few-shot learning scenarios.
Specifically, we show that on the challenging miniImageNet and tieredImageNet benchmarks, TAFSSL can improve the current state-of-the-art in both transductive and semi-supervised FSL settings by more than 5%.
arXiv Detail & Related papers (2020-03-14T16:59:17Z)