A Neural Few-Shot Text Classification Reality Check
- URL: http://arxiv.org/abs/2101.12073v1
- Date: Thu, 28 Jan 2021 15:46:14 GMT
- Title: A Neural Few-Shot Text Classification Reality Check
- Authors: Thomas Dopierre, Christophe Gravier, Wilfried Logerais
- Abstract summary: Several neural few-shot classification models have emerged, yielding significant progress over time.
In this paper, we compare all these models, first adapting those made in the field of image processing to NLP, and second providing them access to transformers.
We then test these models equipped with the same transformer-based encoder on the intent detection task, known for having a large number of classes.
- Score: 4.689945062721168
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern classification models tend to struggle when the amount of annotated
data is scarce. To overcome this issue, several neural few-shot classification
models have emerged, yielding significant progress over time, both in Computer
Vision and Natural Language Processing. In the latter, such models used to rely
on fixed word embeddings before the advent of transformers. Additionally, some
models used in Computer Vision are yet to be tested in NLP applications. In
this paper, we compare all these models, first adapting those made in the field
of image processing to NLP, and second providing them access to transformers.
We then test these models equipped with the same transformer-based encoder on
the intent detection task, known for having a large number of classes. Our
results reveal that while methods perform almost equally on the ARSC dataset,
this is not the case for the Intent Detection task, where the most recent and
supposedly best competitors perform worse than older and simpler ones (while
all are given access to transformers). We also show that a simple baseline is
surprisingly strong. All the newly developed models, as well as the evaluation
framework, are made publicly available.
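The setup the abstract describes can be pictured with a single episode of Prototypical Networks, one of the classic few-shot methods such comparisons cover, run on top of a shared transformer encoder. The sketch below is an illustration only, not the authors' released framework: the encoder checkpoint, the [CLS] pooling, and the toy intent data are all assumptions.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Assumed encoder; the paper gives every compared method the same
    # transformer-based encoder, but the exact checkpoint here is illustrative.
    ENCODER_NAME = "bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(ENCODER_NAME)
    encoder = AutoModel.from_pretrained(ENCODER_NAME)

    def embed(sentences):
        # Encode a list of sentences into fixed-size vectors via [CLS] pooling.
        batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            return encoder(**batch).last_hidden_state[:, 0]

    def prototypical_episode(support, queries):
        # support: dict mapping intent name -> list of example utterances (the "shots").
        # queries: list of utterances to classify within this episode.
        class_names = list(support)
        # One prototype per class: the mean embedding of its support examples.
        prototypes = torch.stack([embed(support[c]).mean(dim=0) for c in class_names])
        # Assign each query to its nearest prototype (Euclidean distance).
        dists = torch.cdist(embed(queries), prototypes)
        return [class_names[i] for i in dists.argmin(dim=1).tolist()]

    # Toy 2-way, 2-shot intent-detection episode with made-up utterances.
    support = {
        "play_music": ["play some jazz", "put on my workout playlist"],
        "set_alarm": ["wake me up at 7", "set an alarm for tomorrow morning"],
    }
    print(prototypical_episode(support, ["start my running playlist"]))

Other compared methods differ mainly in how they turn support embeddings into a decision rule; the shared transformer encoder is the part the paper holds fixed across all of them.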
Related papers
- Substance or Style: What Does Your Image Embedding Know? [55.676463077772866]
Image foundation models have primarily been evaluated for semantic content.
We measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations.
We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE).
arXiv Detail & Related papers (2023-07-10T22:40:10Z) - On Robustness of Finetuned Transformer-based NLP Models [11.063628128069736]
We characterize changes between pretrained and finetuned language model representations across layers using two metrics: CKA and STIR.
GPT-2 representations are more robust than BERT and T5 across multiple types of input perturbations.
This study provides valuable insights into perturbation-specific weaknesses of popular Transformer-based models.
arXiv Detail & Related papers (2023-05-23T18:25:18Z) - Finding Differences Between Transformers and ConvNets Using Counterfactual Simulation Testing [82.67716657524251]
We present a counterfactual framework that allows us to study the robustness of neural networks with respect to naturalistic variations.
Our method allows for a fair comparison of the robustness of recently released, state-of-the-art Convolutional Neural Networks and Vision Transformers.
arXiv Detail & Related papers (2022-11-29T18:59:23Z) - Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z) - What do Toothbrushes do in the Kitchen? How Transformers Think our World is Structured [137.83584233680116]
We investigate to what extent transformer-based language models allow for extracting knowledge about object relations.
We show that the models, combined with the different similarity measures, differ greatly in the amount of knowledge they allow to be extracted.
Surprisingly, static models perform almost as well as contextualized models -- in some cases even better.
arXiv Detail & Related papers (2022-04-12T10:00:20Z) - Investigating Transfer Learning Capabilities of Vision Transformers and CNNs by Fine-Tuning a Single Trainable Block [0.0]
Transformer-based architectures are surpassing the state of the art set by CNN architectures in accuracy, but are computationally very expensive to train from scratch.
We study their transfer learning capabilities and compare them with CNNs to understand which architecture is better when applied to real-world problems with small data.
We find that transformer-based architectures not only achieve higher accuracy than CNNs, but some transformers even do so with around 4 times fewer parameters.
arXiv Detail & Related papers (2021-10-11T13:43:03Z) - Few Shot Activity Recognition Using Variational Inference [9.371378627575883]
We propose a novel variational-inference-based architectural framework (HF-AR) for few-shot activity recognition.
Our framework leverages volume-preserving Householder Flow to learn a flexible posterior distribution of the novel classes.
This results in better performance as compared to state-of-the-art few shot approaches for human activity recognition.
arXiv Detail & Related papers (2021-08-20T03:57:58Z) - Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z) - ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z) - Gradient-Based Adversarial Training on Transformer Networks for Detecting Check-Worthy Factual Claims [3.7543966923106438]
We introduce the first adversarially-regularized, transformer-based claim spotter model.
We obtain a 4.70 point F1-score improvement over current state-of-the-art models.
We propose a method to apply adversarial training to transformer models.
arXiv Detail & Related papers (2020-02-18T16:51:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.