Text Classification with Few Examples using Controlled Generalization
- URL: http://arxiv.org/abs/2005.08469v1
- Date: Mon, 18 May 2020 06:04:58 GMT
- Title: Text Classification with Few Examples using Controlled Generalization
- Authors: Abhijit Mahabal, Jason Baldridge, Burcu Karagol Ayan, Vincent Perot,
Dan Roth
- Abstract summary: Current practice relies on pre-trained word embeddings to map words unseen in training to similar seen ones.
Our alternative begins with sparse pre-trained representations derived from unlabeled parsed corpora.
We show that a feed-forward network over these vectors is especially effective in low-data scenarios.
- Score: 58.971750512415134
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training data for text classification is often limited in practice,
especially for applications with many output classes or involving many related
classification problems. This means classifiers must generalize from limited
evidence, but the manner and extent of generalization is task dependent.
Current practice primarily relies on pre-trained word embeddings to map words
unseen in training to similar seen ones. Unfortunately, this squishes many
components of meaning into highly restricted capacity. Our alternative begins
with sparse pre-trained representations derived from unlabeled parsed corpora;
based on the available training data, we select features that offer the
relevant generalizations. This produces task-specific semantic vectors; here,
we show that a feed-forward network over these vectors is especially effective
in low-data scenarios, compared to existing state-of-the-art methods. By
further pairing this network with a convolutional neural network, we keep this
edge in low-data scenarios and remain competitive when using full training
sets.
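The abstract suggests a simple pipeline: sparse pre-trained features, supervised feature selection, then a small feed-forward classifier. Below is a minimal sketch of that pipeline using stand-in data and off-the-shelf scikit-learn components; it is an illustration of the idea, not the authors' implementation.

```python
# Hypothetical sketch: select task-relevant features from sparse
# pre-trained representations, then train a feed-forward classifier.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Stand-in for sparse pre-trained vectors (e.g., from parsed corpora).
X_sparse = sparse_random(200, 5000, density=0.01, random_state=0, format="csr")
y = rng.integers(0, 4, size=200)  # few-example, multi-class setting

# Task-dependent feature selection: keep features informative for y.
selector = SelectKBest(chi2, k=500).fit(X_sparse, y)
X_task = selector.transform(X_sparse)  # task-specific semantic vectors

# Feed-forward network over the selected sparse features.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X_task, y)
```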
Related papers
- Why Fine-grained Labels in Pretraining Benefit Generalization? [12.171634061370616]
Recent studies show that pretraining a deep neural network with fine-grained labeled data, followed by fine-tuning on coarse-labeled data, often yields better generalization than pretraining with coarse-labeled data.
This paper addresses the gap in theoretical understanding by introducing a "hierarchical multi-view" structure to confine the input data distribution.
Under this framework, we prove that: 1) coarse-grained pretraining only allows a neural network to learn the common features well, while 2) fine-grained pretraining helps the network learn the rare features in addition to the common ones, leading to improved accuracy on hard downstream test samples.
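A minimal sketch of the fine-to-coarse recipe the summary describes, with toy dimensions and label counts assumed for illustration:

```python
# Hypothetical sketch: pretrain a backbone on fine-grained labels,
# then swap the head and fine-tune on coarse labels.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # toy encoder
fine_head = nn.Linear(64, 20)    # 20 fine-grained classes (assumed)
coarse_head = nn.Linear(64, 4)   # 4 coarse classes (assumed)

x = torch.randn(8, 32)
fine_y, coarse_y = torch.randint(0, 20, (8,)), torch.randint(0, 4, (8,))
loss_fn = nn.CrossEntropyLoss()

# Stage 1: pretrain with fine-grained supervision.
opt = torch.optim.SGD(
    list(backbone.parameters()) + list(fine_head.parameters()), lr=0.1)
loss_fn(fine_head(backbone(x)), fine_y).backward()
opt.step()

# Stage 2: fine-tune the backbone with a new head on coarse labels.
opt = torch.optim.SGD(
    list(backbone.parameters()) + list(coarse_head.parameters()), lr=0.01)
loss_fn(coarse_head(backbone(x)), coarse_y).backward()
opt.step()
```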
arXiv Detail & Related papers (2024-10-30T15:41:30Z)
- Manual Verbalizer Enrichment for Few-Shot Text Classification [1.860409237919611]
MAVE is an approach for verbalizer construction by enrichment of class labels.
Our model achieves state-of-the-art results while using significantly fewer resources.
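As a rough illustration of what an enriched verbalizer does at inference time; the enrichment procedure and the mask-fill probabilities below are stand-ins, not the paper's method:

```python
# Hypothetical sketch: an enriched verbalizer maps each class to several
# label words; a class score aggregates the (mask-fill) probabilities of
# all its words. The probabilities here are stand-ins for a masked LM.
enriched_verbalizer = {
    "positive": ["good", "great", "excellent"],
    "negative": ["bad", "poor", "terrible"],
}

def classify(token_probs: dict[str, float]) -> str:
    """Pick the class whose enriched label words score highest on average."""
    scores = {
        label: sum(token_probs.get(w, 0.0) for w in words) / len(words)
        for label, words in enriched_verbalizer.items()
    }
    return max(scores, key=scores.get)

# token_probs would come from an LM filling "This movie was [MASK]."
print(classify({"good": 0.30, "great": 0.20, "bad": 0.05}))  # -> "positive"
```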
arXiv Detail & Related papers (2024-10-08T16:16:47Z)
- CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification [57.62886091828512]
We propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix) for many-class classification.
Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification.
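A toy sketch of the soft-prefix idea, where the instance-dependent component is approximated by a projection of the mean input embedding; that projection is an assumption for illustration, not CCPrefix's exact fact-counterfactual construction:

```python
# Hypothetical sketch: trainable prefix vectors are prepended to the
# token embeddings before they enter a frozen encoder.
import torch
import torch.nn as nn

d_model, prefix_len = 64, 5
shared_prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)
instance_proj = nn.Linear(d_model, prefix_len * d_model)  # assumed form

def with_prefix(token_embs: torch.Tensor) -> torch.Tensor:
    # token_embs: (batch, seq_len, d_model)
    inst = instance_proj(token_embs.mean(dim=1))   # (batch, P*d)
    inst = inst.view(-1, prefix_len, d_model)
    prefix = shared_prefix.unsqueeze(0) + inst     # instance-dependent prefix
    return torch.cat([prefix, token_embs], dim=1)  # (batch, P+seq, d)

out = with_prefix(torch.randn(2, 10, d_model))
print(out.shape)  # torch.Size([2, 15, 64])
```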
arXiv Detail & Related papers (2022-11-11T03:45:59Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) trains models on adversarially perturbed inputs to improve robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
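For context, a single-machine adversarial-training step might look like the sketch below (an FGSM-style perturbation); the paper's distributed, large-batch machinery is omitted:

```python
# Hypothetical single adversarial-training step.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 2))
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 20), torch.randint(0, 2, (16,))

# Craft an adversarial perturbation within an eps-ball around x.
eps = 0.1
x_adv = x.clone().requires_grad_(True)
loss_fn(model(x_adv), y).backward()
x_adv = (x + eps * x_adv.grad.sign()).detach()

# Train on the adversarial examples.
opt.zero_grad()
loss_fn(model(x_adv), y).backward()
opt.step()
```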
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
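A hypothetical sketch of per-iteration class dropout in the supervision, assuming the usual ignore_index convention for segmentation losses:

```python
# Hypothetical sketch: randomly suppress some classes' supervision each
# iteration so learned features are less co-dependent across classes.
import torch

num_classes, ignore_index = 21, 255
labels = torch.randint(0, num_classes, (4, 64, 64))  # toy segmentation masks

def drop_random_classes(labels: torch.Tensor, p: float = 0.3) -> torch.Tensor:
    dropped = torch.rand(num_classes) < p   # classes to hide this step
    masked = labels.clone()
    masked[dropped[labels]] = ignore_index  # exclude them from the loss
    return masked

masked = drop_random_classes(labels)
# Use with nn.CrossEntropyLoss(ignore_index=255) so dropped classes
# contribute no gradient this iteration.
```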
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Semi-Supervised Learning using Siamese Networks [3.492636597449942]
This work explores a new training method for semi-supervised learning that is based on similarity function learning using a Siamese network.
Confident predictions of unlabeled instances are used as true labels for retraining the Siamese network.
For improving unlabeled predictions, local learning with global consistency is also evaluated.
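A minimal sketch of the confident-pseudo-label step, with the Siamese similarity model abstracted behind a probability function:

```python
# Hypothetical sketch: confident predictions on unlabeled data become
# labels for retraining. The Siamese model is abstracted as predict_proba.
import numpy as np

def pseudo_label(predict_proba, X_unlabeled, threshold=0.95):
    probs = predict_proba(X_unlabeled)           # (n, n_classes)
    conf, labels = probs.max(axis=1), probs.argmax(axis=1)
    keep = conf >= threshold                     # keep only confident ones
    return X_unlabeled[keep], labels[keep]

# Toy stand-in for a trained Siamese classifier's class probabilities.
rng = np.random.default_rng(0)
fake_proba = lambda X: rng.dirichlet([0.2] * 3, size=len(X))
X_new, y_new = pseudo_label(fake_proba, rng.normal(size=(100, 8)))
# X_new, y_new would be appended to the labeled set for retraining.
```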
arXiv Detail & Related papers (2021-09-02T09:06:35Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- TF-CR: Weighting Embeddings for Text Classification [6.531659195805749]
We introduce a novel weighting scheme, Term Frequency-Category Ratio (TF-CR), which assigns higher weights to high-frequency, category-exclusive words when computing word embeddings.
Experiments on 16 classification datasets show the effectiveness of TF-CR, leading to improved performance scores over existing weighting schemes.
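A sketch of a TF-CR-style weight, following the description above; the exact normalization is an assumption, not taken from the paper:

```python
# Hypothetical TF-CR weight for a word w and category c: high weight when
# w is frequent in c (TF) and its uses concentrate in c (category ratio).
from collections import Counter

def tf_cr(word: str, category: str, docs: list[tuple[str, str]]) -> float:
    """docs: list of (text, category) pairs."""
    counts = Counter()       # category -> total tokens
    word_counts = Counter()  # category -> occurrences of `word`
    for text, cat in docs:
        tokens = text.lower().split()
        counts[cat] += len(tokens)
        word_counts[cat] += tokens.count(word)
    total_w = sum(word_counts.values())
    if total_w == 0 or counts[category] == 0:
        return 0.0
    tf = word_counts[category] / counts[category]  # frequency within c
    cr = word_counts[category] / total_w           # share of w's uses in c
    return tf * cr

docs = [("great great match", "sports"), ("great deal", "finance")]
print(tf_cr("great", "sports", docs))  # higher where usage concentrates
```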
arXiv Detail & Related papers (2020-12-11T19:23:28Z)
- ALICE: Active Learning with Contrastive Natural Language Explanations [69.03658685761538]
We propose Active Learning with Contrastive Explanations (ALICE) to improve data efficiency in learning.
ALICE first uses active learning to select the most informative pairs of label classes and elicits contrastive natural language explanations for them.
It then extracts knowledge from these explanations using a semantic parser.
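The pair-selection step could be sketched as ranking class pairs by confusion counts; this is a guess at the spirit of the method, and the explanation-parsing stage is omitted:

```python
# Hypothetical sketch: rank class pairs by how often the current
# classifier confuses them, then query explanations for the top pairs.
import numpy as np

def most_confused_pairs(conf_mat: np.ndarray, k: int = 2):
    n = conf_mat.shape[0]
    pairs = [(i, j, conf_mat[i, j] + conf_mat[j, i])
             for i in range(n) for j in range(i + 1, n)]
    pairs.sort(key=lambda t: t[2], reverse=True)
    return [(i, j) for i, j, _ in pairs[:k]]

conf = np.array([[50, 8, 1],
                 [9, 40, 2],
                 [0, 3, 47]])
print(most_confused_pairs(conf))  # -> [(0, 1), (1, 2)]
```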
arXiv Detail & Related papers (2020-09-22T01:02:07Z)
- AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks [12.14537824884951]
We propose a novel regularization method that progressively penalizes the magnitude of activations during training.
Our method's effect on generalization is analyzed with label randomization tests and cumulative ablations.
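A hypothetical rendering of a progressive activation penalty, with an assumed linear schedule rather than the paper's exact form:

```python
# Hypothetical sketch: task loss plus an activation-magnitude term whose
# weight grows over the course of training.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(16, 10), torch.randint(0, 3, (16,))

hidden = model[1](model[0](x))    # activations to be penalized
logits = model[2](hidden)

step, total_steps = 100, 1000
lam = 0.1 * (step / total_steps)  # progressively increasing weight
loss = loss_fn(logits, y) + lam * hidden.pow(2).mean()
loss.backward()
```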
arXiv Detail & Related papers (2020-03-07T18:38:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.