Contextual Embeddings: When Are They Worth It?
- URL: http://arxiv.org/abs/2005.09117v1
- Date: Mon, 18 May 2020 22:20:17 GMT
- Title: Contextual Embeddings: When Are They Worth It?
- Authors: Simran Arora, Avner May, Jian Zhang, Christopher Ré
- Abstract summary: We study the settings for which deep contextual embeddings give large improvements in performance relative to classic pretrained embeddings.
We find that both of these simpler baselines (classic pretrained embeddings and random word embeddings) can match contextual embeddings on industry-scale data.
We identify properties of data for which contextual embeddings give particularly large gains: language containing complex structure, ambiguous word usage, and words unseen in training.
- Score: 14.582968294755794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the settings for which deep contextual embeddings (e.g., BERT) give
large improvements in performance relative to classic pretrained embeddings
(e.g., GloVe), and an even simpler baseline---random word embeddings---focusing
on the impact of the training set size and the linguistic properties of the
task. Surprisingly, we find that both of these simpler baselines can match
contextual embeddings on industry-scale data, and often perform within 5 to 10%
accuracy (absolute) on benchmark tasks. Furthermore, we identify properties of
data for which contextual embeddings give particularly large gains: language
containing complex structure, ambiguous word usage, and words unseen in
training.
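To make the abstract's comparison concrete, here is a minimal sketch (not the authors' code) of how one might pit frozen BERT sentence representations against GloVe-style static embeddings and fixed random per-word embeddings on a text-classification task while sweeping the labeled training-set size. The dataset variables (`texts`, `labels`, `test_texts`, `test_labels`) and the `glove` lookup table are hypothetical placeholders, not from the paper.

```python
# Sketch only: compare a linear classifier trained on three kinds of frozen
# embeddings (contextual BERT, static GloVe-style, random per-word vectors)
# as the amount of labeled training data grows.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def bert_embed(text):
    # Mean-pool the last hidden layer as a fixed-size contextual representation.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

def static_embed(text, table, dim=300):
    # Average per-token vectors; words missing from the table fall back to zeros.
    vecs = [table.get(tok, np.zeros(dim)) for tok in text.lower().split()]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

rng = np.random.default_rng(0)
random_table = {}  # random baseline: one fixed random vector per word type

def random_embed(text, dim=300):
    for tok in text.lower().split():
        random_table.setdefault(tok, rng.normal(size=dim))
    return static_embed(text, random_table, dim)

def accuracy(embed_fn, train_texts, train_labels, test_texts, test_labels):
    X_tr = np.stack([embed_fn(t) for t in train_texts])
    X_te = np.stack([embed_fn(t) for t in test_texts])
    clf = LogisticRegression(max_iter=1000).fit(X_tr, train_labels)
    return clf.score(X_te, test_labels)

# Sweep the training-set size to see where the gap between embeddings closes,
# e.g. (with the hypothetical data above):
# for n in [100, 1000, 10000]:
#     for name, fn in [("bert", bert_embed),
#                      ("glove", lambda t: static_embed(t, glove)),
#                      ("random", random_embed)]:
#         print(n, name, accuracy(fn, texts[:n], labels[:n], test_texts, test_labels))
```

Under the abstract's findings, one would expect the three accuracy curves to converge as the labeled set grows, with contextual embeddings keeping an edge on structurally complex or ambiguous language.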
Related papers
- Putting Context in Context: the Impact of Discussion Structure on Text
Classification [13.15873889847739]
We propose a series of experiments on a large dataset for stance detection in English.
We evaluate the contribution of different types of contextual information.
We show that structural information can be highly beneficial to text classification but only under certain circumstances.
arXiv Detail & Related papers (2024-02-05T12:56:22Z)
- Pre-Training to Learn in Context [138.0745138788142]
The in-context learning ability of language models is not fully exploited because they are not explicitly trained to learn in context.
We propose PICL (Pre-training for In-Context Learning), a framework to enhance the language models' in-context learning ability.
Our experiments show that PICL is more effective and task-generalizable than a range of baselines, outperforming larger language models with nearly 4x as many parameters.
arXiv Detail & Related papers (2023-05-16T03:38:06Z)
- On Guiding Visual Attention with Language Specification [76.08326100891571]
We use high-level language specification as advice for constraining the classification evidence to task-relevant features, instead of distractors.
We show that supervising spatial attention in this way improves performance on classification tasks with biased and noisy data.
arXiv Detail & Related papers (2022-02-17T22:40:19Z)
- SCROLLS: Standardized CompaRison Over Long Language Sequences [62.574959194373264]
We introduce SCROLLS, a suite of tasks that require reasoning over long texts.
SCROLLS contains summarization, question answering, and natural language inference tasks.
We make all datasets available in a unified text-to-text format and host a live leaderboard to facilitate research on model architecture and pretraining methods.
arXiv Detail & Related papers (2022-01-10T18:47:15Z)
- Larger-Context Tagging: When and Why Does It Work? [55.407651696813396]
We focus on investigating when and why larger-context training, as a general strategy, can work.
We set up a testbed based on four tagging tasks and thirteen datasets.
arXiv Detail & Related papers (2021-04-09T15:35:30Z)
- EDS-MEMBED: Multi-sense embeddings based on enhanced distributional semantic structures via a graph walk over word senses [0.0]
We leverage the rich semantic structures in WordNet to enhance the quality of multi-sense embeddings.
We derive new distributional semantic similarity measures for M-SE from prior ones.
We report evaluation results on 11 benchmark datasets involving WSD and Word Similarity tasks.
arXiv Detail & Related papers (2021-02-27T14:36:55Z)
- A Comparative Study on Structural and Semantic Properties of Sentence Embeddings [77.34726150561087]
We propose a set of experiments using a widely-used large-scale data set for relation extraction.
We show that different embedding spaces have different degrees of strength for the structural and semantic properties.
These results provide useful information for developing embedding-based relation extraction methods.
arXiv Detail & Related papers (2020-09-23T15:45:32Z)
- DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations [4.36561468436181]
We present DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations.
Our approach closes the performance gap between unsupervised and supervised pretraining for universal sentence encoders.
Our code and pretrained models are publicly available and can be easily adapted to new domains or used to embed unseen text.
arXiv Detail & Related papers (2020-06-05T20:00:28Z)
- Quantifying the Contextualization of Word Representations with Semantic Class Probing [8.401007663676214]
Pretrained language models have achieved a new state of the art on many NLP tasks, but there are still many open questions about how and why they work so well.
We quantify the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which semantic classes of a word can be inferred from its contextualized embeddings (a minimal probing sketch in this spirit appears after this list).
arXiv Detail & Related papers (2020-04-25T17:49:37Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
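Following up on the semantic-class-probing entry above, here is a minimal sketch under stated assumptions (not that paper's code) of a linear probe that predicts a word's semantic class from its contextualized embedding. The `examples` triples of (sentence, target_word, class_label), and the choice of the first subword of the target's first occurrence, are hypothetical simplifications.

```python
# Sketch only: probe whether a word's semantic class can be read off its
# contextualized embedding with a simple linear classifier.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def token_embedding(sentence, target_word):
    # Contextual vector of the first subword of the target word's first occurrence
    # (assumes the target word actually appears in the sentence).
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]  # (seq_len, 768)
    first_subword = tokenizer.tokenize(target_word)[0]
    target_id = tokenizer.convert_tokens_to_ids(first_subword)
    position = (enc["input_ids"][0] == target_id).nonzero()[0].item()
    return hidden[position].numpy()

def probe_accuracy(train_examples, test_examples):
    # Each example is a hypothetical (sentence, target_word, class_label) triple.
    X_tr = np.stack([token_embedding(s, w) for s, w, _ in train_examples])
    y_tr = [c for _, _, c in train_examples]
    X_te = np.stack([token_embedding(s, w) for s, w, _ in test_examples])
    y_te = [c for _, _, c in test_examples]
    # Higher probe accuracy suggests the embeddings encode the semantic class.
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)
```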
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.