Contextualized Embeddings in Named-Entity Recognition: An Empirical
Study on Generalization
- URL: http://arxiv.org/abs/2001.08053v1
- Date: Wed, 22 Jan 2020 15:15:34 GMT
- Title: Contextualized Embeddings in Named-Entity Recognition: An Empirical
Study on Generalization
- Authors: Bruno Taillé, Vincent Guigue, Patrick Gallinari
- Abstract summary: Contextualized embeddings use unsupervised language model pretraining to compute word representations depending on their context.
Standard English benchmarks overestimate the importance of lexical over contextual features because of an unrealistic lexical overlap between train and test mentions.
We show that they are particularly beneficial for detecting unseen mentions, especially out-of-domain.
- Score: 14.47381093162237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextualized embeddings use unsupervised language model pretraining to
compute word representations depending on their context. This is intuitively
useful for generalization, especially in Named-Entity Recognition where it is
crucial to detect mentions never seen during training. However, standard
English benchmarks overestimate the importance of lexical over contextual
features because of an unrealistic lexical overlap between train and test
mentions. In this paper, we perform an empirical analysis of the generalization
capabilities of state-of-the-art contextualized embeddings by separating
mentions by novelty and with out-of-domain evaluation. We show that they are
particularly beneficial for detecting unseen mentions, especially
out-of-domain. For models trained on CoNLL03, language model contextualization
leads to a maximal relative micro-F1 score increase of +1.2% in-domain versus
+13% out-of-domain on the WNUT dataset.
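To make the evaluation protocol concrete, the sketch below shows one way to split test mentions by novelty before scoring: a mention counts as "seen" if its lowercased surface form occurs among the training mentions, and micro-F1 is then computed separately on the seen and unseen buckets. The data layout, function names, and the way predictions are assigned to buckets are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of the "separate mentions by novelty" evaluation described
# in the abstract. Assumes gold and predicted mentions are available as
# (sentence_id, start, end, entity_type) tuples; this layout is an assumption
# for illustration, not the paper's actual data format.
from typing import Dict, List, Set, Tuple

Mention = Tuple[int, int, int, str]  # (sentence_id, start, end, entity_type)


def surface_forms(mentions: List[Mention],
                  sentences: List[List[str]]) -> Set[str]:
    """Lowercased surface strings of all mentions (the training 'lexicon')."""
    return {" ".join(sentences[sid][start:end]).lower()
            for sid, start, end, _ in mentions}


def micro_f1(gold: Set[Mention], pred: Set[Mention]) -> float:
    """Exact-match micro-F1 over (span, type) mentions."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)


def f1_by_novelty(train_mentions: List[Mention],
                  train_sents: List[List[str]],
                  test_gold: List[Mention],
                  test_pred: List[Mention],
                  test_sents: List[List[str]]) -> Dict[str, float]:
    """Micro-F1 computed separately on seen and unseen surface forms."""
    lexicon = surface_forms(train_mentions, train_sents)

    def is_seen(m: Mention) -> bool:
        sid, start, end, _ = m
        return " ".join(test_sents[sid][start:end]).lower() in lexicon

    scores = {}
    for bucket, keep_seen in (("seen", True), ("unseen", False)):
        gold = {m for m in test_gold if is_seen(m) == keep_seen}
        pred = {m for m in test_pred if is_seen(m) == keep_seen}
        scores[bucket] = micro_f1(gold, pred)
    return scores
```

Assigning predicted mentions to buckets by the same surface-form test is one possible design choice; the paper's exact handling of false positives and of partial lexical overlap may differ.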
Related papers
- A Unified Label-Aware Contrastive Learning Framework for Few-Shot Named Entity Recognition [6.468625143772815]
We propose a unified label-aware token-level contrastive learning framework.
Our approach enriches the context by utilizing label semantics as suffix prompts.
It simultaneously optimizes context-native and context-label contrastive learning objectives.
arXiv Detail & Related papers (2024-04-26T06:19:21Z) - How Abstract Is Linguistic Generalization in Large Language Models?
Experiments with Argument Structure [2.530495315660486]
We investigate the degree to which pre-trained Transformer-based large language models represent relationships between contexts.
We find that LLMs perform well in generalizing the distribution of a novel noun argument between related contexts.
However, LLMs fail at generalizations between related contexts that have not been observed during pre-training.
arXiv Detail & Related papers (2023-11-08T18:58:43Z) - On Context Utilization in Summarization with Large Language Models [83.84459732796302]
Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries.
Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens.
We conduct the first comprehensive study on context utilization and position bias in summarization.
arXiv Detail & Related papers (2023-10-16T16:45:12Z) - Entity Disambiguation with Entity Definitions [50.01142092276296]
Local models have recently attained astounding performance in Entity Disambiguation (ED).
Previous works limited their studies to using, as the textual representation of each candidate, only its Wikipedia title.
In this paper, we address this limitation and investigate to what extent more expressive textual representations can mitigate it.
We report a new state of the art on 2 out of 6 benchmarks we consider and strongly improve the generalization capability over unseen patterns.
arXiv Detail & Related papers (2022-10-11T17:46:28Z) - Domain-Specific Word Embeddings with Structure Prediction [3.057136788672694]
We present an empirical evaluation on New York Times articles and two English Wikipedia datasets with articles on science and philosophy.
Our method, called Word2Vec with Structure Prediction (W2VPred), provides better performance than baselines in terms of the general analogy tests.
As a use case in the field of Digital Humanities we demonstrate how to raise novel research questions for high literature from the German Text Archive.
arXiv Detail & Related papers (2022-10-06T12:45:48Z) - Contextualization and Generalization in Entity and Relation Extraction [0.0]
We study the behaviour of state-of-the-art models regarding generalization to facts unseen during training.
Traditional benchmarks present important lexical overlap between mentions and relations used for training and evaluating models.
We propose empirical studies to separate performance based on mention and relation overlap with the training set.
arXiv Detail & Related papers (2022-06-15T14:16:42Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions at their positions.
Experiments on Semantic Textual Similarity show the resulting neighboring distribution divergence (NDD) to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - Weakly-Supervised Aspect-Based Sentiment Analysis via Joint
Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z) - Text Classification with Few Examples using Controlled Generalization [58.971750512415134]
Current practice relies on pre-trained word embeddings to map words unseen in training to similar seen ones.
Our alternative begins with sparse pre-trained representations derived from unlabeled parsed corpora.
We show that a feed-forward network over these vectors is especially effective in low-data scenarios.
arXiv Detail & Related papers (2020-05-18T06:04:58Z)