Contextualization and Generalization in Entity and Relation Extraction
- URL: http://arxiv.org/abs/2206.07558v1
- Date: Wed, 15 Jun 2022 14:16:42 GMT
- Title: Contextualization and Generalization in Entity and Relation Extraction
- Authors: Bruno Taillé
- Abstract summary: We study the behaviour of state-of-the-art models regarding generalization to facts unseen during training.
Traditional benchmarks exhibit significant lexical overlap between the mentions and relations used to train and evaluate models.
We propose empirical studies that separate performance according to mention and relation overlap with the training set.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: During the past decade, neural networks have become prominent in Natural
Language Processing (NLP), notably for their capacity to learn relevant word
representations from large unlabeled corpora. These word embeddings can then be
transferred and finetuned for diverse end applications during a supervised
training phase. More recently, in 2018, transferring entire pretrained
Language Models while preserving their contextualization capacities made it
possible to reach unprecedented performance on virtually every NLP benchmark,
sometimes even outperforming human baselines. However, even as models reach such
impressive scores, their comprehension abilities still appear shallow, which
reveals the limitations of benchmarks in providing useful insights into their
factors of performance and in accurately measuring understanding capabilities.
In this thesis, we study the behaviour of state-of-the-art models regarding
generalization to facts unseen during training in two important Information
Extraction tasks: Named Entity Recognition (NER) and Relation Extraction (RE).
Indeed, traditional benchmarks exhibit significant lexical overlap between the
mentions and relations used for training and evaluating models, whereas the
main interest of Information Extraction is to extract previously unknown
information. We propose empirical studies that separate performance according
to mention and relation overlap with the training set, and find that pretrained
Language Models are mainly beneficial for detecting unseen mentions, in particular
out-of-domain. While this makes them suited to real use cases, there is still
a gap in performance between seen and unseen mentions that hurts generalization
to new facts. In particular, even state-of-the-art end-to-end Relation Extraction
(ERE) models rely on a shallow retention heuristic, basing their predictions more
on argument surface forms than on context.
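The seen/unseen evaluation described above can be illustrated with a minimal sketch: partition test predictions by whether the mention's surface form appears in the training set, then score each partition separately. The data, function names, and exact-match scoring below are illustrative assumptions, not the thesis's actual evaluation code.

```python
# Sketch of a seen/unseen performance split for mention classification.
# A large seen-vs-unseen gap indicates reliance on lexical memorization.

def seen_unseen_split(train_mentions, test_examples):
    """Split (mention, gold, pred) triples by training-set overlap."""
    train_set = {m.lower() for m in train_mentions}
    seen, unseen = [], []
    for mention, gold, pred in test_examples:
        bucket = seen if mention.lower() in train_set else unseen
        bucket.append((gold, pred))
    return seen, unseen

def accuracy(pairs):
    """Exact-match accuracy over (gold, pred) pairs."""
    if not pairs:
        return 0.0
    return sum(g == p for g, p in pairs) / len(pairs)

# Toy data: mentions with gold and predicted entity types.
train_mentions = ["Paris", "Google", "Obama"]
test_examples = [
    ("Paris", "LOC", "LOC"),   # seen during training
    ("Lagos", "LOC", "ORG"),   # unseen during training
    ("Google", "ORG", "ORG"),  # seen during training
]

seen, unseen = seen_unseen_split(train_mentions, test_examples)
print(accuracy(seen), accuracy(unseen))  # prints: 1.0 0.0
```

In practice the same partitioning is applied to span-level NER F1 or to relation facts (argument pairs), but the principle is identical: report metrics per overlap bucket rather than a single aggregate score.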
Related papers
- Machine Learning vs Deep Learning: The Generalization Problem
This study investigates the comparative abilities of traditional machine learning (ML) models and deep learning (DL) algorithms in terms of extrapolation.
We present an empirical analysis where both ML and DL models are trained on an exponentially growing function and then tested on values outside the training domain.
Our findings suggest that deep learning models possess inherent capabilities to generalize beyond the training scope.
arXiv Detail & Related papers (2024-03-03T21:42:55Z) - Pre-training and Diagnosing Knowledge Base Completion Models
We introduce and analyze an approach to knowledge transfer from one collection of facts to another without the need for entity or relation matching.
The main contribution is a method that can make use of large-scale pre-training on facts, which were collected from unstructured text.
To understand the obtained pre-trained models better, we then introduce a novel dataset for the analysis of pre-trained models for Open Knowledge Base Completion.
arXiv Detail & Related papers (2024-01-27T15:20:43Z) - Commonsense Knowledge Transfer for Pre-trained Language Models
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
arXiv Detail & Related papers (2023-06-04T15:44:51Z) - Large Language Models with Controllable Working Memory
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amount of world knowledge they internalize during pretraining.
How a model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Assessing the Limits of the Distributional Hypothesis in Semantic Spaces: Trait-based Relational Knowledge and the Impact of Co-occurrences
This work contributes to the relatively untrodden path of what is required in data for models to capture meaningful representations of natural language.
This entails evaluating how well English and Spanish semantic spaces capture a particular type of relational knowledge.
arXiv Detail & Related papers (2022-05-16T12:09:40Z) - Tracing Origins: Coref-aware Machine Reading Comprehension
We imitate the human reading process of connecting anaphoric expressions, and leverage coreference information to enhance the word embeddings from the pre-trained model.
We demonstrate that explicitly incorporating coreference information at the fine-tuning stage performs better than incorporating it when training a pre-trained language model.
arXiv Detail & Related papers (2021-10-15T09:28:35Z) - Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z) - Representation Learning for Weakly Supervised Relation Extraction
In this thesis, we present several novel unsupervised pre-training models to learn distributed text representation features.
Experiments demonstrate that these features, combined with traditional hand-crafted features, can improve the performance of a logistic classification model for relation extraction.
arXiv Detail & Related papers (2021-04-10T12:22:25Z) - Infusing Finetuning with Semantic Dependencies
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - Learning from Context or Names? An Empirical Study on Neural Relation Extraction
We study the effect of two main information sources in text: textual context and entity mentions (names).
We propose an entity-masked contrastive pre-training framework for relation extraction (RE).
Our framework can improve the effectiveness and robustness of neural models in different RE scenarios.
arXiv Detail & Related papers (2020-10-05T11:21:59Z) - Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization
Contextualized embeddings use unsupervised language model pretraining to compute word representations depending on their context.
Standard English benchmarks overestimate the importance of lexical over contextual features because of an unrealistic lexical overlap between train and test mentions.
We show that they are particularly beneficial for unseen mentions detection, especially out-of-domain.
arXiv Detail & Related papers (2020-01-22T15:15:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.