Contextualization and Generalization in Entity and Relation Extraction
- URL: http://arxiv.org/abs/2206.07558v1
- Date: Wed, 15 Jun 2022 14:16:42 GMT
- Title: Contextualization and Generalization in Entity and Relation Extraction
- Authors: Bruno Taillé
- Abstract summary: We study the behaviour of state-of-the-art models regarding generalization to facts unseen during training.
Traditional benchmarks present substantial lexical overlap between the mentions and relations used for training and those used for evaluation.
We propose empirical studies to separate performance based on mention and relation overlap with the training set.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: During the past decade, neural networks have become prominent in Natural
Language Processing (NLP), notably for their capacity to learn relevant word
representations from large unlabeled corpora. These word embeddings can then be
transferred and finetuned for diverse end applications during a supervised
training phase. More recently, in 2018, the transfer of entire pretrained
Language Models and the preservation of their contextualization capacities
made it possible to reach unprecedented performance on virtually every NLP
benchmark, sometimes even outperforming human baselines. However, even as
models reach such impressive scores, their comprehension abilities still
appear shallow, which reveals the limitations of benchmarks in providing
useful insights into the factors behind their performance and in accurately
measuring understanding capabilities.
In this thesis, we study the behaviour of state-of-the-art models regarding
generalization to facts unseen during training in two important Information
Extraction tasks: Named Entity Recognition (NER) and Relation Extraction (RE).
Indeed, traditional benchmarks present substantial lexical overlap between
the mentions and relations used for training and those used for evaluating
models, whereas the main interest of Information Extraction is to extract
previously unknown information. We propose empirical studies that separate
performance based on mention and relation overlap with the training set, and
find that pretrained Language Models are mainly beneficial for detecting
unseen mentions, particularly out of domain. While this makes them suited for
real use cases, there is still a performance gap between seen and unseen
mentions that hurts generalization to new facts. In particular, even
state-of-the-art ERE (Entity and Relation Extraction) models rely on a shallow
retention heuristic, basing their predictions more on argument surface forms
than on context.
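The seen/unseen evaluation split described in the abstract can be sketched in a few lines. The snippet below is a minimal illustrative sketch, not the thesis code: the `Example` class, the lowercased exact-match criterion, and the toy accuracy metric are simplifying assumptions (standard NER and RE evaluations typically report span-level micro-F1). It only shows the core idea of partitioning test examples by whether their mention surface form appears in the training set and reporting metrics on each partition separately.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Example:
    mention: str  # entity mention surface form, e.g. "Paris"
    label: str    # gold entity type (NER) or relation label (RE)


def split_by_mention_overlap(
    train: Iterable[Example], test: Iterable[Example]
) -> Tuple[List[Example], List[Example]]:
    """Partition test examples into (seen, unseen) depending on whether
    their mention surface form (lowercased) occurs in the training set."""
    train_mentions = {ex.mention.lower() for ex in train}
    seen = [ex for ex in test if ex.mention.lower() in train_mentions]
    unseen = [ex for ex in test if ex.mention.lower() not in train_mentions]
    return seen, unseen


def accuracy(gold: List[str], pred: List[str]) -> float:
    # Toy metric for illustration; real NER/RE evaluation uses span-level F1.
    return sum(g == p for g, p in zip(gold, pred)) / max(len(gold), 1)


if __name__ == "__main__":
    train = [Example("Paris", "LOC"), Example("IBM", "ORG")]
    test = [Example("Paris", "LOC"), Example("Nairobi", "LOC")]
    predictions = {"Paris": "LOC", "Nairobi": "ORG"}  # toy model outputs
    seen, unseen = split_by_mention_overlap(train, test)
    for name, part in [("seen", seen), ("unseen", unseen)]:
        golds = [ex.label for ex in part]
        preds = [predictions[ex.mention] for ex in part]
        print(f"{name}: n={len(part)}, accuracy={accuracy(golds, preds):.2f}")
```

The same partitioning logic extends to relation overlap by keying on the pair of argument surface forms (or the full triple) rather than on a single mention, which is how a gap between retained and genuinely new facts can be made visible.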
Related papers
- Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs).
In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt.
Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z) - Bayes' Power for Explaining In-Context Learning Generalizations [46.17844703369127]
In this paper, we argue that a more useful interpretation of neural network behavior in this era is as an approximation of the true posterior.
We show how models become robust in-context learners by effectively composing knowledge from their training data.
arXiv Detail & Related papers (2024-10-02T14:01:34Z) - Pre-training and Diagnosing Knowledge Base Completion Models [58.07183284468881]
We introduce and analyze an approach to knowledge transfer from one collection of facts to another without the need for entity or relation matching.
The main contribution is a method that can make use of large-scale pre-training on facts collected from unstructured text.
To understand the obtained pre-trained models better, we then introduce a novel dataset for the analysis of pre-trained models for Open Knowledge Base Completion.
arXiv Detail & Related papers (2024-01-27T15:20:43Z) - Commonsense Knowledge Transfer for Pre-trained Language Models [83.01121484432801]
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
arXiv Detail & Related papers (2023-06-04T15:44:51Z) - Topics as Entity Clusters: Entity-based Topics from Large Language Models and Graph Neural Networks [0.6486052012623045]
We propose a novel topic clustering approach using bimodal vector representations of entities.
Our approach is better suited to working with entities than state-of-the-art models.
arXiv Detail & Related papers (2023-01-06T10:54:54Z) - Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Assessing the Limits of the Distributional Hypothesis in Semantic Spaces: Trait-based Relational Knowledge and the Impact of Co-occurrences [6.994580267603235]
This work contributes to the relatively untrodden path of what is required in data for models to capture meaningful representations of natural language.
This entails evaluating how well English and Spanish semantic spaces capture a particular type of relational knowledge.
arXiv Detail & Related papers (2022-05-16T12:09:40Z) - Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge [59.22170796793179]
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances that are comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z) - Representation Learning for Weakly Supervised Relation Extraction [19.689433249830465]
In this thesis, we present several novel unsupervised pre-training models to learn distributed text representation features.
Experiments demonstrate that these features, combined with traditional hand-crafted features, improve the performance of a logistic regression classifier for relation extraction.
arXiv Detail & Related papers (2021-04-10T12:22:25Z) - Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)