Contextualization and Generalization in Entity and Relation Extraction
- URL: http://arxiv.org/abs/2206.07558v1
- Date: Wed, 15 Jun 2022 14:16:42 GMT
- Title: Contextualization and Generalization in Entity and Relation Extraction
- Authors: Bruno Taillé
- Abstract summary: We study the behaviour of state-of-the-art models regarding generalization to facts unseen during training.
Traditional benchmarks exhibit significant lexical overlap between the mentions and relations used to train and evaluate models.
We propose empirical studies that separate performance according to mention and relation overlap with the training set.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: During the past decade, neural networks have become prominent in Natural
Language Processing (NLP), notably for their capacity to learn relevant word
representations from large unlabeled corpora. These word embeddings can then be
transferred and finetuned for diverse end applications during a supervised
training phase. More recently, in 2018, transferring entire pretrained
Language Models while preserving their contextualization capacities made it
possible to reach unprecedented performance on virtually every NLP benchmark,
sometimes even outperforming human baselines. However, even as models reach such
impressive scores, their comprehension abilities still appear shallow, which
reveals the limitations of benchmarks in providing useful insights into their
factors of performance and in accurately measuring understanding capabilities.
In this thesis, we study the behaviour of state-of-the-art models regarding
generalization to facts unseen during training in two important Information
Extraction tasks: Named Entity Recognition (NER) and Relation Extraction (RE).
Indeed, traditional benchmarks exhibit significant lexical overlap between the
mentions and relations used for training and evaluating models, whereas the
main interest of Information Extraction is to extract previously unknown
information. We propose empirical studies that separate performance according
to mention and relation overlap with the training set, and find that pretrained
Language Models are mainly beneficial for detecting unseen mentions, in particular
out-of-domain. While this makes them suited to real use cases, there is still
a gap in performance between seen and unseen mentions that hurts generalization
to new facts. In particular, even state-of-the-art end-to-end Relation Extraction
(ERE) models rely on a shallow retention heuristic, basing their predictions more
on argument surface forms than on context.
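The seen/unseen evaluation described above can be illustrated with a minimal sketch: partition test predictions by whether the mention's surface form appears in the training set, then score each partition separately. The data, function names, and exact-match scoring below are illustrative assumptions, not the thesis's actual evaluation code.

```python
# Sketch of a seen/unseen performance split for mention classification.
# A large seen-vs-unseen gap indicates reliance on lexical memorization.

def seen_unseen_split(train_mentions, test_examples):
    """Split (mention, gold, pred) triples by training-set overlap."""
    train_set = {m.lower() for m in train_mentions}
    seen, unseen = [], []
    for mention, gold, pred in test_examples:
        bucket = seen if mention.lower() in train_set else unseen
        bucket.append((gold, pred))
    return seen, unseen

def accuracy(pairs):
    """Exact-match accuracy over (gold, pred) pairs."""
    if not pairs:
        return 0.0
    return sum(g == p for g, p in pairs) / len(pairs)

# Toy data: mentions with gold and predicted entity types.
train_mentions = ["Paris", "Google", "Obama"]
test_examples = [
    ("Paris", "LOC", "LOC"),   # seen during training
    ("Lagos", "LOC", "ORG"),   # unseen during training
    ("Google", "ORG", "ORG"),  # seen during training
]

seen, unseen = seen_unseen_split(train_mentions, test_examples)
print(accuracy(seen), accuracy(unseen))  # prints: 1.0 0.0
```

In practice the same partitioning is applied to span-level NER F1 or to relation facts (argument pairs), but the principle is identical: report metrics per overlap bucket rather than a single aggregate score.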
Related papers
- Machine Learning vs Deep Learning: The Generalization Problem
This study investigates the comparative abilities of traditional machine learning (ML) models and deep learning (DL) algorithms in terms of extrapolation.
We present an empirical analysis where both ML and DL models are trained on an exponentially growing function and then tested on values outside the training domain.
Our findings suggest that deep learning models possess inherent capabilities to generalize beyond the training scope.
arXiv Detail & Related papers (2024-03-03T21:42:55Z) - Pre-training and Diagnosing Knowledge Base Completion Models
We introduce and analyze an approach to knowledge transfer from one collection of facts to another without the need for entity or relation matching.
The main contribution is a method that can make use of large-scale pre-training on facts, which were collected from unstructured text.
To understand the obtained pre-trained models better, we then introduce a novel dataset for the analysis of pre-trained models for Open Knowledge Base Completion.
arXiv Detail & Related papers (2024-01-27T15:20:43Z) - Commonsense Knowledge Transfer for Pre-trained Language Models
We introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
It first exploits general texts to form queries for extracting commonsense knowledge from the neural commonsense knowledge model.
It then refines the language model with two self-supervised objectives: commonsense mask infilling and commonsense relation prediction.
arXiv Detail & Related papers (2023-06-04T15:44:51Z) - Large Language Models with Controllable Working Memory
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amount of world knowledge they internalize during pretraining.
How a model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z) - Assessing the Limits of the Distributional Hypothesis in Semantic Spaces: Trait-based Relational Knowledge and the Impact of Co-occurrences
This work contributes to the relatively untrodden path of what is required in data for models to capture meaningful representations of natural language.
This entails evaluating how well English and Spanish semantic spaces capture a particular type of relational knowledge.
arXiv Detail & Related papers (2022-05-16T12:09:40Z) - Tracing Origins: Coref-aware Machine Reading Comprehension
We imitate the human reading process of connecting anaphoric expressions, and leverage coreference information to enhance the word embeddings from the pre-trained model.
We demonstrate that explicitly incorporating coreference information at the fine-tuning stage performs better than incorporating it when training a pre-trained language model.
arXiv Detail & Related papers (2021-10-15T09:28:35Z) - Did the Cat Drink the Coffee? Challenging Transformers with Generalized Event Knowledge
Transformer Language Models (TLMs) were tested on a benchmark for the dynamic estimation of thematic fit.
Our results show that TLMs can reach performances comparable to those achieved by SDM.
However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge.
arXiv Detail & Related papers (2021-07-22T20:52:26Z) - Representation Learning for Weakly Supervised Relation Extraction
In this thesis, we present several novel unsupervised pre-training models to learn distributed text representation features.
Experiments demonstrate that these features, combined with traditional hand-crafted features, can improve the performance of a logistic classification model for relation extraction.
arXiv Detail & Related papers (2021-04-10T12:22:25Z) - Infusing Finetuning with Semantic Dependencies
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z) - Learning from Context or Names? An Empirical Study on Neural Relation Extraction
We study the effect of two main information sources in text: textual context and entity mentions (names).
We propose an entity-masked contrastive pre-training framework for relation extraction (RE).
Our framework can improve the effectiveness and robustness of neural models in different RE scenarios.
arXiv Detail & Related papers (2020-10-05T11:21:59Z) - Contextualized Embeddings in Named-Entity Recognition: An Empirical Study on Generalization
Contextualized embeddings use unsupervised language model pretraining to compute word representations depending on their context.
Standard English benchmarks overestimate the importance of lexical over contextual features because of an unrealistic lexical overlap between train and test mentions.
We show that they are particularly beneficial for unseen mentions detection, especially out-of-domain.
arXiv Detail & Related papers (2020-01-22T15:15:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.