Distantly-Supervised Joint Extraction with Noise-Robust Learning
- URL: http://arxiv.org/abs/2310.04994v2
- Date: Sat, 25 May 2024 18:20:12 GMT
- Title: Distantly-Supervised Joint Extraction with Noise-Robust Learning
- Authors: Yufei Li, Xiao Yu, Yanghong Guo, Yanchi Liu, Haifeng Chen, Cong Liu
- Abstract summary: We focus on the problem of joint extraction from distantly-labeled data, whose labels are generated by aligning entity mentions with the corresponding entity and relation tags using a knowledge base (KB).
Existing approaches, which either consider only one source of noise or rely on external knowledge to make decisions, cannot fully exploit the significant information in the training data.
We propose DENRL, a generalizable framework that incorporates a lightweight transformer backbone into a sequence labeling scheme for joint tagging.
- Score: 36.23022433465051
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Joint entity and relation extraction is a process that identifies entity pairs and their relations using a single model. We focus on the problem of joint extraction from distantly-labeled data, whose labels are generated by aligning entity mentions with the corresponding entity and relation tags using a knowledge base (KB). One key challenge is the presence of noisy labels arising from both incorrect entity and relation annotations, which significantly impairs the quality of supervised learning. Existing approaches, which either consider only one source of noise or rely on external knowledge to make decisions, cannot fully exploit the significant information in the training data. We propose DENRL, a generalizable framework that 1) incorporates a lightweight transformer backbone into a sequence labeling scheme for joint tagging, and 2) employs a noise-robust framework that regularizes the tagging model with significant relation patterns and entity-relation dependencies, then iteratively self-adapts to instances with less noise from both sources. Surprisingly, experiments on two benchmark datasets show that DENRL, using merely its own parametric distribution and simple data-driven heuristics, outperforms large language model-based baselines by a large margin and offers better interpretability.
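The abstract names two moving parts: a sequence labeling scheme for joint tagging and a noise-robust loop that iteratively self-adapts to the less noisy instances. The following is a minimal, hypothetical sketch of such a self-adaptive step, not the authors' implementation; the BIO-style joint tag scheme, the `keep_ratio` heuristic, and all function names are assumptions, and DENRL's pattern- and dependency-based regularizers are not reproduced here.

```python
# Minimal sketch of a noise-robust joint-tagging step (illustrative; not the DENRL code).
# Assumptions: `model` returns per-token log-probabilities over a BIO-style joint tag set
# (e.g. "B-PER", "B-per:employee_of"), and each batch carries distantly generated tag ids.
import torch
import torch.nn.functional as F


def instance_confidence(log_probs, tags, mask):
    """Average log-likelihood the current model assigns to an instance's distant labels."""
    token_ll = log_probs.gather(-1, tags.unsqueeze(-1)).squeeze(-1)  # (batch, seq)
    return (token_ll * mask).sum(-1) / mask.sum(-1).clamp(min=1)


def noise_robust_step(model, batch, keep_ratio=0.8):
    """One step that trains only on the instances the model currently trusts most."""
    log_probs = model(batch["input_ids"], batch["attention_mask"])   # (batch, seq, num_tags)
    conf = instance_confidence(log_probs, batch["tags"], batch["attention_mask"])

    # Self-adaptation heuristic: keep the most confident fraction of the batch and
    # treat the remainder as likely entity- or relation-label noise for this round.
    k = max(1, int(keep_ratio * conf.size(0)))
    keep = conf.topk(k).indices

    loss = F.nll_loss(
        log_probs[keep].transpose(1, 2),   # (k, num_tags, seq) layout expected by nll_loss
        batch["tags"][keep],
        reduction="mean",                  # padding could be skipped via ignore_index
    )
    return loss, conf
```

In this reading, the keep-or-drop decision uses only the model's own parametric distribution over the distant labels, which matches the abstract's claim of relying on simple data-driven heuristics rather than external knowledge.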
Related papers
- Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances [55.37242480995541]
We propose to denoise noisy NER data with guidance from a small set of clean instances.
Along with the main NER model, we train a discriminator model and use its outputs to recalibrate the sample weights.
Results on public crowdsourcing and distant supervision datasets show that the proposed method can consistently improve performance with a small guidance set.
arXiv Detail & Related papers (2023-10-25T17:23:37Z)
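The key mechanism in the entry above is recalibrating sample weights with a discriminator trained alongside the NER model. A rough, hypothetical sketch of that reweighting follows; the function names and normalization are assumptions, not the paper's code.

```python
# Illustrative discriminator-guided reweighting of per-sentence NER losses (assumed names).
import torch


def reweighted_ner_loss(ner_losses: torch.Tensor, disc_logits: torch.Tensor) -> torch.Tensor:
    """ner_losses: per-sentence NER loss; disc_logits: discriminator 'clean-ness' logits."""
    weights = torch.sigmoid(disc_logits)                  # higher weight for cleaner-looking sentences
    weights = weights / weights.sum().clamp(min=1e-8)     # normalize weights over the batch
    return (weights * ner_losses).sum()                   # noisy sentences contribute less to training
```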
- Jointprop: Joint Semi-supervised Learning for Entity and Relation Extraction with Heterogeneous Graph-based Propagation [13.418617500641401]
We propose Jointprop, a Heterogeneous Graph-based Propagation framework for joint semi-supervised entity and relation extraction.
We construct a unified span-based heterogeneous graph from entity and relation candidates and propagate class labels based on confidence scores.
We show that our framework outperforms the state-of-the-art semi-supervised approaches on NER and RE tasks.
arXiv Detail & Related papers (2023-05-25T09:07:04Z)
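Jointprop's summary describes propagating class labels over a span-based heterogeneous graph according to confidence scores. The snippet below is a generic label-propagation sketch under that reading; the graph construction, `alpha`, and iteration count are assumptions, not the paper's algorithm.

```python
# Generic confidence-based label propagation over a span graph (illustrative only).
import numpy as np


def propagate_labels(adj: np.ndarray, seed_scores: np.ndarray, alpha: float = 0.5, iters: int = 20):
    """adj: symmetric span-similarity matrix; seed_scores: per-span class scores (zeros if unlabeled)."""
    transition = adj / adj.sum(axis=1, keepdims=True).clip(min=1e-8)  # row-normalize the graph
    scores = seed_scores.copy()
    for _ in range(iters):
        # Blend propagated neighbor scores with the original seed labels each iteration.
        scores = alpha * transition @ scores + (1 - alpha) * seed_scores
    return scores  # per-span class confidences after propagation
```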
- OneRel: Joint Entity and Relation Extraction with One Module in One Step [42.576188878294886]
Joint entity and relation extraction is an essential task in natural language processing and knowledge graph construction.
We propose a novel joint entity and relation extraction model, named OneRel, which casts joint extraction as a fine-grained triple classification problem.
arXiv Detail & Related papers (2022-03-10T15:09:59Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training [66.80558875393565]
We study the problem of training named entity recognition (NER) models using only distantly-labeled data.
We propose a noise-robust learning scheme comprised of a new loss function and a noisy label removal step.
Our method achieves superior performance, outperforming existing distantly-supervised NER models by significant margins.
arXiv Detail & Related papers (2021-09-10T17:19:56Z)
- Element Intervention for Open Relation Extraction [27.408443348900057]
Open relation extraction (OpenRE) aims to cluster relation instances that refer to the same underlying relation.
Current OpenRE models are commonly trained on the datasets generated from distant supervision.
In this paper, we revisit the procedure of OpenRE from a causal view.
arXiv Detail & Related papers (2021-06-17T14:37:13Z)
- Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks [61.950353376870154]
Joint-event-extraction is a sequence-to-sequence labeling task whose tag set is composed of trigger and entity tags.
We propose a Cross-Supervised Mechanism (CSM) to alternately supervise the extraction of triggers or entities.
Our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
arXiv Detail & Related papers (2020-10-13T11:51:17Z)
- Clustering-based Unsupervised Generative Relation Extraction [3.342376225738321]
We propose a Clustering-based Unsupervised Generative Relation Extraction framework (CURE).
We use an "Encoder-Decoder" architecture to perform self-supervised learning so the encoder can extract relation information.
Our model performs better than state-of-the-art models on both New York Times (NYT) and United Nations Parallel Corpus (UNPC) standard datasets.
arXiv Detail & Related papers (2020-09-26T20:36:40Z)
- Relabel the Noise: Joint Extraction of Entities and Relations via Cooperative Multiagents [52.55119217982361]
We propose a joint extraction approach to handle noisy instances with a group of cooperative multiagents.
To handle noisy instances in a fine-grained manner, each agent in the cooperative group evaluates the instance by calculating a continuous confidence score from its own perspective.
A confidence consensus module is designed to gather the wisdom of all agents and re-distribute the noisy training set with confidence-scored labels.
arXiv Detail & Related papers (2020-04-21T12:03:04Z)
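The last entry's confidence consensus module suggests a simple aggregation pattern: each agent scores how much it trusts an instance's distant label, and the consensus decides which instances to keep or relabel. A toy sketch under that assumption follows; the agent models, threshold, and function names are hypothetical.

```python
# Toy confidence-consensus aggregation across cooperating agents (illustrative only).
from statistics import mean


def confidence_consensus(agent_scores, threshold=0.5):
    """agent_scores: one confidence list per agent, aligned over the same training instances."""
    n_instances = len(agent_scores[0])
    consensus = [mean(agent[i] for agent in agent_scores) for i in range(n_instances)]
    # High-consensus labels are kept; low-consensus instances are flagged for relabeling.
    return [(c, "keep" if c >= threshold else "relabel") for c in consensus]
```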
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.