FREDA: Flexible Relation Extraction Data Annotation
- URL: http://arxiv.org/abs/2204.07150v1
- Date: Thu, 14 Apr 2022 17:57:53 GMT
- Title: FREDA: Flexible Relation Extraction Data Annotation
- Authors: Michael Strobl, Amine Trabelsi, Osmar Zaiane
- Abstract summary: We propose an approach for quickly producing high-quality datasets for the task of Relation Extraction.
In our study, we were able to annotate 10,022 sentences for 19 relations in a reasonable amount of time.
- Score: 1.3750624267664153
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To effectively train accurate Relation Extraction models, sufficient and
properly labeled data is required. Adequately labeled data is difficult to
obtain, and annotating such data is a tricky undertaking. Previous work has
shown that either accuracy has to be sacrificed or, if done accurately, the task
is extremely time-consuming. We propose an approach for quickly producing
high-quality datasets for the task of Relation Extraction.
Neural models trained for Relation Extraction on the created datasets
achieve very good results and generalize well to other datasets. In our study,
we were able to annotate 10,022 sentences for 19 relations in a reasonable
amount of time, and trained a commonly used baseline model for each relation.
Related papers
- Generative Expansion of Small Datasets: An Expansive Graph Approach [13.053285552524052]
We introduce an Expansive Synthesis model generating large-scale, information-rich datasets from minimal samples.
An autoencoder with self-attention layers and optimal transport refines distributional consistency.
Results show comparable performance, demonstrating the model's potential to augment training data effectively.
arXiv Detail & Related papers (2024-06-25T02:59:02Z) - Certain and Approximately Certain Models for Statistical Learning [4.318959672085627]
We show that it is possible to learn accurate models directly from data with missing values for certain training data and target models.
We build efficient algorithms with theoretical guarantees to check this necessity and return accurate models in cases where imputation is unnecessary.
arXiv Detail & Related papers (2024-02-27T22:49:33Z) - Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z) - Improving Sentence-Level Relation Extraction through Curriculum Learning [7.117139527865022]
We propose a curriculum learning-based relation extraction model that splits the data by difficulty and uses it for staged learning.
In experiments on the representative sentence-level relation extraction datasets, TACRED and Re-TACRED, the proposed method showed good performance.
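As an illustration only, a minimal sketch of staged (curriculum) training in Python; the `difficulty` scoring function and the classifier's `fit` interface are assumptions for the sketch, not the method from the paper:

```python
# Hypothetical sketch of curriculum training: easiest examples first,
# growing the training pool stage by stage. Interfaces are assumed.

def curriculum_train(model, examples, difficulty, n_stages=3, epochs_per_stage=1):
    """Train `model` on progressively harder subsets of `examples`."""
    ranked = sorted(examples, key=difficulty)        # easiest first
    stage_size = max(1, len(ranked) // n_stages)
    for stage in range(1, n_stages + 1):
        subset = ranked[: stage * stage_size]        # expand the pool each stage
        for _ in range(epochs_per_stage):
            model.fit(subset)                        # assumed training interface
    return model
```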
arXiv Detail & Related papers (2021-07-20T08:44:40Z) - Representation Learning for Weakly Supervised Relation Extraction [19.689433249830465]
In this thesis, we present several novel unsupervised pre-training models to learn the distributed text representation features.
The experiments have demonstrated that this type of feature, combined with traditional hand-crafted features, can improve the performance of a logistic classification model for relation extraction.
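As a rough illustration of how learned text representations can be combined with hand-crafted features in a logistic classifier, a sketch using scikit-learn; the random placeholder features stand in for the thesis's actual representations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def combine_and_classify(learned_feats, handcrafted_feats, labels):
    """Concatenate distributed text representations with hand-crafted
    features and fit a logistic-regression relation classifier."""
    X = np.hstack([learned_feats, handcrafted_feats])   # (n_samples, d1 + d2)
    clf = LogisticRegression(max_iter=1000)
    return clf.fit(X, labels)

# Usage with random placeholder features (illustration only):
rng = np.random.default_rng(0)
X_learned = rng.random((100, 50))                       # e.g. pre-trained embeddings
X_hand = rng.random((100, 10))                          # e.g. lexical/syntactic features
y = rng.integers(0, 5, size=100)                        # 5 relation types
clf = combine_and_classify(X_learned, X_hand, y)
```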
arXiv Detail & Related papers (2021-04-10T12:22:25Z) - Time-Series Imputation with Wasserstein Interpolation for Optimal Look-Ahead-Bias and Variance Tradeoff [66.59869239999459]
In finance, imputation of missing returns may be applied prior to training a portfolio optimization model.
There is an inherent trade-off between the look-ahead-bias of using the full data set for imputation and the larger variance in the imputation from using only the training data.
We propose a Bayesian posterior consensus distribution which optimally controls the variance and look-ahead-bias trade-off in the imputation.
arXiv Detail & Related papers (2021-02-25T09:05:35Z) - WebRED: Effective Pretraining And Finetuning For Relation Extraction On The Web [4.702325864333419]
WebRED is a strongly supervised, human-annotated dataset for extracting relationships from text found on the World Wide Web.
We show that combining pre-training on a large weakly supervised dataset with fine-tuning on a small strongly-supervised dataset leads to better relation extraction performance.
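A minimal sketch of the general pretrain-then-finetune recipe described above; the model's training interface and hyperparameters are assumptions for illustration, not WebRED's actual setup:

```python
# Hypothetical two-stage recipe: pre-train on a large weakly supervised
# corpus, then fine-tune (typically at a lower learning rate) on a small
# strongly supervised, human-annotated set. Interfaces are assumed.

def pretrain_then_finetune(model, weak_data, strong_data):
    model.train_on(weak_data, epochs=3, lr=1e-4)     # noisy but plentiful labels
    model.train_on(strong_data, epochs=10, lr=1e-5)  # clean but scarce labels
    return model
```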
arXiv Detail & Related papers (2021-02-18T23:56:12Z) - Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [66.15398165275926]
We propose a method that can automatically detect and ignore dataset-specific patterns, which we call dataset biases.
Our method trains a lower capacity model in an ensemble with a higher capacity model.
We show improvement in all settings, including a 10 point gain on the visual question answering dataset.
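One common way to realize such an ensemble is a product-of-experts combination of the two models' predictions during training; the PyTorch sketch below illustrates that general technique under assumed inputs, and the paper's exact objective may differ:

```python
import torch.nn.functional as F

def ensemble_loss(high_logits, low_logits, targets):
    """Product-of-experts style loss: add the log-probabilities of the
    low-capacity (bias-prone) model and the high-capacity model, then apply
    cross-entropy. The low-capacity model absorbs easy dataset-specific
    patterns, pushing the high-capacity model toward more general signal.
    At test time only the high-capacity model's logits would be used."""
    combined = F.log_softmax(high_logits, dim=-1) + F.log_softmax(low_logits, dim=-1)
    return F.cross_entropy(combined, targets)
```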
arXiv Detail & Related papers (2020-11-07T22:20:03Z) - Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z) - Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets.
We leverage a largely ignored source of information: the behavior of the model on individual instances during training.
Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
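A minimal sketch of the per-instance training-dynamics statistics such a data map builds on: the model's confidence in the gold label averaged across epochs and its variability. It assumes the per-epoch gold-label probabilities have already been logged during training:

```python
import numpy as np

def data_map_stats(gold_probs_per_epoch):
    """`gold_probs_per_epoch`: array of shape (n_epochs, n_examples) holding
    the model's probability of the gold label for each training example at
    each epoch. Returns per-example confidence (mean) and variability (std);
    low-confidence, low-variability regions often flag labeling errors, while
    high-variability ('ambiguous') examples tend to be most informative."""
    probs = np.asarray(gold_probs_per_epoch, dtype=float)
    return probs.mean(axis=0), probs.std(axis=0)
```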
arXiv Detail & Related papers (2020-09-22T20:19:41Z) - Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework well preserves the relations between samples.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
arXiv Detail & Related papers (2020-07-11T10:57:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.