Related papers: DARE: Data Augmented Relation Extraction with GPT-2

URL: http://arxiv.org/abs/2004.13845v1
Date: Mon, 6 Apr 2020 14:38:36 GMT
Title: DARE: Data Augmented Relation Extraction with GPT-2
Authors: Yannis Papanikolaou and Andrea Pierleoni
Abstract summary: We present Data Augmented Relation Extraction (DARE), a simple method to augment training data by properly fine-tuning GPT-2. DARE achieves a new state of the art on three widely used biomedical RE datasets, surpassing the previous best results by 4.7 F1 points on average.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-world Relation Extraction (RE) tasks are challenging due to either limited training data or class imbalance. In this work, we present Data Augmented Relation Extraction (DARE), a simple method to augment training data by properly fine-tuning GPT-2 to generate examples for specific relation types. The generated training data is then used in combination with the gold dataset to train a BERT-based RE classifier. In a series of experiments, we show the advantages of our method, which leads to improvements of up to 11 F1 points over a strong baseline. DARE also achieves a new state of the art on three widely used biomedical RE datasets, surpassing the previous best results by 4.7 F1 points on average.
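The abstract describes the pipeline only at a high level. Examples are generated per relation type with fine-tuned GPT-2, then a BERT-based classifier is trained on gold plus generated data. The following is a minimal sketch of the data-combination step only.

It assumes that generated examples top up each relation type to a common target count. This is one way to address the class imbalance the abstract mentions; the paper's actual sampling scheme is not stated here. The generators are hypothetical stubs standing in for fine-tuned GPT-2 models. The names `augment_with_generated` and `target_per_class`, the label strings, and the entity placeholders are all illustrative.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# (sentence with the two entities marked, relation label)
Example = Tuple[str, str]

# Hypothetical stand-in for a GPT-2 model fine-tuned on one relation type:
# given n, it returns n generated sentences for that type.
Generator = Callable[[int], List[str]]


def augment_with_generated(
    gold: List[Example],
    generators: Dict[str, Generator],
    target_per_class: int,
) -> List[Example]:
    """Return the gold examples plus generated ones, topping up every
    relation type that has a generator to `target_per_class` examples.
    (Assumed balancing scheme, for illustration only.)"""
    counts = Counter(label for _, label in gold)
    augmented = list(gold)  # every gold example is kept
    for label, generate in generators.items():
        missing = max(0, target_per_class - counts[label])
        if missing:
            augmented.extend((text, label) for text in generate(missing))
    return augmented


if __name__ == "__main__":
    gold = [
        ("@GENE$ binds @DRUG$ in vitro.", "interacts"),
        ("@GENE$ and @DRUG$ were both measured.", "none"),
        ("No link between @GENE$ and @DRUG$ was found.", "none"),
    ]

    def fake_generator(label: str) -> Generator:
        # Placeholder for sampling from a per-relation fine-tuned GPT-2.
        return lambda n: [f"<synthetic {label} #{i}>" for i in range(n)]

    gens = {label: fake_generator(label) for label in ("interacts", "none")}
    train = augment_with_generated(gold, gens, target_per_class=3)
    print(Counter(label for _, label in train))
```

The sketch always keeps every gold example and only adds synthetic ones. This matches the abstract's statement that generated data is used in combination with the gold dataset. How many synthetic examples to draw per relation type is a tunable assumption here.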

Related papers

Curvature Enhanced Data Augmentation for Regression
We introduce the Curvature-Enhanced Manifold Sampling (CEMS) method for regression tasks. CEMS delivers superior performance in both in-distribution and out-of-distribution scenarios.
arXiv (2025-06-07T16:18:37Z)
Generative Pre-Trained Transformer for Symbolic Regression Base In-Context Reinforcement Learning
Finding mathematical formulas from observational data is a major demand of scientific research. FormulaGPT achieves the state-of-the-art performance in fitting ability compared with four baselines.
arXiv (2024-04-09T14:08:47Z)
On Evaluation Protocols for Data Augmentation in a Limited Data Scenario
We show that classical data augmentation (which modifies sentences) is simply a way of performing better fine-tuning. We further show that zero- and few-shot DA via conversational agents such as ChatGPT or Llama 2 can increase performance.
arXiv (2024-02-22T16:42:37Z)
Retrosynthesis prediction enhanced by in-silico reaction data augmentation
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation. On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv (2024-01-31T07:40:37Z)
Data Augmentation for Traffic Classification
Data Augmentation (DA) is a technique widely adopted in Computer Vision (CV) and Natural Language Processing (NLP) tasks. DA has struggled to gain traction in networking contexts, particularly in Traffic Classification (TC) tasks.
arXiv (2024-01-19T15:25:09Z)
Noisy Self-Training with Synthetic Queries for Dense Retrieval
We introduce a novel noisy self-training framework combined with synthetic queries. Experimental results show that our method improves consistently over existing methods. Our method is data efficient and outperforms competitive baselines.
arXiv (2023-11-27T06:19:50Z)
Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences. We formulate each task as a sequence-to-sequence problem and perform multi-task training. We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv (2023-11-20T14:50:12Z)
RoPDA: Robust Prompt-based Data Augmentation for Low-Resource Named Entity Recognition
Robust Prompt-based Data Augmentation (RoPDA) targets low-resource NER. Based on pre-trained language models (PLMs) with continuous prompts, RoPDA performs entity augmentation and context augmentation. Experiments on three benchmarks from different domains demonstrate that RoPDA significantly improves upon strong baselines.
arXiv (2023-07-11T14:44:14Z)
Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification
We leverage GPT-2 to generate artificial training instances in order to improve classification performance. Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv (2021-11-17T12:10:03Z)
S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization
We propose the model S3-Rec, which stands for Self-Supervised learning for Sequential Recommendation. For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence. Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
arXiv (2020-08-18T11:44:10Z)
Data Weighted Training Strategies for Grammatical Error Correction
We show how to incorporate delta-log-perplexity, a type of example scoring, into a training schedule for Grammatical Error Correction (GEC). Models trained on scored data achieve state-of-the-art results on common GEC test sets.
arXiv (2020-08-07T03:30:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.