BET: A Backtranslation Approach for Easy Data Augmentation in
Transformer-based Paraphrase Identification Context
- URL: http://arxiv.org/abs/2009.12452v1
- Date: Fri, 25 Sep 2020 22:06:06 GMT
- Title: BET: A Backtranslation Approach for Easy Data Augmentation in
Transformer-based Paraphrase Identification Context
- Authors: Jean-Philippe Corbeil and Hadi Abdi Ghadivel
- Abstract summary: We call this approach BET, by which we analyze backtranslation data augmentation on transformer-based architectures.
Our findings suggest that BET improves paraphrase identification performance on the Microsoft Research Paraphrase Corpus by more than 3% in both accuracy and F1 score.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Newly introduced deep learning architectures, namely BERT, XLNet, RoBERTa and
ALBERT, have proven robust on several NLP tasks. However, the datasets these
architectures are trained on are fixed in size and generalizability. To alleviate this
issue, we apply one of the most inexpensive solutions to augment these datasets. We call
this approach BET, and with it we analyze backtranslation data augmentation on
transformer-based architectures. Using the Google Translate API with ten intermediary
languages from ten different language families, we externally evaluate the results in the
context of automatic paraphrase identification in a transformer-based framework. Our
findings suggest that BET improves paraphrase identification performance on the Microsoft
Research Paraphrase Corpus (MRPC) by more than 3% in both accuracy and F1 score. We also
analyze the augmentation in the low-data regime with downsampled versions of MRPC, the
Twitter Paraphrase Corpus (TPC) and Quora Question Pairs. In many low-data cases, we
observe models going from failing on the test set to reaching reasonable performance. The
results demonstrate that BET is a highly promising data augmentation technique, both for
pushing the current state of the art on existing datasets and for bootstrapping deep
learning architectures in the low-data regime of around a hundred samples.
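As a rough illustration of the augmentation described in the abstract (a minimal sketch, not the authors' code): each sentence of a labelled pair is round-tripped English -> intermediary language -> English, and the resulting pair keeps the original paraphrase label. The translate() callable and the language list below are hypothetical placeholders; in the paper the translation is performed with the Google Translate API over ten intermediary languages.

from typing import Callable, List, Tuple

# (sentence_1, sentence_2, label) triples, as in MRPC-style paraphrase data.
Pair = Tuple[str, str, int]

# Illustrative subset only; the paper uses ten intermediary languages
# from ten different language families.
INTERMEDIARY_LANGS = ["fr", "de", "ru", "ar", "ja"]

def backtranslate(text: str, pivot: str,
                  translate: Callable[[str, str, str], str]) -> str:
    """Round-trip a sentence English -> pivot language -> English."""
    pivot_text = translate(text, "en", pivot)
    return translate(pivot_text, pivot, "en")

def augment_pairs(pairs: List[Pair],
                  translate: Callable[[str, str, str], str],
                  langs: List[str] = INTERMEDIARY_LANGS) -> List[Pair]:
    """Add one backtranslated copy of each labelled pair per pivot language,
    keeping the original paraphrase/non-paraphrase label."""
    augmented = list(pairs)
    for s1, s2, label in pairs:
        for lang in langs:
            augmented.append((backtranslate(s1, lang, translate),
                              backtranslate(s2, lang, translate),
                              label))
    return augmented

Supplying translate is a matter of wrapping whatever translation client is available (for example a cloud translation API or a local machine translation model); the augmented pairs are then fed to the usual transformer fine-tuning pipeline for paraphrase identification.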
Related papers
- Multi-Sentence Grounding for Long-term Instructional Video [63.27905419718045]
We aim to establish an automatic, scalable pipeline for denoising a large-scale instructional dataset.
We construct a high-quality video-text dataset with multiple descriptive steps supervision, named HowToStep.
arXiv Detail & Related papers (2023-12-21T17:28:09Z)
- Enriching Relation Extraction with OpenIE [70.52564277675056]
Relation extraction (RE) is a sub-discipline of information extraction (IE).
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
arXiv Detail & Related papers (2022-12-19T11:26:23Z)
- Exploring the State-of-the-Art Language Modeling Methods and Data Augmentation Techniques for Multilingual Clause-Level Morphology [3.8498574327875947]
We present our work on all three parts of the shared task: inflection, reinflection, and analysis.
We mainly explore two approaches: Transformer models in combination with data augmentation, and exploiting the state-of-the-art language modeling techniques for morphological analysis.
Our methods achieved first place in each of the three tasks and outperform the mT5 baseline, with 89% for inflection, 80% for reinflection and 12% for analysis.
arXiv Detail & Related papers (2022-11-03T11:53:39Z)
- Improving Retrieval Augmented Neural Machine Translation by Controlling Source and Fuzzy-Match Interactions [15.845071122977158]
We build on the idea of Retrieval Augmented Translation (RAT) where top-k in-domain fuzzy matches are found for the source sentence.
We propose a novel architecture to control interactions between a source sentence and the top-k fuzzy target-language matches.
arXiv Detail & Related papers (2022-10-10T23:33:15Z)
- Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
TP-Transformer augments the traditional Transformer architecture to include an additional component to represent structure.
The second method imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z)
- Parameter-Efficient Abstractive Question Answering over Tables or Text [60.86457030988444]
A long-term ambition of information seeking QA systems is to reason over multi-modal contexts and generate natural answers to user queries.
Memory intensive pre-trained language models are adapted to downstream tasks such as QA by fine-tuning the model on QA data in a specific modality like unstructured text or structured tables.
To avoid training such memory-hungry models while utilizing a uniform architecture for each modality, parameter-efficient adapters add and train small task-specific bottle-neck layers between transformer layers.
arXiv Detail & Related papers (2022-04-07T10:56:29Z)
- Hierarchical Neural Network Approaches for Long Document Classification [3.6700088931938835]
We employ the pre-trained Universal Sentence Encoder (USE) and Bidirectional Encoder Representations from Transformers (BERT) in a hierarchical setup to capture better representations efficiently.
Our proposed models are conceptually simple where we divide the input data into chunks and then pass this through base models of BERT and USE.
We show that USE + CNN/LSTM performs better than its stand-alone baseline, whereas BERT + CNN/LSTM performs on par with its stand-alone counterpart.
arXiv Detail & Related papers (2022-01-18T07:17:40Z)
- Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT significantly improves state-of-the-art neural machine translation performance across 15 translation tasks on 8 language pairs.
arXiv Detail & Related papers (2021-09-16T07:58:33Z)
- The Surprising Performance of Simple Baselines for Misinformation Detection [4.060731229044571]
We examine the performance of a broad set of modern transformer-based language models.
We present our framework as a baseline for creating and evaluating new methods for misinformation detection.
arXiv Detail & Related papers (2021-04-14T16:25:22Z)
- Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models [56.268862325167575]
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task.
We evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task.
arXiv Detail & Related papers (2020-04-04T11:07:54Z)