An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution
- URL: http://arxiv.org/abs/2011.00948v2
- Date: Wed, 4 Nov 2020 16:56:07 GMT
- Title: An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution
- Authors: Ryuto Konno, Yuichiroh Matsubayashi, Shun Kiyono, Hiroki Ouchi, Ryo
Takahashi, Kentaro Inui
- Abstract summary: This study explores how effectively the scarcity of labeled data for zero anaphora resolution can be alleviated by data augmentation.
We adopt a state-of-the-art data augmentation method that generates labeled training instances using a pretrained language model.
The proposed method can improve the quality of the augmented training data compared to conventional data augmentation.
- Score: 40.77086563127755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One critical issue of zero anaphora resolution (ZAR) is the scarcity of
labeled data. This study explores how effectively this problem can be
alleviated by data augmentation. We adopt a state-of-the-art data augmentation
method, called the contextual data augmentation (CDA), that generates labeled
training instances using a pretrained language model. The CDA has been reported
to work well for several other natural language processing tasks, including
text classification and machine translation. This study addresses two
underexplored issues with CDA: how to reduce the computational cost of
data augmentation and how to ensure the quality of the generated data. We also
propose two methods to adapt CDA to ZAR: [MASK]-based augmentation and
linguistically-controlled masking. Experimental results on Japanese ZAR show
that our methods contribute to both accuracy gains and reduced computation
cost. Closer analysis reveals that the proposed methods improve the quality of
the augmented training data compared to conventional CDA.
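The two adaptations can be sketched in miniature. The snippet below is a pure-Python illustration, not the paper's implementation: `TOY_LM_CANDIDATES` stands in for a pretrained masked language model, and restricting masking to unlabeled (`"O"`) tokens is a deliberately simplified version of linguistically-controlled masking.

```python
import random

# Toy stand-in for a pretrained masked language model: a fixed pool of
# candidate fill-in tokens. A real CDA pipeline would query a BERT-style
# model for its top predictions at each [MASK] position.
TOY_LM_CANDIDATES = ["先生", "友人", "同僚", "学生"]

def mask_based_augment(tokens, labels, mask_prob=0.3, rng=None):
    """Generate one augmented training instance: mask a subset of tokens
    and fill them with (toy) language-model predictions, keeping the
    original ZAR labels intact."""
    rng = rng or random.Random(0)
    new_tokens = []
    for tok, lab in zip(tokens, labels):
        # Simplified linguistically-controlled masking: only tokens
        # outside the predicate-argument annotation ("O") are eligible,
        # so the labels remain valid for the generated sentence.
        if lab == "O" and rng.random() < mask_prob:
            new_tokens.append(rng.choice(TOY_LM_CANDIDATES))
        else:
            new_tokens.append(tok)
    return new_tokens, list(labels)

tokens = ["太郎", "が", "本", "を", "読ん", "だ"]
labels = ["ARG0", "O", "ARG1", "O", "PRED", "O"]
aug_tokens, aug_labels = mask_based_augment(tokens, labels, rng=random.Random(1))
```

Because labeled tokens are never masked, each generated instance reuses the original annotation, which is what keeps augmentation cheap relative to re-annotating generated text.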
Related papers
- Distributional Data Augmentation Methods for Low Resource Language [0.9208007322096533]
Easy data augmentation (EDA) augments the training data by injecting and replacing synonyms and randomly permuting sentences.
One major obstacle with EDA is the need for versatile and complete synonym dictionaries, which cannot be easily found in low-resource languages.
We propose two extensions, easy distributional data augmentation (EDDA) and type-specific similar word replacement (TSSR), which use semantic word context information and part-of-speech tags for word replacement and augmentation.
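The TSSR idea can be illustrated with a toy lexicon. Everything named below (`VECTORS`, `POS_TAGS`, `tssr_replace`) is hypothetical: a real system would use pretrained distributional embeddings and a POS tagger rather than hand-written tables.

```python
import math
import random

# Toy distributional lexicon: word vectors plus part-of-speech tags.
VECTORS = {"quick": (1.0, 0.1), "fast": (0.9, 0.2),
           "dog": (0.1, 1.0), "cat": (0.2, 0.9)}
POS_TAGS = {"quick": "ADJ", "fast": "ADJ", "dog": "NOUN", "cat": "NOUN"}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def tssr_replace(tokens, replace_prob=1.0, rng=None):
    """Type-specific similar word replacement: swap a word for its most
    distributionally similar neighbour that shares the same POS tag."""
    rng = rng or random.Random(0)
    out = []
    for tok in tokens:
        candidates = [w for w in VECTORS
                      if w != tok and POS_TAGS.get(w) == POS_TAGS.get(tok)]
        if tok in VECTORS and candidates and rng.random() < replace_prob:
            out.append(max(candidates,
                           key=lambda w: cosine(VECTORS[w], VECTORS[tok])))
        else:
            out.append(tok)
    return out

augmented = tssr_replace(["quick", "dog"])
```

The POS constraint is what distinguishes TSSR from plain nearest-neighbour replacement: it avoids swapping, say, a noun for a similar-vector adjective.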
arXiv Detail & Related papers (2023-09-09T19:01:59Z)
- Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions.
Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z)
- AugGPT: Leveraging ChatGPT for Text Data Augmentation [59.76140039943385]
We propose a text data augmentation approach based on ChatGPT (named AugGPT)
AugGPT rephrases each sentence in the training samples into multiple conceptually similar but semantically different samples.
Experiment results on few-shot learning text classification tasks show the superior performance of the proposed AugGPT approach.
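The AugGPT loop can be sketched with the model call abstracted away. The function names and the `stub_rephrase` placeholder are assumptions for illustration; a real implementation would send each sentence to a chat model with a rephrasing prompt.

```python
# Sketch of an AugGPT-style expansion loop. `rephrase_fn` stands in for
# the ChatGPT call; the stub below only demonstrates the control flow.
def auggpt_augment(samples, rephrase_fn, n_variants=3):
    """Expand each (text, label) pair into several rephrasings that keep
    the label: conceptually similar but lexically different samples."""
    augmented = []
    for text, label in samples:
        augmented.append((text, label))  # keep the original sample
        for i in range(n_variants):
            augmented.append((rephrase_fn(text, i), label))
    return augmented

def stub_rephrase(text, i):
    # Placeholder; a real system would prompt a chat model, e.g.
    # "Rephrase the following sentence: ...".
    return f"{text} (rephrasing {i + 1})"

data = [("the movie was great", "pos")]
augmented = auggpt_augment(data, stub_rephrase)
```

The key property, preserved here, is that every generated variant inherits the label of its source sentence, which is what makes the output usable for few-shot classification.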
arXiv Detail & Related papers (2023-02-25T06:58:16Z)
- On-the-fly Denoising for Data Augmentation in Natural Language Understanding [101.46848743193358]
We propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data.
Our method can be applied to general augmentation techniques and consistently improve the performance on both text classification and question-answering tasks.
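One plausible reading of the soft-label idea is a loss that blends the possibly noisy hard label of an augmented example with the soft distribution from a teacher trained on clean data. The functions and the `alpha` mixing weight below are illustrative assumptions, not the paper's exact formulation.

```python
import math

def soft_cross_entropy(student_probs, teacher_probs):
    """Cross-entropy of the student's prediction against a soft label
    distribution (the teacher's output)."""
    return -sum(t * math.log(s) for s, t in zip(student_probs, teacher_probs))

def denoised_loss(student_probs, hard_label, teacher_probs, alpha=0.5):
    """Blend the (possibly noisy) hard label of an augmented example with
    the soft label from a teacher trained on the cleaner original data."""
    num_classes = len(student_probs)
    hard = [1.0 if i == hard_label else 0.0 for i in range(num_classes)]
    mixed = [alpha * h + (1 - alpha) * t for h, t in zip(hard, teacher_probs)]
    return soft_cross_entropy(student_probs, mixed)

loss = denoised_loss([0.7, 0.3], hard_label=0, teacher_probs=[0.6, 0.4])
```

When the teacher disagrees with the hard label, the mixed target softens toward the teacher, which is how noisy augmented labels get down-weighted on the fly.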
arXiv Detail & Related papers (2022-12-20T18:58:33Z)
- CropCat: Data Augmentation for Smoothing the Feature Distribution of EEG Signals [3.5665681694253903]
We propose a novel data augmentation method, CropCat.
CropCat consists of two versions, CropCat-spatial and CropCat-temporal.
We show that generated data by CropCat smooths the feature distribution of EEG signals when training the model.
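A minimal sketch of the temporal variant, under stated assumptions: crops from two trials are concatenated along the time axis, and (CutMix-style) the labels are mixed in proportion to the crop sizes. The label-mixing convention and the function name are assumptions for illustration; the spatial variant would crop across channels instead.

```python
def cropcat_temporal(x1, y1, x2, y2, cut):
    """Concatenate the first `cut` time steps of trial x1 with the
    remaining steps of trial x2, channel by channel, and mix the labels
    in proportion. Each trial: list of channels, each a list of samples."""
    total = len(x1[0])
    x_new = [c1[:cut] + c2[cut:] for c1, c2 in zip(x1, x2)]
    lam = cut / total
    y_new = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x_new, y_new

# Two toy 2-channel, 4-sample "EEG" trials with one-hot labels.
x1, y1 = [[1, 1, 1, 1], [1, 1, 1, 1]], [1.0, 0.0]
x2, y2 = [[2, 2, 2, 2], [2, 2, 2, 2]], [0.0, 1.0]
x_new, y_new = cropcat_temporal(x1, y1, x2, y2, cut=1)
```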
arXiv Detail & Related papers (2022-12-13T07:40:23Z)
- Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z)
- Data Augmentation for Dementia Detection in Spoken Language [1.7324358447544175]
Recent deep-learning techniques can offer a faster diagnosis and have shown promising results.
However, they require large amounts of labelled data, which is not easily available for the task of dementia detection.
One effective solution to sparse data problems is data augmentation, though the exact methods need to be selected carefully.
arXiv Detail & Related papers (2022-06-26T13:40:25Z)
- Investigating Lexical Replacements for Arabic-English Code-Switched Data Augmentation [32.885722714728765]
We investigate data augmentation techniques for code-switching (CS) NLP systems.
We perform lexical replacements using word-aligned parallel corpora.
We compare these approaches against dictionary-based replacements.
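The dictionary-based baseline can be sketched directly; the tiny `AR_EN_DICT` and the function name are illustrative stand-ins for the bilingual dictionaries or word-aligned parallel corpora a real system would use.

```python
import random

# Toy Arabic-English dictionary for illustrating dictionary-based
# lexical replacement.
AR_EN_DICT = {"كتاب": "book", "جميل": "beautiful"}

def dictionary_replace(tokens, dictionary, replace_prob=1.0, rng=None):
    """Replace dictionary words with their translations to synthesize
    code-switched training sentences from monolingual text."""
    rng = rng or random.Random(0)
    return [dictionary[t] if t in dictionary and rng.random() < replace_prob
            else t
            for t in tokens]

cs_sentence = dictionary_replace(["هذا", "كتاب", "جميل"], AR_EN_DICT)
```

Word-aligned parallel corpora would instead supply context-specific replacements, which is the contrast the paper investigates.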
arXiv Detail & Related papers (2022-05-25T10:44:36Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding [67.61357003974153]
We propose a novel data augmentation framework dubbed CoDA.
CoDA synthesizes diverse and informative augmented examples by integrating multiple transformations organically.
A contrastive regularization objective is introduced to capture the global relationship among all the data samples.
arXiv Detail & Related papers (2020-10-16T23:57:03Z)
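The contrastive regularization objective above can be sketched as an InfoNCE-style loss over embeddings; the function names, the toy 2-d vectors, and the unnormalized dot-product similarity are simplifying assumptions, not CoDA's exact objective.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def contrastive_loss(originals, augmentations, temperature=1.0):
    """InfoNCE-style regularizer: pull each example toward its own
    augmented view and push it away from the other examples' views."""
    loss = 0.0
    for i, anchor in enumerate(originals):
        # Softmax over similarities to every augmented view in the batch;
        # the matching view (index i) is the positive.
        scores = [math.exp(dot(anchor, aug) / temperature)
                  for aug in augmentations]
        loss += -math.log(scores[i] / sum(scores))
    return loss / len(originals)

originals = [(1.0, 0.0), (0.0, 1.0)]
augmentations = [(0.9, 0.1), (0.1, 0.9)]
loss = contrastive_loss(originals, augmentations)
```

This batch-level term is what captures the "global relationship among all the data samples" that a per-example augmentation loss misses.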
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all generated summaries) and is not responsible for any consequences of its use.