RoPDA: Robust Prompt-based Data Augmentation for Low-Resource Named
Entity Recognition
- URL: http://arxiv.org/abs/2307.07417v2
- Date: Mon, 17 Jul 2023 06:08:22 GMT
- Authors: Sihan Song, Furao Shen, Jian Zhao
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Data augmentation has been widely used in low-resource NER tasks to tackle
the problem of data sparsity. However, previous data augmentation methods have
the disadvantages of disrupted syntactic structures, token-label mismatch, and
requirement for external knowledge or manual effort. To address these issues,
we propose Robust Prompt-based Data Augmentation (RoPDA) for low-resource NER.
Based on pre-trained language models (PLMs) with continuous prompt, RoPDA
performs entity augmentation and context augmentation through five fundamental
augmentation operations to generate label-flipping and label-preserving
examples. To optimize the utilization of the augmented samples, we present two
techniques: Self-Consistency Filtering and mixup. The former effectively
eliminates low-quality samples, while the latter prevents performance
degradation arising from the direct utilization of label-flipping samples.
Extensive experiments on three benchmarks from different domains demonstrate
that RoPDA significantly improves upon strong baselines, and also outperforms
state-of-the-art semi-supervised learning methods when unlabeled data is
included.
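The abstract applies mixup to label-flipping samples so they contribute as soft targets rather than as hard (possibly wrong) labels. A minimal, generic sketch of mixup in that spirit (all names and the feature/label shapes are illustrative assumptions, not the RoPDA implementation):

```python
import numpy as np

def mixup(x1, x2, y1, y2, alpha=0.5, rng=None):
    """Interpolate two feature vectors and their one-hot/soft labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # mixed input (e.g. token embeddings)
    y = lam * y1 + (1.0 - lam) * y2       # mixed soft-label distribution
    return x, y, lam

# Usage: mix an original sample with a label-flipping augmented sample.
x_orig = np.array([0.2, 0.8]); y_orig = np.array([1.0, 0.0])   # e.g. PER
x_aug  = np.array([0.6, 0.1]); y_aug  = np.array([0.0, 1.0])   # e.g. ORG
x_mix, y_mix, lam = mixup(x_orig, x_aug, y_orig, y_aug)
```

Because the mixed label is a convex combination of the two originals, a noisy flipped label can at most pull the target partway, which is the degradation-prevention effect the abstract claims.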
Related papers
- Improving a Named Entity Recognizer Trained on Noisy Data with a Few
Clean Instances [55.37242480995541]
We propose to denoise noisy NER data with guidance from a small set of clean instances.
Along with the main NER model, we train a discriminator model and use its outputs to recalibrate the sample weights.
Results on public crowdsourcing and distant supervision datasets show that the proposed method can consistently improve performance with a small guidance set.
arXiv Detail & Related papers (2023-10-25T17:23:37Z)
- Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations [40.71705332298682]
We present an alternative approach that relies on non-counterfactual data augmentation.
Our approach further establishes a new state-of-the-art on the ABSA robustness benchmark and transfers well across domains.
arXiv Detail & Related papers (2023-06-24T13:57:32Z)
- Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions.
Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z)
- On-the-fly Denoising for Data Augmentation in Natural Language Understanding [101.46848743193358]
We propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data.
Our method can be applied to general augmentation techniques and consistently improve the performance on both text classification and question-answering tasks.
arXiv Detail & Related papers (2022-12-20T18:58:33Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Reprint: a randomized extrapolation based on principal components for data augmentation [11.449992652644577]
This paper presents a simple and effective hidden-space data augmentation method for imbalanced data classification.
Given hidden-space representations of the samples in each class, REPRINT extrapolates augmented examples for the target class in a randomized fashion.
The method includes a label refinement component that synthesizes new soft labels for the augmented examples.
arXiv Detail & Related papers (2022-04-26T01:38:47Z)
- Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning [104.00026716576546]
We propose to learn saliency from synthetic but clean labels, which naturally has higher pixel-labeling quality without the effort of manual annotations.
We show that our proposed method outperforms the existing state-of-the-art deep unsupervised SOD methods on several benchmark datasets.
arXiv Detail & Related papers (2022-02-26T16:03:55Z)
- DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks [88.62288327934499]
We propose a novel augmentation method with language models trained on the linearized labeled sentences.
Our method is applicable to both supervised and semi-supervised settings.
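The "linearized labeled sentences" idea can be sketched as interleaving label tags with tokens so an ordinary language model can be trained on, and later sample, labeled sentences directly. The tag format below is illustrative, not DAGA's exact scheme:

```python
def linearize(tokens, labels):
    """Turn a (token, BIO-label) sequence into one flat string."""
    out = []
    for tok, lab in zip(tokens, labels):
        if lab != "O":               # prepend a tag marker for entity tokens
            out.append(f"[{lab}]")
        out.append(tok)
    return " ".join(out)

tokens = ["John", "lives", "in", "Paris"]
labels = ["B-PER", "O", "O", "B-LOC"]
print(linearize(tokens, labels))
# → [B-PER] John lives in [B-LOC] Paris
```

Sentences generated by a model trained on such strings can then be de-linearized back into token/label pairs, yielding new synthetic training examples.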
arXiv Detail & Related papers (2020-11-03T07:49:15Z)
- Data Augmentation Imbalance For Imbalanced Attribute Classification [60.71438625139922]
We propose a new re-sampling algorithm, data augmentation imbalance (DAI), that explicitly enhances the ability to discriminate underrepresented attributes.
Our DAI algorithm achieves state-of-the-art results on pedestrian attribute datasets.
arXiv Detail & Related papers (2020-04-19T20:43:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.