Data Augmentation for Cross-Domain Named Entity Recognition
- URL: http://arxiv.org/abs/2109.01758v1
- Date: Sat, 4 Sep 2021 00:50:55 GMT
- Title: Data Augmentation for Cross-Domain Named Entity Recognition
- Authors: Shuguang Chen, Gustavo Aguilar, Leonardo Neves and Thamar Solorio
- Abstract summary: We study cross-domain data augmentation for the named entity recognition task.
We propose a novel neural architecture to transform the data representation from a high-resource to a low-resource domain.
We show that transforming the data to the low-resource domain representation achieves significant improvements over only using data from high-resource domains.
- Score: 22.66649873447105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current work in named entity recognition (NER) shows that data augmentation
techniques can produce more robust models. However, most existing techniques
focus on augmenting in-domain data in low-resource scenarios where annotated
data is quite limited. In contrast, we study cross-domain data augmentation for
the NER task. We investigate the possibility of leveraging data from
high-resource domains by projecting it into the low-resource domains.
Specifically, we propose a novel neural architecture to transform the data
representation from a high-resource to a low-resource domain by learning the
patterns (e.g., style, noise, abbreviations) in the text that differentiate the two
domains, together with a shared feature space in which both domains are aligned.
We experiment with diverse datasets and show that transforming the data to the
low-resource domain representation achieves significant improvements over only
using data from high-resource domains.
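The abstract describes two ingredients: a learned transformation that rewrites high-resource text in the surface patterns of the low-resource domain, and a shared feature space where the two domains are aligned. The sketch below illustrates only the alignment ingredient, using domain-adversarial training with a gradient reversal layer; this is a common alignment technique shown here as an assumption, not the authors' published architecture, and all class and variable names are illustrative.

```python
# Minimal sketch: align two domains in a shared feature space via a
# gradient reversal layer. Illustrative only; not the paper's code.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambd on backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class SharedEncoder(nn.Module):
    """Encodes tokens from either domain into one shared feature space."""
    def __init__(self, vocab_size=10000, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        out, _ = self.lstm(self.embed(token_ids))
        return out  # (batch, seq_len, 2 * hidden)

class DomainDiscriminator(nn.Module):
    """Predicts which domain features came from; the encoder learns to fool it."""
    def __init__(self, hidden=256):
        super().__init__()
        self.clf = nn.Sequential(nn.Linear(2 * hidden, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, features, lambd=1.0):
        pooled = features.mean(dim=1)  # average over the sequence
        return self.clf(GradReverse.apply(pooled, lambd))

# One adversarial step: the discriminator loss flows back through the
# reversal layer, pushing the encoder toward domain-invariant features.
encoder, disc = SharedEncoder(), DomainDiscriminator()
news = torch.randint(0, 10000, (8, 32))    # high-resource domain token ids
tweets = torch.randint(0, 10000, (8, 32))  # low-resource domain token ids
feats = torch.cat([encoder(news), encoder(tweets)], dim=0)
labels = torch.cat([torch.zeros(8), torch.ones(8)]).long()
loss = nn.CrossEntropyLoss()(disc(feats), labels)
loss.backward()
```

Training against this loss drives the encoder toward features the discriminator cannot attribute to either domain, which is one concrete reading of "a shared feature space where both domains are aligned."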
Related papers
- Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization [70.02187124865627]
Open-set single-source domain generalization aims to use a single source domain to learn a robust model that generalizes to unknown target domains.
We propose a novel learning approach based on domain expansion and boundary growth to expand the scarce source samples.
Our approach can achieve significant improvements and reach state-of-the-art performance on several cross-domain image classification datasets.
arXiv Detail & Related papers (2024-11-05T09:08:46Z)
- Complex Style Image Transformations for Domain Generalization in Medical Images [6.635679521775917]
Domain generalization techniques aim to generalize to unknown domains from a single data source.
In this paper we introduce a novel framework, named CompStyle, which leverages style transfer and adversarial training.
We provide results from experiments on semantic segmentation on prostate data and corruption robustness on cardiac data.
arXiv Detail & Related papers (2024-06-01T04:57:31Z) - MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets [19.44142290594537]
Vision transformers (ViTs) have emerged as a promising solution to improve medical image segmentation (MIS)
ViTs are typically trained using a single source of data, which overlooks the valuable knowledge that could be leveraged from other available datasets.
In this paper, we propose MDViT, the first multi-domain ViT, which includes domain adapters to mitigate data hunger and combat negative knowledge transfer (NKT).
arXiv Detail & Related papers (2023-07-05T08:19:29Z) - Combining Data Generation and Active Learning for Low-Resource Question Answering [23.755283239897132]
We propose a novel approach that combines data augmentation via question-answer generation with Active Learning to improve performance in low-resource settings.
Our findings show that our novel approach, which incorporates humans in the data generation loop, boosts performance in the low-resource, domain-specific setting.
arXiv Detail & Related papers (2022-11-27T16:31:33Z) - Inferring Latent Domains for Unsupervised Deep Domain Adaptation [54.963823285456925]
Unsupervised Domain Adaptation (UDA) refers to the problem of learning a model in a target domain where labeled data are not available.
This paper introduces a novel deep architecture which addresses the problem of UDA by automatically discovering latent domains in visual datasets.
We evaluate our approach on publicly available benchmarks, showing that it outperforms state-of-the-art domain adaptation methods.
arXiv Detail & Related papers (2021-03-25T14:33:33Z) - Addressing Zero-Resource Domains Using Document-Level Context in Neural
Machine Translation [80.40677540516616]
We show that when in-domain parallel data is not available, access to document-level context enables better capturing of domain generalities.
We present two document-level Transformer models which are capable of using large context sizes.
arXiv Detail & Related papers (2020-04-30T16:28:19Z) - Dynamic Fusion Network for Multi-Domain End-to-end Task-Oriented Dialog [70.79442700890843]
We propose a novel Dynamic Fusion Network (DF-Net) which automatically exploits the relevance between the target domain and each source domain (a toy sketch of this fusion idea appears after this list).
With little training data, we show its transferability by outperforming the prior best model by 13.9% on average.
arXiv Detail & Related papers (2020-04-23T08:17:22Z) - Deep Domain-Adversarial Image Generation for Domain Generalisation [115.21519842245752]
Machine learning models typically suffer from the domain shift problem when trained on a source dataset and evaluated on a target dataset of different distribution.
To overcome this problem, domain generalisation (DG) methods aim to leverage data from multiple source domains so that a trained model can generalise to unseen domains.
We propose a novel DG approach based on Deep Domain-Adversarial Image Generation (DDAIG).
arXiv Detail & Related papers (2020-03-12T23:17:47Z) - Zero-Resource Cross-Domain Named Entity Recognition [68.83177074227598]
Existing models for cross-domain named entity recognition rely on large unlabeled corpora or labeled NER training data in target domains.
We propose a cross-domain NER model that does not use any external resources.
arXiv Detail & Related papers (2020-02-14T09:04:18Z)
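The Dynamic Fusion Network entry above hints at a gating mechanism that weighs each source domain by its relevance to the target. Below is a minimal, hypothetical sketch of that fusion pattern; it is not the published DF-Net code, and all names are illustrative.

```python
# Hypothetical sketch of dynamic fusion: a softmax gate attends over
# per-domain expert layers so an utterance borrows most from the source
# domains most relevant to it. Illustrative only; not the DF-Net release.
import torch
import torch.nn as nn

class DynamicFusion(nn.Module):
    def __init__(self, n_domains: int = 4, dim: int = 128):
        super().__init__()
        # One small "expert" per source domain, plus a relevance gate.
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_domains))
        self.gate = nn.Linear(dim, n_domains)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) shared utterance representation
        weights = torch.softmax(self.gate(x), dim=-1)                  # (batch, n_domains)
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, n_domains, dim)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)         # (batch, dim)

fused = DynamicFusion()(torch.randn(8, 128))  # -> shape (8, 128)
```

The gate's softmax weights make the domain mixture input-dependent, so transfer to a new domain can lean on whichever source experts are most relevant.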