QAGAN: Adversarial Approach To Learning Domain Invariant Language Features
- URL: http://arxiv.org/abs/2206.12388v1
- Date: Fri, 24 Jun 2022 17:42:18 GMT
- Title: QAGAN: Adversarial Approach To Learning Domain Invariant Language Features
- Authors: Shubham Shrivastava and Kaiyue Wang
- Abstract summary: We explore an adversarial training approach to learning domain-invariant features.
We achieve a $15.2\%$ improvement in EM score and a $5.6\%$ boost in F1 score on the out-of-domain validation dataset.
- Score: 0.76146285961466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training models that are robust to data domain shift has gained increasing
interest in both academia and industry. Question-answering language models, which
address one of the typical problems in Natural Language Processing (NLP) research,
have seen much success with the advent of large transformer models. However,
existing approaches mostly work under the assumption that data is drawn from the
same distribution during training and testing, which is unrealistic and
non-scalable in the wild.
In this paper, we explore an adversarial training approach to learning
domain-invariant features so that language models can generalize well to
out-of-domain datasets. We also inspect several other ways to boost model
performance, including data augmentation by paraphrasing sentences, conditioning
the end-of-answer-span prediction on the start word, and a carefully designed
annealing function. Our initial results show that, combining these methods, we
achieve a $15.2\%$ improvement in EM score and a $5.6\%$ boost in F1 score on the
out-of-domain validation dataset over the baseline. We also dissect our model
outputs and visualize the model hidden states by projecting them onto a
lower-dimensional space, and find that our adversarial training approach indeed
encourages the model to learn domain-invariant embeddings and brings them closer
together in that space.
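The summary does not spell out how the adversarial training is wired up. A minimal sketch of one common way to realize it is shown below: a domain discriminator attached to the QA encoder through a gradient reversal layer, with a DANN-style annealing schedule for the reversal strength. The class names, hidden sizes, and the specific schedule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (PyTorch), assuming a DANN-style gradient reversal layer and
# an annealed reversal strength; names and sizes are illustrative assumptions.
import math
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses and scales gradients on the way back."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The reversed gradient pushes the encoder toward domain-invariant features.
        return -ctx.lambd * grad_output, None


class DomainDiscriminator(nn.Module):
    """Predicts the source domain of a pooled QA encoder representation."""

    def __init__(self, hidden_size: int, num_domains: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, num_domains),
        )

    def forward(self, pooled_hidden: torch.Tensor, lambd: float) -> torch.Tensor:
        return self.net(GradReverse.apply(pooled_hidden, lambd))


def annealed_lambda(step: int, total_steps: int, gamma: float = 10.0) -> float:
    """Ramp the reversal strength from 0 to 1 over training (one possible annealing function)."""
    p = step / max(total_steps, 1)
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0
```

During training, the total loss would be the usual span-prediction cross-entropy plus the discriminator's cross-entropy computed through the reversal layer, so the encoder is rewarded for features the discriminator cannot use to tell domains apart.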
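The abstract also mentions conditioning the end-of-answer-span prediction on the start word. One plausible (assumed) realization is to feed the hidden state of the predicted start token to the end classifier, for example by concatenation; the layer names and the concatenation scheme below are not taken from the paper.

```python
# Hypothetical span head that conditions the end prediction on the start word.
import torch
import torch.nn as nn


class ConditionedSpanHead(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.start_head = nn.Linear(hidden_size, 1)
        # The end head sees each token's representation plus the start word's representation.
        self.end_head = nn.Linear(2 * hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden)
        start_logits = self.start_head(hidden_states).squeeze(-1)          # (batch, seq_len)
        start_idx = start_logits.argmax(dim=-1)                            # (batch,)
        batch_idx = torch.arange(hidden_states.size(0), device=hidden_states.device)
        start_repr = hidden_states[batch_idx, start_idx]                   # (batch, hidden)
        start_repr = start_repr.unsqueeze(1).expand_as(hidden_states)      # (batch, seq_len, hidden)
        end_logits = self.end_head(torch.cat([hidden_states, start_repr], dim=-1)).squeeze(-1)
        return start_logits, end_logits
```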
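Finally, the abstract describes visualizing hidden states by projecting them onto a lower-dimensional space to check whether in-domain and out-of-domain embeddings move closer together. A short sketch of such a diagnostic follows, assuming t-SNE as the projection; the paper does not specify the method, and all names here are illustrative.

```python
# Illustrative diagnostic: project pooled hidden states to 2-D and color by domain.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE


def plot_hidden_states(hidden_states: np.ndarray, domain_labels: np.ndarray,
                       path: str = "hidden_states.png") -> None:
    """hidden_states: (num_examples, hidden_size); domain_labels: (num_examples,) integer ids."""
    projected = TSNE(n_components=2, init="pca", random_state=0).fit_transform(hidden_states)
    for domain in np.unique(domain_labels):
        mask = domain_labels == domain
        plt.scatter(projected[mask, 0], projected[mask, 1], s=6, label=f"domain {domain}")
    plt.legend()
    plt.savefig(path, dpi=150)
    plt.close()
```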
Related papers
- Improving Domain Generalization with Domain Relations [77.63345406973097]
This paper focuses on domain shifts, which occur when the model is applied to new domains that are different from the ones it was trained on.
We propose a new approach called D$3$G to learn domain-specific models.
Our results show that D$3$G consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T08:11:16Z) - Learning to Augment via Implicit Differentiation for Domain Generalization [107.9666735637355]
Domain generalization (DG) aims to overcome the problem by leveraging multiple source domains to learn a domain-generalizable model.
In this paper, we propose a novel augmentation-based DG approach, dubbed AugLearn.
AugLearn shows effectiveness on three standard DG benchmarks, PACS, Office-Home and Digits-DG.
arXiv Detail & Related papers (2022-10-25T18:51:51Z) - Understanding Domain Learning in Language Models Through Subpopulation Analysis [35.16003054930906]
We investigate how different domains are encoded in modern neural network architectures.
We analyze the relationship between natural language domains, model size, and the amount of training data used.
arXiv Detail & Related papers (2022-10-22T21:12:57Z) - CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain [22.846469609263416]
We introduce the pre-trained CLIN-X (Clinical XLM-R) language models and show how CLIN-X outperforms other pre-trained transformer models.
Our studies reveal stable model performance despite a lack of annotated data, with improvements of up to 47 F1 points when only 250 labeled sentences are available.
Our results highlight the importance of specialized language models such as CLIN-X for concept extraction in non-standard domains.
arXiv Detail & Related papers (2021-12-16T10:07:39Z) - Efficient Domain Adaptation of Language Models via Adaptive Tokenization [5.058301279065432]
We show that domain-specific subword sequences can be efficiently determined directly from divergences in the conditional token distributions of the base and domain-specific corpora.
Our approach produces smaller models and requires less training and inference time than other approaches based on tokenizer augmentation.
arXiv Detail & Related papers (2021-09-15T17:51:27Z) - Source-Free Open Compound Domain Adaptation in Semantic Segmentation [99.82890571842603]
In SF-OCDA, only the source pre-trained model and the target data are available to learn the target model.
We propose the Cross-Patch Style Swap (CPSS) to diversify samples with various patch styles at the feature level.
Our method produces state-of-the-art results on the C-Driving dataset.
arXiv Detail & Related papers (2021-06-07T08:38:41Z) - Reprogramming Language Models for Molecular Representation Learning [65.00999660425731]
We propose Representation Reprogramming via Dictionary Learning (R2DL) for adversarially reprogramming pretrained language models for molecular learning tasks.
The adversarial program learns a linear transformation between a dense source model input space (language data) and a sparse target model input space (e.g., chemical and biological molecule data) using a k-SVD solver.
R2DL achieves the baseline established by state-of-the-art toxicity prediction models trained on domain-specific data and outperforms the baseline in a limited training-data setting.
arXiv Detail & Related papers (2020-12-07T05:50:27Z) - Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)