Contrastive Domain Adaptation for Question Answering using Limited Text Corpora
- URL: http://arxiv.org/abs/2108.13854v1
- Date: Tue, 31 Aug 2021 14:05:55 GMT
- Title: Contrastive Domain Adaptation for Question Answering using Limited Text Corpora
- Authors: Zhenrui Yue, Bernhard Kratzwald, Stefan Feuerriegel
- Abstract summary: We propose a novel framework for domain adaptation called contrastive domain adaptation for QA (CAQA).
Specifically, CAQA combines techniques from question generation and domain-invariant learning to answer out-of-domain questions in settings with limited text corpora.
- Score: 20.116147632481983
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Question generation has recently shown impressive results in customizing
question answering (QA) systems to new domains. These approaches circumvent the
need for manually annotated training data from the new domain and, instead,
generate synthetic question-answer pairs that are used for training. However,
existing methods for question generation rely on large amounts of synthetically
generated data and costly computational resources, which renders these
techniques inaccessible when the text corpora are of limited size. This
is problematic as many niche domains rely on small text corpora, which
naturally restricts the amount of synthetic data that can be generated. In this
paper, we propose a novel framework for domain adaptation called contrastive
domain adaptation for QA (CAQA). Specifically, CAQA combines techniques from
question generation and domain-invariant learning to answer out-of-domain
questions in settings with limited text corpora. Here, we train a QA system on
both source data and generated data from the target domain with a contrastive
adaptation loss that is incorporated into the training objective. By combining
techniques from question generation and domain-invariant learning, our model
achieves considerable improvements over state-of-the-art baselines.
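The abstract describes training a QA model jointly on labeled source data and synthetically generated target-domain data, with a contrastive adaptation loss added to the standard QA objective. The PyTorch sketch below shows one way such a joint objective could be wired up; the particular loss form, the class-prototype pooling, and the names used here (contrastive_adaptation_loss, lambda_ca, token_labels, hidden) are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: a simplified contrastive adaptation term added to a
# QA training objective. All names are assumptions made for this example.
import torch
import torch.nn.functional as F


def contrastive_adaptation_loss(src_feats, src_labels, tgt_feats, tgt_labels):
    """Pull same-class features together across domains and push different
    classes apart, using mean (class-prototype) embeddings.

    src_feats/tgt_feats: (N, d) token or span representations
    src_labels/tgt_labels: (N,) class ids, e.g. 1 = answer token, 0 = other
    """
    loss = torch.tensor(0.0, device=src_feats.device)
    protos_src, protos_tgt = [], []
    for c in torch.unique(torch.cat([src_labels, tgt_labels])):
        s = src_feats[src_labels == c]
        t = tgt_feats[tgt_labels == c]
        if len(s) == 0 or len(t) == 0:
            continue
        protos_src.append(s.mean(0))
        protos_tgt.append(t.mean(0))
    if not protos_src:
        return loss
    protos_src = torch.stack(protos_src)          # (C, d)
    protos_tgt = torch.stack(protos_tgt)          # (C, d)
    # Intra-class term: the same class should be close across source and target.
    intra = F.mse_loss(protos_src, protos_tgt)
    # Inter-class term: different classes should stay apart (hinge on distance).
    dists = torch.cdist(protos_src, protos_tgt)   # (C, C)
    mask = ~torch.eye(len(dists), dtype=torch.bool, device=dists.device)
    off_diag = dists[mask]
    inter = F.relu(1.0 - off_diag).mean() if off_diag.numel() > 0 else loss
    return intra + inter


def training_step(model, src_batch, tgt_batch, lambda_ca=0.1):
    # Standard extractive QA loss on source data and on synthetic target-domain
    # QA pairs produced by a question generator (generation not shown here).
    # The model is assumed to return .loss and .hidden (token features).
    out_src = model(**src_batch)
    out_tgt = model(**tgt_batch)
    qa_loss = out_src.loss + out_tgt.loss
    ca_loss = contrastive_adaptation_loss(
        out_src.hidden, src_batch["token_labels"],
        out_tgt.hidden, tgt_batch["token_labels"],
    )
    return qa_loss + lambda_ca * ca_loss
```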
Related papers
- Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts [83.57864140378035]
This paper proposes a method to cover longer contexts in Open-Domain Question-Answering tasks.
It leverages a small encoder language model that effectively encodes contexts, and the encodings are combined with the original inputs via cross-attention.
After fine-tuning, there is improved performance across two held-in datasets, four held-out datasets, and also in two In Context Learning settings.
arXiv Detail & Related papers (2024-04-02T15:10:11Z)
- Combining Data Generation and Active Learning for Low-Resource Question Answering [23.755283239897132]
We propose a novel approach that combines data augmentation via question-answer generation with Active Learning to improve performance in low-resource settings.
Our findings show that our approach, which incorporates humans into the data generation process, boosts performance in low-resource, domain-specific settings.
arXiv Detail & Related papers (2022-11-27T16:31:33Z)
- QA Domain Adaptation using Hidden Space Augmentation and Self-Supervised Contrastive Adaptation [24.39026345750824]
Question answering (QA) has recently shown impressive results for answering questions from customized domains.
Yet, a common challenge is to adapt QA models to an unseen target domain.
We propose a novel self-supervised framework called QADA for QA domain adaptation.
arXiv Detail & Related papers (2022-10-19T19:52:57Z)
- Domain Adaptation for Question Answering via Question Classification [8.828396559882954]
We propose a novel framework: Question Classification for Question Answering (QC4QA)
For optimization, the inter-domain discrepancy between the source and target domains is reduced via the maximum mean discrepancy (MMD) distance (a minimal MMD sketch is given after this list).
We demonstrate the effectiveness of the proposed QC4QA with consistent improvements against the state-of-the-art baselines on multiple datasets.
arXiv Detail & Related papers (2022-09-12T03:12:02Z)
- Unsupervised and self-adaptative techniques for cross-domain person re-identification [82.54691433502335]
Person Re-Identification (ReID) across non-overlapping cameras is a challenging task.
Unsupervised Domain Adaptation (UDA) is a promising alternative, as it adapts the features of a model trained on a source domain to a target domain without identity-label annotation.
In this paper, we propose a novel UDA-based ReID method that takes advantage of triplets of samples created by a new offline strategy.
arXiv Detail & Related papers (2021-03-21T23:58:39Z)
- AGenT Zero: Zero-shot Automatic Multiple-Choice Question Generation for Skill Assessments [11.355397923795488]
Multiple-choice questions (MCQs) offer the most promising avenue for skill evaluation in the era of virtual education and job recruiting.
Recent advances in natural language processing have given rise to many complex question generation methods.
AGenT Zero successfully outperforms other pre-trained methods in fluency and semantic similarity.
arXiv Detail & Related papers (2020-11-25T04:06:57Z)
- ClarQ: A large-scale and diverse dataset for Clarification Question Generation [67.1162903046619]
We devise a novel bootstrapping framework that assists in the creation of a diverse, large-scale dataset of clarification questions based on post-comments extracted from StackExchange.
We quantitatively demonstrate the utility of the newly created dataset by applying it to the downstream task of question-answering.
We release this dataset in order to foster research into the field of clarification question generation with the larger goal of enhancing dialog and question answering systems.
arXiv Detail & Related papers (2020-06-10T17:56:50Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
- Knowledge Graph Simple Question Answering for Unseen Domains [9.263766921991452]
We propose a data-centric domain adaptation framework that is applicable to new domains.
We use distant supervision to extract a set of keywords that express each relation of the unseen domain.
Our framework significantly improves over zero-shot baselines and is robust across domains.
arXiv Detail & Related papers (2020-05-25T11:34:54Z)
- Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)
- ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs.
arXiv Detail & Related papers (2020-01-22T14:39:28Z)
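Several entries above (e.g., QC4QA) reduce the discrepancy between source and target feature distributions with a maximum mean discrepancy (MMD) distance. The sketch below is a minimal, generic biased squared-MMD estimator with an RBF kernel, assuming batched feature tensors; it is not the cited papers' implementation.

```python
# Minimal sketch of the MMD distance mentioned above, assuming an RBF kernel.
import torch


def rbf_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) evaluated for all pairs
    d2 = torch.cdist(x, y) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of squared MMD between source and target feature batches.

    source: (n, d) tensor, target: (m, d) tensor
    """
    k_ss = rbf_kernel(source, source, sigma).mean()
    k_tt = rbf_kernel(target, target, sigma).mean()
    k_st = rbf_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2 * k_st
```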
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.