Unsupervised Question Answering via Answer Diversifying
- URL: http://arxiv.org/abs/2208.10813v1
- Date: Tue, 23 Aug 2022 08:57:00 GMT
- Title: Unsupervised Question Answering via Answer Diversifying
- Authors: Yuxiang Nie, Heyan Huang, Zewen Chi, Xian-Ling Mao
- Abstract summary: We propose a novel unsupervised method by diversifying answers, named DiverseQA.
The proposed method is composed of three modules: data construction, data augmentation and denoising filter.
Extensive experiments show that the proposed method outperforms previous unsupervised models on five benchmark datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised question answering is an attractive task because it does
not depend on labeled data. Previous works usually make use of heuristic rules as
well as pre-trained models to construct data and train QA models. However, most of
these works regard named entities (NEs) as the only answer type, which ignores the
high diversity of answers in the real world. To tackle this problem, we propose
a novel unsupervised method that diversifies answers, named DiverseQA.
Specifically, the proposed method is composed of three modules: data
construction, data augmentation and denoising filter. Firstly, the data
construction module extends the extracted named entity into a longer sentence
constituent as the new answer span to construct a QA dataset with diverse
answers. Secondly, the data augmentation module applies an answer-type-dependent
data augmentation process via adversarial training at the embedding level.
Thirdly, the denoising filter module is designed to alleviate the noise in the
constructed data. Extensive experiments show that the proposed method
outperforms previous unsupervised models on five benchmark datasets, including
SQuADv1.1, NewsQA, TriviaQA, BioASQ, and DuoRC. Moreover, the proposed method
shows strong performance in the few-shot learning setting.
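To make the pipeline concrete, here is a minimal Python sketch of the data-construction idea: given a constituency parse, an extracted named entity is extended to the smallest constituent that strictly contains it, and that longer span becomes the answer. The parse input format and the "smallest strictly containing constituent" heuristic are assumptions for illustration; the paper's exact selection rule may differ.

```python
from nltk import Tree

def constituent_spans(tree: Tree):
    """Collect (start, end) leaf offsets for every constituent."""
    spans = []
    def walk(node, start):
        if isinstance(node, Tree):
            end = start
            for child in node:
                end = walk(child, end)
            spans.append((start, end))
            return end
        return start + 1  # a leaf consumes one token
    walk(tree, 0)
    return spans

def expand_entity(tree: Tree, ent_start: int, ent_end: int):
    """Return the smallest constituent strictly larger than the entity span."""
    entity_len = ent_end - ent_start
    candidates = [(e - s, s, e) for s, e in constituent_spans(tree)
                  if s <= ent_start and ent_end <= e and (e - s) > entity_len]
    return min(candidates)[1:] if candidates else (ent_start, ent_end)

parse = Tree.fromstring(
    "(S (NP (NNP Marie) (NNP Curie)) (VP (VBD won) (NP (DT the) (NNP Nobel) (NN Prize))))")
# NE "Nobel" covers tokens [4, 5); the expanded answer is tokens [3, 6),
# i.e. the NP "the Nobel Prize".
print(expand_entity(parse, 4, 5))  # -> (3, 6)
```

For the augmentation module, adversarial training at the embedding level is commonly instantiated as an FGM-style perturbation along the loss gradient; the sketch below assumes that form, with the epsilon hyperparameter as the natural place to make the perturbation answer-type dependent. This is a generic instantiation, not the paper's stated formulation.

```python
import torch

def adversarial_perturb(embeds: torch.Tensor, loss: torch.Tensor,
                        epsilon: float = 1.0) -> torch.Tensor:
    """FGM-style move of the embeddings along the normalized loss gradient.
    `epsilon` could be chosen per answer type (an assumption, not the
    paper's stated formulation)."""
    grad = torch.autograd.grad(loss, embeds, retain_graph=True)[0]
    norm = torch.norm(grad)
    if norm == 0 or torch.isnan(norm):
        return embeds
    return embeds + epsilon * grad / norm
```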
Related papers
- Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering (arXiv, 2024-04-18)
We propose a novel dataset, MUSIC-AVQA-R, crafted in two steps: rephrasing questions within the test split of a public dataset (MUSIC-AVQA) and introducing distribution shifts to split questions.
Experimental results show that the proposed architecture achieves state-of-the-art performance on MUSIC-AVQA-R, with a notable improvement of 9.32%.
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models (arXiv, 2023-10-17)
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
- An Empirical Comparison of LM-based Question and Answer Generation Methods (arXiv, 2023-05-26)
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
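As a point of reference, an end-to-end QAG model of this kind can be sketched in a few lines with a seq2seq LM: the context goes in with a task prefix and a question-answer pair comes out. The checkpoint name and prompt format below are illustrative stand-ins (an untuned t5-base will not produce useful pairs without the paper's fine-tuning).

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "t5-base"  # stand-in; the paper fine-tunes seq2seq LMs for QAG
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

context = "William Turner was an English painter who specialised in watercolours."
inputs = tokenizer("generate question and answer: " + context, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```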
- Chain-of-Skills: A Configurable Model for Open-domain Question Answering (arXiv, 2023-05-04)
The retrieval model is an indispensable component for real-world knowledge-intensive tasks.
Recent work focuses on customized methods, limiting the model transferability and scalability.
We propose a modular retriever where individual modules correspond to key skills that can be reused across datasets.
- LIQUID: A Framework for List Question Answering Dataset Generation (arXiv, 2023-02-03)
We propose LIQUID, an automated framework for generating list QA datasets from unlabeled corpora.
We first convert a passage from Wikipedia or PubMed into a summary and extract named entities from the summarized text as candidate answers.
We then create questions using an off-the-shelf question generator with the extracted entities and original passage.
Using our synthetic data, we significantly improve the performance of the previous best list QA models by exact-match F1 scores of 5.0 on MultiSpanQA, 1.9 on Quoref, and 2.8 averaged across three BioASQ benchmarks.
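A schematic sketch of that pipeline, with the summarizer and question generator passed in as black boxes; their choice, and the grouping of summary entities by type into list answers, are assumptions for illustration rather than LIQUID's exact procedure.

```python
from collections import defaultdict
import spacy

nlp = spacy.load("en_core_web_sm")  # NER over the summary

def candidate_list_answers(summary: str):
    """Group summary entities by label; labels with several distinct
    entities yield candidate list answers."""
    by_label = defaultdict(set)
    for ent in nlp(summary).ents:
        by_label[ent.label_].add(ent.text)
    return {label: sorted(ents) for label, ents in by_label.items()
            if len(ents) > 1}

def liquid_style_examples(passage, summarize, generate_question):
    summary = summarize(passage)  # e.g., a seq2seq summarization model
    for label, answers in candidate_list_answers(summary).items():
        question = generate_question(passage, answers)  # off-the-shelf QG
        yield {"question": question, "answers": answers, "context": passage}
```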
- Learning to Rank Question Answer Pairs with Bilateral Contrastive Data Augmentation (arXiv, 2021-06-21)
We propose a novel and easy-to-apply data augmentation strategy, namely Bilateral Generation (BiG).
With the augmented dataset, we design a contrastive training objective for learning to rank question answer pairs.
Experimental results on three benchmark datasets, namely TREC-QA, WikiQA, and ANTIQUE, show that our method significantly improves the performance of ranking models.
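The contrastive objective for ranking can be pictured as a standard in-batch InfoNCE-style loss over question and answer embeddings; this generic form is assumed here for illustration and is not necessarily BiG's exact objective.

```python
import torch
import torch.nn.functional as F

def contrastive_ranking_loss(q_emb: torch.Tensor, a_emb: torch.Tensor,
                             temperature: float = 0.05) -> torch.Tensor:
    """q_emb, a_emb: (batch, dim); row i of a_emb answers row i of q_emb.
    The gold pair on the diagonal is scored above in-batch negatives."""
    q = F.normalize(q_emb, dim=-1)
    a = F.normalize(a_emb, dim=-1)
    logits = q @ a.t() / temperature       # pairwise cosine similarities
    labels = torch.arange(q.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```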
- Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models (arXiv, 2020-09-01)
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
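The decomposition loop can be sketched as follows, with the next-question generator, the span QA model, and the calculator reduced to simple callables. The "[calc]" routing prefix and the module interfaces are simplified assumptions, not ModularQA's exact protocol.

```python
def modular_answer(question, context, next_question, qa_model):
    """Iteratively decompose `question` into sub-questions until the
    generator signals completion; route each sub-question to a module."""
    state = []  # (sub_question, sub_answer) history so far
    while True:
        sub_q = next_question(question, state)  # learned decomposer
        if sub_q == "[EOQ]":                    # decomposition finished
            return state[-1][1] if state else None
        if sub_q.startswith("[calc]"):          # symbolic calculator module
            expr = sub_q[len("[calc]"):]
            sub_a = str(eval(expr, {"__builtins__": {}}))
        else:                                   # neural single-span QA model
            sub_a = qa_model(sub_q, context)
        state.append((sub_q, sub_a))
```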
- ManyModalQA: Modality Disambiguation and QA over Diverse Inputs (arXiv, 2020-01-22)
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.