Investigating Information Inconsistency in Multilingual Open-Domain
Question Answering
- URL: http://arxiv.org/abs/2205.12456v1
- Date: Wed, 25 May 2022 02:58:54 GMT
- Title: Investigating Information Inconsistency in Multilingual Open-Domain
Question Answering
- Authors: Shramay Palta, Haozhe An, Yifan Yang, Shuaiyi Huang, Maharshi Gor
- Abstract summary: We analyze the behavior of multilingual open-domain question answering models with a focus on retrieval bias.
We speculate that the content differences in documents across languages might reflect cultural divergences and/or social biases.
- Score: 18.23417521199809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval-based open-domain QA systems use retrieved documents and
answer-span selection over those documents to find the best answer candidates.
We hypothesize that multilingual Question Answering (QA) systems are prone to
information inconsistency when it comes to documents written in different
languages, because these documents tend to provide a model with varying
information about the same topic. To understand the effects of the biased
availability of information and cultural influence, we analyze the behavior of
multilingual open-domain question answering models with a focus on retrieval
bias. We analyze whether different retriever models present different passages
given the same question in different languages on TyDi QA and XOR-TyDi QA, two
multilingual QA datasets. We speculate that the content differences in documents
across languages might reflect cultural divergences and/or social biases.
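One simple way to operationalize the retrieval-bias analysis described in the abstract is to compare the sets of passages a retriever returns for the same question asked in different languages; low overlap suggests the model is drawing on different information per language. The sketch below uses a Jaccard overlap over top-k passage IDs; the function name and toy data are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch: quantify cross-lingual retrieval inconsistency by
# comparing the top-k passages a retriever returns for the same question
# asked in two languages. Passage IDs and data here are toy examples.

def topk_overlap(ids_lang_a, ids_lang_b, k=5):
    """Jaccard overlap of the top-k retrieved passage IDs."""
    a, b = set(ids_lang_a[:k]), set(ids_lang_b[:k])
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Toy retrieval results for one question asked in English and Finnish.
retrieved_en = ["p12", "p7", "p3", "p44", "p9"]
retrieved_fi = ["p7", "p81", "p12", "p50", "p2"]

score = topk_overlap(retrieved_en, retrieved_fi, k=5)
print(f"top-5 Jaccard overlap: {score:.3f}")  # prints 0.250
```

A per-question overlap distribution across a dataset such as TyDi QA would then show how consistently a given retriever grounds its answers across languages.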
Related papers
- Evaluating and Modeling Attribution for Cross-Lingual Question Answering [80.4807682093432]
This work is the first to study attribution for cross-lingual question answering.
We collect data in 5 languages to assess the attribution level of a state-of-the-art cross-lingual QA system.
We find that a substantial portion of the answers is not attributable to any retrieved passages.
arXiv Detail & Related papers (2023-05-23T17:57:46Z)
- Applying Multilingual Models to Question Answering (QA) [0.0]
We study the performance of monolingual and multilingual language models on the task of question-answering (QA) on three diverse languages: English, Finnish and Japanese.
We develop models for the tasks of (1) determining if a question is answerable given the context and (2) identifying the answer texts within the context using IOB tagging.
arXiv Detail & Related papers (2022-12-04T21:58:33Z)
- Discourse Analysis via Questions and Answers: Parsing Dependency Structures of Questions Under Discussion [57.43781399856913]
This work adopts the linguistic framework of Questions Under Discussion (QUD) for discourse analysis.
We characterize relationships between sentences as free-form questions, in contrast to exhaustive fine-grained questions.
We develop a first-of-its-kind QUD parser that derives a dependency structure of questions over full documents.
arXiv Detail & Related papers (2022-10-12T03:53:12Z)
- Towards Complex Document Understanding By Discrete Reasoning [77.91722463958743]
Document Visual Question Answering (VQA) aims to understand visually-rich documents to answer questions in natural language.
We introduce a new Document VQA dataset, named TAT-DQA, which consists of 3,067 document pages and 16,558 question-answer pairs.
We develop a novel model named MHST that takes into account the information in multi-modalities, including text, layout and visual image, to intelligently address different types of questions.
arXiv Detail & Related papers (2022-07-25T01:43:19Z)
- Delving Deeper into Cross-lingual Visual Question Answering [115.16614806717341]
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
arXiv Detail & Related papers (2022-02-15T18:22:18Z)
- MFAQ: a Multilingual FAQ Dataset [9.625301186732598]
We present the first multilingual FAQ dataset publicly available.
We collected around 6M FAQ pairs from the web, in 21 different languages.
We adopt a similar setup as Dense Passage Retrieval (DPR) and test various bi-encoders on this dataset.
arXiv Detail & Related papers (2021-09-27T08:43:25Z)
- One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval [39.061900747689094]
CORA is a Cross-lingual Open-Retrieval Answer Generation model.
It can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable.
arXiv Detail & Related papers (2021-07-26T06:02:54Z)
- Multilingual Answer Sentence Reranking via Automatically Translated Data [97.98885151955467]
We present a study on the design of multilingual Answer Sentence Selection (AS2) models, which are a core component of modern Question Answering (QA) systems.
The main idea is to transfer data created in a resource-rich language, e.g., English, to other, less resource-rich languages.
arXiv Detail & Related papers (2021-02-20T03:52:08Z)
- XOR QA: Cross-lingual Open-Retrieval Question Answering [75.20578121267411]
This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
arXiv Detail & Related papers (2020-10-22T16:47:17Z)
- TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages [27.588857710802113]
TyDi QA is a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs.
We present a quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena.
arXiv Detail & Related papers (2020-03-10T21:11:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.