Learning to Answer Multilingual and Code-Mixed Questions
- URL: http://arxiv.org/abs/2211.07522v1
- Date: Mon, 14 Nov 2022 16:49:58 GMT
- Title: Learning to Answer Multilingual and Code-Mixed Questions
- Authors: Deepak Gupta
- Abstract summary: Question-answering (QA) that comes naturally to humans is a critical component in seamless human-computer interaction.
Despite being one of the oldest research areas, current QA systems face the critical challenge of handling multilingual queries.
This dissertation focuses on advancing QA techniques for handling end-user queries in multilingual environments.
- Score: 4.290420179006601
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Question-answering (QA) that comes naturally to humans is a critical
component in seamless human-computer interaction. It has emerged as one of the
most convenient and natural methods to interact with the web and is especially
desirable in voice-controlled environments. Despite being one of the oldest
research areas, current QA systems face the critical challenge of handling
multilingual queries. To build an Artificial Intelligence (AI) agent that can
serve multilingual end users, a QA system must be language-versatile and
tailored to the multilingual environment. Recent advances in QA models
have enabled surpassing human performance primarily due to the availability of
a sizable amount of high-quality datasets. However, the majority of such
annotated datasets are expensive to create and are confined to English, making
it difficult to track progress in other languages.
Therefore, to measure a similar improvement in the multilingual QA system, it
is necessary to invest in high-quality multilingual evaluation benchmarks. In
this dissertation, we focus on advancing QA techniques for handling end-user
queries in multilingual environments. This dissertation consists of two parts.
In the first part, we explore multilingualism and a further dimension of
multilingualism referred to as code-mixing. In the second part, we propose a
technique for multi-hop question generation that exploits multiple documents.
Experiments show our models achieve state-of-the-art performance on answer
extraction, ranking, and generation tasks across multiple domains of
multilingual QA (MQA), visual QA (VQA), and language generation. The proposed
techniques are generic and can be
widely used in various domains and languages to advance QA systems.
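To make the code-mixing setting concrete, the sketch below shows one simple way a QA front end could flag a Hindi-English code-mixed question and route it to a suitable model. The toy lexicons, the routing rule, and the placeholder model names are illustrative assumptions, not the dissertation's actual method.

```python
# A minimal, hypothetical sketch of routing code-mixed (Hindi-English)
# questions. The lexicons, threshold, and model names are assumptions
# for illustration only.

# Tiny illustrative lexicons: romanized Hindi cues and common English words.
HINDI_HINTS = {"kya", "hai", "kaun", "kab", "kahan", "kaise", "hui"}
ENGLISH_HINTS = {"what", "is", "who", "when", "where", "how", "the", "movie", "release"}


def detect_language_mix(question: str) -> str:
    """Label a question as 'english', 'hindi', or 'code-mixed' by counting
    token hits against the two toy lexicons."""
    tokens = [t.strip("?!.,").lower() for t in question.split()]
    hindi_hits = sum(t in HINDI_HINTS for t in tokens)
    english_hits = sum(t in ENGLISH_HINTS for t in tokens)
    if hindi_hits and english_hits:
        return "code-mixed"
    return "hindi" if hindi_hits else "english"


def route_question(question: str) -> str:
    """Choose a downstream QA model (placeholder names) from the label."""
    return {
        "english": "english_qa_model",
        "hindi": "hindi_qa_model",
        "code-mixed": "multilingual_qa_model",
    }[detect_language_mix(question)]


if __name__ == "__main__":
    print(route_question("Where is the Taj Mahal?"))          # english_qa_model
    print(route_question("Taj Mahal kahan hai?"))             # hindi_qa_model
    print(route_question("Avengers movie kab release hui?"))  # multilingual_qa_model
```

In practice the lexicon lookup would be replaced by a trained language-identification or code-mixing detector, but the routing structure stays the same.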
Related papers
- Cross-lingual Transfer for Automatic Question Generation by Learning Interrogative Structures in Target Languages [6.635572580071933]
We propose a simple and efficient XLT-QG method that operates without the need for monolingual, parallel, or labeled data in the target language.
Our method achieves performance comparable to GPT-3.5-turbo across different languages.
arXiv Detail & Related papers (2024-10-04T07:29:35Z) - MST5 -- Multilingual Question Answering over Knowledge Graphs [1.6470999044938401]
Knowledge Graph Question Answering (KGQA) simplifies querying vast amounts of knowledge stored in a graph-based model using natural language.
Existing multilingual KGQA systems face challenges in achieving performance comparable to English systems.
We propose a simplified approach to enhance multilingual KGQA systems by incorporating linguistic context and entity information directly into the processing pipeline of a language model.
arXiv Detail & Related papers (2024-07-08T15:37:51Z) - MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering [58.92057773071854]
We introduce MTVQA, the first benchmark featuring high-quality human expert annotations across 9 diverse languages.
arXiv Detail & Related papers (2024-05-20T12:35:01Z) - Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA).
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z) - SEMQA: Semi-Extractive Multi-Source Question Answering [94.04430035121136]
We introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion.
We create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions.
arXiv Detail & Related papers (2023-11-08T18:46:32Z) - Bridging the Language Gap: Knowledge Injected Multilingual Question
Answering [19.768708263635176]
We propose a generalized cross-lingual transfer framework to enhance the model's ability to understand different languages.
Experimental results on the real-world MLQA dataset demonstrate that the proposed method improves performance by a large margin.
arXiv Detail & Related papers (2023-04-06T15:41:25Z) - xGQA: Cross-Lingual Visual Question Answering [100.35229218735938]
xGQA is a new multilingual evaluation benchmark for the visual question answering task.
We extend the established English GQA dataset to 7 typologically diverse languages.
We propose new adapter-based approaches to adapt multimodal transformer-based models to become multilingual.
arXiv Detail & Related papers (2021-09-13T15:58:21Z) - Towards More Equitable Question Answering Systems: How Much More Data Do
You Need? [15.401330338654203]
We take a step back and study which approaches allow us to take the most advantage of existing resources in order to produce QA systems in many languages.
Specifically, we perform extensive analysis to measure the efficacy of few-shot approaches augmented with automatic translations and permutations of context-question-answer pairs.
We make suggestions for future dataset development efforts that make better use of a fixed annotation budget, with a goal of increasing the language coverage of QA datasets and systems.
arXiv Detail & Related papers (2021-05-28T21:32:04Z) - Crossing the Conversational Chasm: A Primer on Multilingual
Task-Oriented Dialogue Systems [51.328224222640614]
Current state-of-the-art task-oriented dialogue (ToD) models based on large pretrained neural language models are data hungry.
Data acquisition for ToD use cases is expensive and tedious.
arXiv Detail & Related papers (2021-04-17T15:19:56Z) - Multilingual Answer Sentence Reranking via Automatically Translated Data [97.98885151955467]
We present a study on the design of multilingual Answer Sentence Selection (AS2) models, which are a core component of modern Question Answering (QA) systems.
The main idea is to transfer data created in a resource-rich language, e.g., English, to languages that are less resource-rich.
arXiv Detail & Related papers (2021-02-20T03:52:08Z)
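To illustrate the translation-based data transfer idea in the answer sentence selection (AS2) paper above, the following sketch projects English (question, candidate sentence, label) triples into a target language with a placeholder machine-translation call. The data format and the `translate` stub are assumptions for illustration, not the authors' actual pipeline.

```python
# A rough sketch, under stated assumptions, of translation-based data
# transfer for multilingual answer sentence selection: English supervision
# is machine-translated into the target language, and labels carry over.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AS2Example:
    question: str
    candidate: str   # candidate answer sentence
    label: int       # 1 if the sentence answers the question, else 0


def translate(text: str, target_lang: str) -> str:
    """Placeholder for a machine-translation call (an MT API or model)."""
    return f"[{target_lang}] {text}"  # stub: prepend the language tag


def transfer_dataset(
    english_data: List[AS2Example],
    target_lang: str,
    mt: Callable[[str, str], str] = translate,
) -> List[AS2Example]:
    """Project English AS2 supervision into the target language."""
    return [
        AS2Example(mt(ex.question, target_lang), mt(ex.candidate, target_lang), ex.label)
        for ex in english_data
    ]


if __name__ == "__main__":
    en = [AS2Example("Who wrote Don Quixote?", "Don Quixote was written by Cervantes.", 1)]
    for ex in transfer_dataset(en, "it"):
        print(ex)
```

The resulting silver data can then be used to fine-tune an AS2 ranker in the target language; noise from translation is the main trade-off of this approach.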