DTW at Qur'an QA 2022: Utilising Transfer Learning with Transformers for
Question Answering in a Low-resource Domain
- URL: http://arxiv.org/abs/2205.06025v1
- Date: Thu, 12 May 2022 11:17:23 GMT
- Title: DTW at Qur'an QA 2022: Utilising Transfer Learning with Transformers for
Question Answering in a Low-resource Domain
- Authors: Damith Premasiri, Tharindu Ranasinghe, Wajdi Zaghouani, Ruslan Mitkov
- Abstract summary: Research in machine reading comprehension (MRC) remains understudied in several domains, including religious texts.
The goal of the Qur'an QA 2022 shared task is to fill this gap by producing state-of-the-art question answering and reading comprehension research on the Qur'an.
- Score: 10.172732008860539
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of machine reading comprehension (MRC) is a useful benchmark to
evaluate the natural language understanding of machines. It has gained
popularity in the natural language processing (NLP) field mainly due to the
large number of datasets released for many languages. However, MRC remains
understudied in several domains, including religious texts. The goal of the
Qur'an QA 2022 shared task is to fill this gap by producing state-of-the-art
question answering and reading comprehension research on the Qur'an. This paper
describes the DTW entry to the Qur'an QA 2022 shared task.
Our methodology uses transfer learning to take advantage of available Arabic
MRC data. We further improve the results using various ensemble learning
strategies. Our approach achieved a partial Reciprocal Rank (pRR) score of 0.49
on the test set, demonstrating its strong performance on the task.
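For illustration, below is a minimal Python sketch of the two ingredients named in the abstract: a score-summing voting ensemble over ranked answer spans and a pRR evaluation. It assumes the common max-over-ranks formulation of pRR and uses token-overlap F1 as a stand-in for the shared task's partial-match function; the names vote_ensemble, partial_reciprocal_rank, and token_f1, and the toy predictions, are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def vote_ensemble(model_predictions, top_k=5):
    """Merge ranked answer spans from several models by summing their scores.

    model_predictions: one list per model of (answer_text, score) pairs,
    each sorted best-first. Returns the top_k answers after voting.
    """
    votes = defaultdict(float)
    for ranked_answers in model_predictions:
        for answer, score in ranked_answers:
            votes[answer] += score
    merged = sorted(votes.items(), key=lambda item: item[1], reverse=True)
    return [answer for answer, _ in merged[:top_k]]

def token_f1(prediction, gold):
    """Token-overlap F1, used here as a stand-in for the task's partial-match score."""
    pred_tokens, gold_tokens = prediction.split(), gold.split()
    common = set(pred_tokens) & set(gold_tokens)
    if not common:
        return 0.0
    precision = len(common) / len(pred_tokens)
    recall = len(common) / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def partial_reciprocal_rank(ranked_answers, gold_answers):
    """pRR: the best partial-match score, discounted by the rank at which it occurs."""
    best = 0.0
    for rank, answer in enumerate(ranked_answers, start=1):
        match = max(token_f1(answer, gold) for gold in gold_answers)
        best = max(best, match / rank)
    return best

if __name__ == "__main__":
    # Toy example: three hypothetical fine-tuned models each rank candidate spans.
    predictions = [
        [("the believers", 0.9), ("believers", 0.4)],
        [("believers", 0.8), ("the believers", 0.5)],
        [("the believers", 0.7), ("patience", 0.2)],
    ]
    ranked = vote_ensemble(predictions)
    print(ranked)                                                    # merged ranking
    print(partial_reciprocal_rank(ranked, gold_answers=["the believers"]))  # 1.0
```

In this sketch the ensemble simply rewards spans that several models agree on; the paper's actual ensembling strategies and the official pRR matching function may differ in detail.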
Related papers
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA).
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z)
- TCE at Qur'an QA 2023 Shared Task: Low Resource Enhanced Transformer-based Ensemble Approach for Qur'anic QA [0.0]
We present our approach to tackle Qur'an QA 2023 shared tasks A and B.
To address the challenge of low-resourced training data, we rely on transfer learning together with a voting ensemble.
We employ different architectures and learning mechanisms for a range of Arabic pre-trained transformer-based models for both tasks.
arXiv Detail & Related papers (2024-01-23T19:32:54Z)
- Building Efficient and Effective OpenQA Systems for Low-Resource Languages [17.64851283209797]
We show that effective, low-cost OpenQA systems can be developed for low-resource contexts.
Key ingredients are weak supervision using machine-translated labeled datasets and a relevant unstructured knowledge source.
We present SQuAD-TR, a machine translation of SQuAD2.0, and we build our OpenQA system by adapting ColBERT-QA and retraining it over Turkish resources.
arXiv Detail & Related papers (2024-01-07T22:11:36Z)
- Rethinking and Improving Multi-task Learning for End-to-end Speech Translation [51.713683037303035]
We investigate the consistency between different tasks, considering different times and modules.
We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations.
We propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the difference in length and representation.
arXiv Detail & Related papers (2023-11-07T08:48:46Z)
- Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task [59.46906545506715]
We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on both sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z)
- Quran Recitation Recognition using End-to-End Deep Learning [0.0]
The Quran is the holy scripture of Islam, and its recitation is an important aspect of the religion.
Recognizing the recitation of the Holy Quran automatically is a challenging task due to its unique rules.
We propose a novel end-to-end deep learning model for recognizing the recitation of the Holy Quran.
arXiv Detail & Related papers (2023-05-10T18:40:01Z)
- MIA 2022 Shared Task: Evaluating Cross-lingual Open-Retrieval Question Answering for 16 Diverse Languages [54.002969723086075]
We evaluate cross-lingual open-retrieval question answering systems in 16 typologically diverse languages.
The best system leveraging iteratively mined diverse negative examples achieves 32.2 F1, outperforming our baseline by 4.5 points.
The second best system uses entity-aware contextualized representations for document retrieval, and achieves significant improvements in Tamil (20.8 F1), whereas most of the other systems yield nearly zero scores.
arXiv Detail & Related papers (2022-07-02T06:54:10Z)
- TCE at Qur'an QA 2022: Arabic Language Question Answering Over Holy Qur'an Using a Post-Processed Ensemble of BERT-based Models [0.0]
Arabic is the language of the Holy Qur'an; the sacred text for 1.8 billion people across the world.
We propose an ensemble learning model based on Arabic variants of BERT models.
Our system achieves a Partial Reciprocal Rank (pRR) score of 56.6% on the official test set.
arXiv Detail & Related papers (2022-06-03T13:00:48Z)
- XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-based Textual Knowledge Source [2.348805691644086]
This paper presents XLMRQA, the first Vietnamese QA system using a supervised transformer-based reader on the Wikipedia-based textual knowledge source.
From the results obtained on the three systems, we analyze the influence of question types on the performance of the QA systems.
arXiv Detail & Related papers (2022-04-14T14:54:33Z)
- Enhancing Answer Boundary Detection for Multilingual Machine Reading Comprehension [86.1617182312817]
We propose two auxiliary tasks in the fine-tuning stage to create additional phrase boundary supervision.
A mixed machine reading task translates the question or passage into other languages and builds cross-lingual question-passage pairs.
A language-agnostic knowledge masking task leverages knowledge phrases mined from the web.
arXiv Detail & Related papers (2020-04-29T10:44:00Z) - A Sentence Cloze Dataset for Chinese Machine Reading Comprehension [64.07894249743767]
We propose a new task called Sentence Cloze-style Machine Reading (SC-MRC).
The proposed task aims to fill the right candidate sentence into a passage that contains several blanks.
We built a Chinese dataset called CMRC 2019 to evaluate the difficulty of the SC-MRC task.
arXiv Detail & Related papers (2020-04-07T04:09:00Z)