Quinductor: a multilingual data-driven method for generating
reading-comprehension questions using Universal Dependencies
- URL: http://arxiv.org/abs/2103.10121v3
- Date: Fri, 12 May 2023 18:57:33 GMT
- Authors: Dmytro Kalpakchi and Johan Boye
- Abstract summary: We propose a multilingual data-driven method for generating reading comprehension questions using dependency trees.
Our method provides a strong, mostly deterministic, and inexpensive-to-train baseline for less-resourced languages.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a multilingual data-driven method for generating reading
comprehension questions using dependency trees. Our method provides a strong,
mostly deterministic, and inexpensive-to-train baseline for less-resourced
languages. While a language-specific corpus is still required, its size is
nowhere near those required by modern neural question generation (QG)
architectures. Our method surpasses QG baselines previously reported in the
literature and shows a good performance in terms of human evaluation.
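The abstract only sketches the method at a high level: questions are generated from the Universal Dependencies (UD) parse of a source sentence, using templates induced from a language-specific corpus. The snippet below is a minimal, illustrative Python sketch of that idea, not the authors' released Quinductor implementation: it assumes the stanza UD parser as an off-the-shelf tool and uses a single hand-written subject-verb-object template, whereas Quinductor induces its templates automatically from data.

```python
# Illustrative sketch only: one hand-written template applied to a UD parse.
# Quinductor induces such templates from a corpus; stanza is assumed here
# purely as a convenient off-the-shelf Universal Dependencies parser.
import stanza

stanza.download("en")  # one-off model download
nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse", verbose=False)

def naive_question(sentence: str):
    """Turn a simple SVO sentence into a (question, answer) pair."""
    sent = nlp(sentence).sentences[0]
    root = next(w for w in sent.words if w.deprel == "root")
    subj = next((w for w in sent.words if w.head == root.id and w.deprel == "nsubj"), None)
    obj = next((w for w in sent.words if w.head == root.id and w.deprel == "obj"), None)
    if root.upos == "VERB" and subj and obj:
        # Hypothetical template: ask about the object, keep the object as the answer.
        return f"What does {subj.text} {root.lemma}?", obj.text
    return None  # template does not apply to this dependency tree

print(naive_question("The parser builds a dependency tree."))
# Expected (parser-dependent): ('What does parser build?', 'tree')
```

Inducing such templates from question-answer pairs, rather than writing them by hand, is what makes the method data-driven while remaining mostly deterministic and inexpensive to train.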
Related papers
- Cross-lingual Transfer Learning for Javanese Dependency Parsing [0.20537467311538835]
This study focuses on assessing the efficacy of transfer learning in enhancing dependency parsing for Javanese.
We utilize the Universal Dependencies dataset consisting of dependency treebanks from more than 100 languages, including Javanese.
arXiv Detail & Related papers (2024-01-22T16:13:45Z)
- Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets.
This survey delves into an important attribute of these datasets: the dialect of a language.
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants [80.4837840962273]
We present Belebele, a dataset spanning 122 language variants.
This dataset enables the evaluation of text models in high-, medium-, and low-resource languages.
arXiv Detail & Related papers (2023-08-31T17:43:08Z)
- Cross-Lingual Question Answering over Knowledge Base as Reading Comprehension [61.079852289005025]
Cross-lingual question answering over knowledge base (xKBQA) aims to answer questions in languages different from that of the provided knowledge base.
One of the major challenges facing xKBQA is the high cost of data annotation.
We propose a novel approach for xKBQA in a reading comprehension paradigm.
arXiv Detail & Related papers (2023-02-26T05:52:52Z)
- Cross-Lingual GenQA: A Language-Agnostic Generative Question Answering Approach for Open-Domain Question Answering [76.99585451345702]
Open-Retrieval Generative Question Answering (GenQA) has been shown to deliver high-quality, natural-sounding answers in English.
We present the first generalization of the GenQA approach for the multilingual environment.
arXiv Detail & Related papers (2021-10-14T04:36:29Z)
- One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval [39.061900747689094]
CORA is a Cross-lingual Open-Retrieval Answer Generation model.
It can answer questions across many languages even when language-specific annotated data or knowledge sources are unavailable.
arXiv Detail & Related papers (2021-07-26T06:02:54Z)
- Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953]
Current state-of-the-art models even surpass human performance on several reading-comprehension benchmarks.
Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
This paper further utilizes unlabeled data to improve performance.
arXiv Detail & Related papers (2021-05-08T08:04:30Z)
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning [68.57658225995966]
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
arXiv Detail & Related papers (2020-05-01T12:22:33Z)
- A Hybrid Approach to Dependency Parsing: Combining Rules and Morphology with Deep Learning [0.0]
We propose two approaches to dependency parsing, especially for languages with a restricted amount of training data.
Our first approach combines a state-of-the-art deep learning-based parser with a rule-based approach, and the second incorporates morphological information into the network.
The proposed methods are developed for Turkish, but can be adapted to other languages as well.
arXiv Detail & Related papers (2020-02-24T08:34:33Z)
- How Much Knowledge Can You Pack Into the Parameters of a Language Model? [44.81324633069311]
It has been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries.
We measure the practical utility of this approach by fine-tuning pre-trained models to answer questions without access to any external context or knowledge.
arXiv Detail & Related papers (2020-02-10T18:55:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.