xPQA: Cross-Lingual Product Question Answering across 12 Languages
- URL: http://arxiv.org/abs/2305.09249v1
- Date: Tue, 16 May 2023 07:56:19 GMT
- Title: xPQA: Cross-Lingual Product Question Answering across 12 Languages
- Authors: Xiaoyu Shen, Akari Asai, Bill Byrne and Adrià de Gispert
- Abstract summary: Product Question Answering (PQA) systems are key in e-commerce applications to provide responses to customers' questions.
xPQA is a large-scale annotated cross-lingual PQA dataset in 12 languages across 9 branches.
We present results in (1) candidate ranking, to select the best English candidate to answer a non-English question; and (2) answer generation, to generate a natural-sounding non-English answer based on the selected English candidate.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Product Question Answering (PQA) systems are key in e-commerce applications
to provide responses to customers' questions as they shop for products. While
existing work on PQA focuses mainly on English, in practice there is a need to
support multiple customer languages while leveraging product information
available in English. To study this practical industrial task, we present xPQA,
a large-scale annotated cross-lingual PQA dataset in 12 languages across 9
branches, and report results in (1) candidate ranking, to select the best
English candidate containing the information to answer a non-English question;
and (2) answer generation, to generate a natural-sounding non-English answer
based on the selected English candidate. We evaluate various approaches
involving machine translation at runtime or offline, leveraging multilingual
pre-trained LMs, and including or excluding xPQA training data. We find that
(1) In-domain data is essential as cross-lingual rankers trained on other
domains perform poorly on the PQA task; (2) Candidate ranking often prefers
runtime-translation approaches while answer generation prefers multilingual
approaches; (3) Translating offline to augment multilingual models helps
candidate ranking mainly on languages with non-Latin scripts; and helps answer
generation mainly on languages with Latin scripts. Still, there remains a
significant performance gap between the English and the cross-lingual test
sets.
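The two-stage pipeline described in the abstract can be sketched in a few lines. This is a toy illustration only, assuming a runtime-translation approach: the word-by-word lexicon translator, the lexical-overlap ranker, and the template-based generator are hypothetical stand-ins for the machine translation, ranking, and generation models evaluated in the paper.

```python
# Minimal sketch of cross-lingual PQA in two stages:
# (1) candidate ranking over English product info for a non-English question,
# (2) answer generation from the best English candidate.

def translate_to_english(question: str, lexicon: dict) -> str:
    """Toy word-by-word 'runtime translation' using a small lexicon."""
    return " ".join(lexicon.get(w, w) for w in question.lower().split())

def rank_candidates(question_en: str, candidates: list) -> list:
    """Rank English candidates by word overlap with the translated question."""
    q_words = set(question_en.split())
    return sorted(candidates,
                  key=lambda c: len(q_words & set(c.lower().split())),
                  reverse=True)

def generate_answer(best_candidate: str) -> str:
    """Placeholder for the generation stage (a multilingual LM in practice)."""
    return f"Based on the product info: {best_candidate}"

# Hypothetical German question and English candidate answers.
lexicon = {"ist": "is", "die": "the", "uhr": "watch", "wasserdicht": "waterproof"}
candidates = [
    "The watch ships with a leather strap.",
    "Yes, the watch is waterproof up to 50 m.",
]
q_en = translate_to_english("Ist die Uhr wasserdicht", lexicon)
best = rank_candidates(q_en, candidates)[0]
print(generate_answer(best))
```

In the paper's multilingual alternative, the ranker would instead score English candidates directly against the untranslated question using a multilingual pre-trained LM, skipping the runtime translation step.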
Related papers
- Cross-lingual Transfer for Automatic Question Generation by Learning Interrogative Structures in Target Languages [6.635572580071933]
We propose a simple and efficient XLT-QG method that operates without the need for monolingual, parallel, or labeled data in the target language.
Our method achieves performance comparable to GPT-3.5-turbo across different languages.
arXiv Detail & Related papers (2024-10-04T07:29:35Z)
- INDIC QA BENCHMARK: A Multilingual Benchmark to Evaluate Question Answering capability of LLMs for Indic Languages [26.13077589552484]
Indic-QA is the largest publicly available context-grounded question-answering dataset for 11 major Indian languages from two language families.
We generate a synthetic dataset using the Gemini model to create question-answer pairs given a passage, which is then manually verified for quality assurance.
We evaluate various multilingual Large Language Models and their instruction-fine-tuned variants on the benchmark and observe that their performance is subpar, particularly for low-resource languages.
arXiv Detail & Related papers (2024-07-18T13:57:16Z)
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA)
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
arXiv Detail & Related papers (2024-04-26T11:46:05Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- Applying Multilingual Models to Question Answering (QA) [0.0]
We study the performance of monolingual and multilingual language models on the task of question-answering (QA) on three diverse languages: English, Finnish and Japanese.
We develop models for the tasks of (1) determining if a question is answerable given the context and (2) identifying the answer texts within the context using IOB tagging.
arXiv Detail & Related papers (2022-12-04T21:58:33Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Delving Deeper into Cross-lingual Visual Question Answering [115.16614806717341]
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
arXiv Detail & Related papers (2022-02-15T18:22:18Z)
- Multilingual Transfer Learning for QA Using Translation as Data Augmentation [13.434957024596898]
We explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space.
We propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance.
Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
arXiv Detail & Related papers (2020-12-10T20:29:34Z)
- XOR QA: Cross-lingual Open-Retrieval Question Answering [75.20578121267411]
This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
arXiv Detail & Related papers (2020-10-22T16:47:17Z)
- XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation [100.09099800591822]
XGLUE is a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models.
XGLUE provides 11 diversified tasks that cover both natural language understanding and generation scenarios.
arXiv Detail & Related papers (2020-04-03T07:03:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.