Related papers: Revisiting non-English Text Simplification: A Unified Multilingual Benchmark

Revisiting non-English Text Simplification: A Unified Multilingual Benchmark

URL: http://arxiv.org/abs/2305.15678v1
Date: Thu, 25 May 2023 03:03:29 GMT
Title: Revisiting non-English Text Simplification: A Unified Multilingual Benchmark
Authors: Michael J. Ryan, Tarek Naous, Wei Xu
Abstract summary: This paper introduces the MultiSim benchmark, a collection of 27 resources in 12 distinct languages containing over 1.7 million complex-simple sentence pairs. Our experiments using MultiSim with pre-trained multilingual language models reveal exciting performance improvements from multilingual training in non-English settings.
Score: 14.891068432456262
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in high-quality, large-scale English resources have pushed the frontier of English Automatic Text Simplification (ATS) research. However, less work has been done on multilingual text simplification due to the lack of a diverse evaluation benchmark that covers complex-simple sentence pairs in many languages. This paper introduces the MultiSim benchmark, a collection of 27 resources in 12 distinct languages containing over 1.7 million complex-simple sentence pairs. This benchmark will encourage research in developing more effective multilingual text simplification models and evaluation metrics. Our experiments using MultiSim with pre-trained multilingual language models reveal exciting performance improvements from multilingual training in non-English settings. We observe strong performance from Russian in zero-shot cross-lingual transfer to low-resource languages. We further show that few-shot prompting with BLOOM-176b achieves comparable quality to reference simplifications outperforming fine-tuned models in most languages. We validate these findings through human evaluation.

Related papers

MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining [27.952041404675846]
We introduce MuRating, a framework that transfers high-quality English data-quality signals into a single rater for 17 target languages.<n>MuRating aggregates multiple English "raters" via pairwise comparisons to learn unified document-quality scores.<n>It then projects these judgments through translation to train a multilingual evaluator on monolingual, cross-lingual, and parallel text pairs.
arXiv Detail & Related papers (2025-07-02T15:11:12Z)
MuBench: Assessment of Multilingual Capabilities of Large Language Models Across 61 Languages [33.450081592217074]
We introduce MuBench, a benchmark covering 61 languages and evaluating a broad range of capabilities.<n>We evaluate several state-of-the-art multilingual LLMs and find notable gaps between claimed and actual language coverage.
arXiv Detail & Related papers (2025-06-24T09:53:00Z)
MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation [60.52580061637301]
MMLU-ProX is a comprehensive benchmark covering 13 typologically diverse languages with approximately 11,829 questions per language. We evaluate 25 state-of-the-art large language models (LLMs) using 5-shot chain-of-thought (CoT) and zero-shot prompting strategies, analyzing their performance across linguistic and cultural boundaries. Our experiments reveal consistent performance degradation from high-resource languages to lower-resource ones, with the best models achieving over 70% accuracy on English but dropping to around 40% for languages like Swahili.
arXiv Detail & Related papers (2025-03-13T15:59:20Z)
mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval [61.17793165194077]
We introduce mFollowIR, a benchmark for measuring instruction-following ability in retrieval models. We present results for both multilingual (XX-XX) and cross-lingual (En-XX) performance. We see strong cross-lingual performance with English-based retrievers that trained using instructions, but find a notable drop in performance in the multilingual setting.
arXiv Detail & Related papers (2025-01-31T16:24:46Z)
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model [66.17354128553244]
Most Large Vision-Language Models (LVLMs) to date are trained predominantly on English data. We investigate how different training mixes tip the scale for different groups of languages. We train Centurio, a 100-language LVLM, offering state-of-the-art performance in an evaluation covering 14 tasks and 56 languages.
arXiv Detail & Related papers (2025-01-09T10:26:14Z)
Synergistic Approach for Simultaneous Optimization of Monolingual, Cross-lingual, and Multilingual Information Retrieval [5.446052898856584]
This paper proposes a novel hybrid batch training strategy to improve zero-shot retrieval performance across monolingual, cross-lingual, and multilingual settings. The approach fine-tunes multilingual language models using a mix of monolingual and cross-lingual question-answer pair batches sampled based on dataset size.
arXiv Detail & Related papers (2024-08-20T04:30:26Z)
Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model [14.39119862985503]
We aim to create a multilingual ALT system with available datasets. Inspired by architectures that have been proven effective for English ALT, we adapt these techniques to the multilingual scenario. We evaluate the performance of the multilingual model in comparison to its monolingual counterparts.
arXiv Detail & Related papers (2024-06-25T15:02:32Z)
Multi-Task Contrastive Learning for 8192-Token Bilingual Text Embeddings [22.71166607645311]
We introduce a novel suite of state-of-the-art bilingual text embedding models. These models are capable of processing lengthy text inputs with up to 8192 tokens. We have significantly improved the model performance on STS tasks. We have expanded the Massive Text Embedding Benchmark to include benchmarks for German and Spanish embedding models.
arXiv Detail & Related papers (2024-02-26T20:53:12Z)
Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [59.056367787688146]
This paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs. We construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages. By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
arXiv Detail & Related papers (2023-10-31T08:09:20Z)
MULTITuDE: Large-Scale Multilingual Machine-Generated Text Detection Benchmark [10.92793962395538]
MultiTuDE is a novel benchmarking dataset for multilingual machine-generated text detection. It consists of 74,081 authentic and machine-generated texts in 11 languages. We compare the performance of zero-shot (statistical and black-box) and fine-tuned detectors.
arXiv Detail & Related papers (2023-10-20T15:57:17Z)
Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks. We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset. To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
Multilingual Relation Classification via Efficient and Effective Prompting [9.119073318043952]
We present the first work on prompt-based multilingual relation classification (RC) We introduce an efficient and effective method that constructs prompts from relation triples and involves only minimal translation for the class labels. We evaluate its performance in fully supervised, few-shot and zero-shot scenarios, and analyze its effectiveness across 14 languages.
arXiv Detail & Related papers (2022-10-25T08:40:23Z)
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark. IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation [93.80733419450225]
This paper analyzes the current state of cross-lingual transfer learning. We extend XTREME to XTREME-R, which consists of an improved set of ten natural language understanding tasks.
arXiv Detail & Related papers (2021-04-15T12:26:12Z)
Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics. We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)
Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [61.88012735215636]
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs. UNMT can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time. In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
arXiv Detail & Related papers (2020-04-21T17:26:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.