Findings of the TSAR-2022 Shared Task on Multilingual Lexical
Simplification
- URL: http://arxiv.org/abs/2302.02888v1
- Date: Mon, 6 Feb 2023 15:53:51 GMT
- Title: Findings of the TSAR-2022 Shared Task on Multilingual Lexical
Simplification
- Authors: Horacio Saggion, Sanja Štajner, Daniel Ferrés, Kim Cheng Sheang,
Matthew Shardlow, Kai North, Marcos Zampieri
- Abstract summary: The TSAR-2022 shared task was organized as part of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), held in conjunction with EMNLP 2022.
The task called on the Natural Language Processing research community to contribute methods to advance the state of the art in multilingual lexical simplification for English, Portuguese, and Spanish.
Results of the shared task set new benchmarks in lexical simplification, with quantitative results for English noticeably higher than those obtained for Spanish and (Brazilian) Portuguese.
- Score: 12.33631648094732
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We report findings of the TSAR-2022 shared task on multilingual
lexical simplification, organized as part of the Workshop on Text
Simplification, Accessibility, and Readability (TSAR-2022), held in
conjunction with EMNLP 2022. The task called on the Natural Language
Processing research community to contribute methods to advance the state of
the art in multilingual lexical simplification for English, Portuguese, and
Spanish. A total of 14 teams submitted the results of their lexical
simplification systems for the provided test data. Results of the shared task
set new benchmarks in lexical simplification, with quantitative results for
English noticeably higher than those obtained for Spanish and (Brazilian)
Portuguese.
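For context, a lexical simplification system receives a sentence and a marked complex word and must return simpler substitutes that preserve the sentence's meaning. Below is a minimal sketch of one common substitute-generation strategy, masking the complex word and querying a pretrained masked language model, in the spirit of the LSBert-style systems listed under Related papers; the model choice, example sentence, and filtering heuristic are illustrative assumptions, not the pipeline of any particular TSAR-2022 submission.

```python
# Minimal sketch of masked-LM substitute generation for lexical
# simplification (the general strategy behind LSBert-style systems).
# The model choice and filtering heuristic are illustrative assumptions,
# not the pipeline of any specific TSAR-2022 submission.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def substitutes(sentence: str, complex_word: str, top_k: int = 10):
    """Propose candidate substitutes for `complex_word` in `sentence`."""
    masked = sentence.replace(complex_word, fill_mask.tokenizer.mask_token, 1)
    candidates = fill_mask(masked, top_k=top_k)
    # Drop the original word; real systems also filter and rank by word
    # frequency, embedding similarity, and grammaticality.
    return [c["token_str"] for c in candidates
            if c["token_str"].lower() != complex_word.lower()]

print(substitutes("The cat perched on the mat.", "perched"))
```

Shared-task systems additionally rank such candidates; the official TSAR-2022 metrics (e.g., ACC@1 and MAP@k) score the ranked list against human-annotated gold substitutes.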
Related papers
- Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning [47.75550640881761]
We explore cross-lingual generalization in instruction tuning by applying it to non-English tasks.
We design cross-lingual templates to mitigate discrepancies in language and instruction format between training-time and inference-time templates.
Our experiments reveal consistent improvements through cross-lingual generalization in both English and Korean.
arXiv Detail & Related papers (2024-06-13T04:10:17Z)
- Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond [89.54151859266202]
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework.
The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages.
The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks.
arXiv Detail & Related papers (2023-10-09T08:30:01Z)
- KIT's Multilingual Speech Translation System for IWSLT 2023 [58.5152569458259]
We describe our speech translation system for the multilingual track of IWSLT 2023.
The task requires translation into 10 languages with varying amounts of resources.
Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation.
arXiv Detail & Related papers (2023-06-08T16:13:20Z)
- UCAS-IIE-NLP at SemEval-2023 Task 12: Enhancing Generalization of Multilingual BERT for Low-resource Sentiment Analysis [24.542445315345464]
This paper describes our system designed for SemEval-2023 Task 12: Sentiment analysis for African languages.
Specifically, we design a lexicon-based multilingual BERT to facilitate language adaptation and sentiment-aware representation learning.
Our system achieved competitive results, substantially outperforming baselines on both the multilingual and zero-shot sentiment classification subtasks.
arXiv Detail & Related papers (2023-06-01T19:10:09Z)
- Revisiting non-English Text Simplification: A Unified Multilingual Benchmark [14.891068432456262]
This paper introduces the MultiSim benchmark, a collection of 27 resources in 12 distinct languages containing over 1.7 million complex-simple sentence pairs.
Our experiments using MultiSim with pre-trained multilingual language models reveal exciting performance improvements from multilingual training in non-English settings.
arXiv Detail & Related papers (2023-05-25T03:03:29Z)
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- Multilingual Simplification of Medical Texts [49.469685530201716]
We introduce MultiCochrane, the first sentence-aligned multilingual text simplification dataset for the medical domain in four languages.
We evaluate fine-tuned and zero-shot models across these languages, with extensive human assessments and analyses.
Although models can now generate viable simplified texts, we identify outstanding challenges that this dataset might be used to address.
arXiv Detail & Related papers (2023-05-21T18:25:07Z)
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
- UniHD at TSAR-2022 Shared Task: Is Compute All We Need for Lexical Simplification? [2.931632009516441]
We describe a pipeline based on prompted GPT-3 responses that beats competing approaches by a wide margin in settings with few training instances; a hedged prompt sketch appears after this list.
Applied to the Spanish and Portuguese subsets, it achieves state-of-the-art results with only minor modifications to the original prompts.
arXiv Detail & Related papers (2023-01-04T18:59:20Z)
- MANTIS at TSAR-2022 Shared Task: Improved Unsupervised Lexical Simplification with Pretrained Encoders [31.64341800095214]
We present our contribution to the TSAR-2022 Shared Task on Lexical Simplification of the EMNLP 2022 Workshop on Text Simplification, Accessibility, and Readability.
Our approach builds on and extends LSBert, an unsupervised lexical simplification system based on pretrained encoders.
Our best-performing system improves on LSBert by 5.9% accuracy and takes second place out of 33 ranked solutions.
arXiv Detail & Related papers (2022-12-19T20:57:45Z)
- Lexical Simplification Benchmarks for English, Portuguese, and Spanish [23.90236014260585]
We present a new benchmark dataset for lexical simplification in English, Spanish, and (Brazilian) Portuguese.
This is the first dataset that offers a direct comparison of lexical simplification systems for three languages.
We find that a state-of-the-art neural lexical simplification system outperforms a state-of-the-art non-neural system in all three languages.
arXiv Detail & Related papers (2022-09-12T15:06:26Z)
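As referenced in the UniHD entry above, here is a minimal sketch of how a prompted-LLM simplification pipeline might phrase its query. The template wording, example sentence, and answer format are assumptions for illustration only; the team's actual prompts and few-shot setup are described in their paper.

```python
# Illustrative prompt construction in the spirit of UniHD's prompted-LLM
# pipeline. The template wording and example sentence are assumptions and
# do not reproduce the team's actual prompts.
def build_prompt(context: str, complex_word: str, n: int = 10) -> str:
    """Ask an instruction-following LLM for simpler substitutes."""
    return (
        f"Context: {context}\n"
        f"Question: Given the above context, list {n} alternatives for "
        f'"{complex_word}" that are easier to understand.\n'
        "Answer:"
    )

# Hypothetical usage; the completion (one candidate per line) would be
# parsed, deduplicated, and ranked into the final substitute list.
print(build_prompt("The cellist performed a strikingly intricate passage.",
                   "intricate"))
```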
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.