Overview of GUA-SPA at IberLEF 2023: Guarani-Spanish Code Switching
Analysis
- URL: http://arxiv.org/abs/2309.06163v1
- Date: Tue, 12 Sep 2023 12:18:18 GMT
- Title: Overview of GUA-SPA at IberLEF 2023: Guarani-Spanish Code Switching
Analysis
- Authors: Luis Chiruzzo, Marvin Agüero-Torales, Gustavo Giménez-Lugo, Aldo
Alvarez, Yliana Rodríguez, Santiago Góngora, Thamar Solorio
- Abstract summary: We present the first shared task for detecting and analyzing code-switching in Guarani and Spanish, GUA-SPA at IberLEF 2023.
The challenge consisted of three tasks: identifying the language of a token, NER, and a novel task of classifying the way a Spanish span is used in the code-switched context.
- Score: 5.262834474543783
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the first shared task for detecting and analyzing code-switching
in Guarani and Spanish, GUA-SPA at IberLEF 2023. The challenge consisted of
three tasks: identifying the language of a token, NER, and a novel task of
classifying the way a Spanish span is used in the code-switched context. We
annotated a corpus of 1500 texts extracted from news articles and tweets,
around 25 thousand tokens, with the information for the tasks. Three teams took
part in the evaluation phase, obtaining in general good results for Task 1, and
more mixed results for Tasks 2 and 3.
Related papers
- CardiffNLP at CLEARS-2025: Prompting Large Language Models for Plain Language and Easy-to-Read Text Rewriting [49.4237054647147]
This paper details the CardiffNLP team's contribution to the CLEARS shared task on Spanish text adaptation.
We detail our numerous prompt variations, examples, and experimental results.
arXiv Detail & Related papers (2025-08-05T09:16:19Z)
- MSA at SemEval-2025 Task 3: High Quality Weak Labeling and LLM Ensemble Verification for Multilingual Hallucination Detection [0.0]
This paper describes our submission for SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes.
The task involves detecting hallucinated spans in text generated by instruction-tuned Large Language Models (LLMs) across multiple languages.
Our system ranked 1st in Arabic and Basque, 2nd in German, Swedish, and Finnish, and 3rd in Czech, Farsi, and French.
arXiv Detail & Related papers (2025-05-27T08:26:17Z)
- SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection [76.18321723846616]
Task covers more than 30 languages from seven distinct language families.
Data instances are multi-labeled with six emotional classes, with additional datasets in 11 languages annotated for emotion intensity.
Participants were asked to predict labels in three tracks: (a) multilabel emotion detection, (b) emotion intensity score detection, and (c) cross-lingual emotion detection.
arXiv Detail & Related papers (2025-03-10T12:49:31Z)
- GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs. Human [71.42669028683741]
We present a shared task on binary machine generated text detection conducted as a part of the GenAI workshop at COLING 2025.
The task consists of two subtasks: Monolingual (English) and Multilingual.
We provide a comprehensive overview of the data, a summary of the results, detailed descriptions of the participating systems, and an in-depth analysis of submissions.
arXiv Detail & Related papers (2025-01-19T11:11:55Z)
- USTCCTSU at SemEval-2024 Task 1: Reducing Anisotropy for Cross-lingual Semantic Textual Relatedness Task [17.905282052666333]
Cross-lingual semantic textual relatedness task is an important research task that addresses challenges in cross-lingual communication and text understanding.
It helps establish semantic connections between different languages, crucial for downstream tasks like machine translation, multilingual information retrieval, and cross-lingual text understanding.
With our approach, we achieve 2nd place in Spanish, 3rd place in Indonesian, and multiple entries in the top ten results in the competition's track C.
arXiv Detail & Related papers (2024-11-28T08:40:14Z)
- SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection [68.858931667807]
Subtask A is a binary classification task determining whether a text is written by a human or generated by a machine.
Subtask B is to detect the exact source of a text, discerning whether it is written by a human or generated by a specific LLM.
Subtask C aims to identify the changing point within a text, at which the authorship transitions from human to machine.
arXiv Detail & Related papers (2024-04-22T13:56:07Z)
- Overview of AuTexTification at IberLEF 2023: Detection and Attribution
of Machine-Generated Text in Multiple Domains [6.44756483013808]
This paper presents the overview of the AuTexTification task as part of the IberLEF 2023 Workshop in Iberian Languages Evaluation Forum.
Our AuTexTification dataset contains more than 160,000 texts across two languages (English and Spanish) and five domains (tweets, reviews, news, legal, and how-to articles).
A total of 114 teams signed up to participate, of which 36 sent 175 runs, and 20 of them sent their working notes.
arXiv Detail & Related papers (2023-09-20T13:10:06Z)
- UPB at IberLEF-2023 AuTexTification: Detection of Machine-Generated Text
using Transformer Ensembles [0.5324802812881543]
This paper describes the solutions submitted by the UPB team to the AuTexTification shared task, featured as part of IberLEF-2023.
Our best-performing model achieved macro F1-scores of 66.63% on the English dataset and 67.10% on the Spanish dataset.
arXiv Detail & Related papers (2023-08-02T20:08:59Z)
- ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich
Document Images [198.35937007558078]
The competition opened on 30th December, 2022 and closed on 24th March, 2023.
There are 35 participants and 91 valid submissions received for Track 1, and 15 participants and 26 valid submissions received for Track 2.
According to the performance of the submissions, we believe there is still a large gap on the expected information extraction performance for complex and zero-shot scenarios.
arXiv Detail & Related papers (2023-06-05T22:20:52Z)
- Enhancing Translation for Indigenous Languages: Experiments with
Multilingual Models [57.10972566048735]
We present the system descriptions for three methods.
We used two multilingual models, namely M2M-100 and mBART50, and one bilingual (one-to-one) model, the Helsinki NLP Spanish-English translation model.
We experimented with 11 languages from America and report the setups we used as well as the results we achieved.
arXiv Detail & Related papers (2023-05-27T08:10:40Z)
- Advancing Multilingual Pre-training: TRIP Triangular Document-level
Pre-training for Multilingual Language Models [107.83158521848372]
We present Triangular Document-level Pre-training (TRIP), which is the first in the field to accelerate the conventional monolingual and bilingual objectives into a trilingual objective with a novel method called Grafting.
TRIP achieves several strong state-of-the-art (SOTA) scores on three multilingual document-level machine translation benchmarks and one cross-lingual abstractive summarization benchmark, including consistent improvements by up to 3.11 d-BLEU points and 8.9 ROUGE-L points.
arXiv Detail & Related papers (2022-12-15T12:14:25Z)
- LSCDiscovery: A shared task on semantic change discovery and detection
in Spanish [12.85253662018234]
We present the first shared task on semantic change discovery and detection in Spanish.
We create the first dataset of Spanish words manually annotated for semantic change using the DURel framework.
We describe the systems developed by the competing teams, highlighting the techniques that were particularly useful and discuss the limits of these approaches.
arXiv Detail & Related papers (2022-05-13T14:52:18Z)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual
Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
arXiv Detail & Related papers (2022-04-16T16:08:38Z)
- Overview of ADoBo 2021: Automatic Detection of Unassimilated Borrowings
in the Spanish Press [8.950918531231158]
This paper summarizes the main findings of the ADoBo 2021 shared task, proposed in the context of IberLEF 2021.
In this task, we invited participants to detect lexical borrowings (coming mostly from English) in Spanish newswire texts.
We provided participants with an annotated corpus of lexical borrowings which we split into training, development and test splits.
arXiv Detail & Related papers (2021-10-29T11:07:59Z)
- Handshakes AI Research at CASE 2021 Task 1: Exploring different
approaches for multilingual tasks [0.22940141855172036]
The aim of the CASE 2021 Shared Task 1 was to detect and classify socio-political and crisis event information in a multilingual setting.
Our submission contained entries in all of the subtasks, and the scores obtained validated our research finding.
arXiv Detail & Related papers (2021-10-29T07:58:49Z)
- ESPnet-ST IWSLT 2021 Offline Speech Translation System [56.83606198051871]
This paper describes the ESPnet-ST group's IWSLT 2021 submission in the offline speech translation track.
This year we made various efforts on training data, architecture, and audio segmentation.
Our best E2E system combined all the techniques with model ensembling and achieved 31.4 BLEU.
arXiv Detail & Related papers (2021-07-01T17:49:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.