KInITVeraAI at SemEval-2023 Task 3: Simple yet Powerful Multilingual
Fine-Tuning for Persuasion Techniques Detection
- URL: http://arxiv.org/abs/2304.11924v1
- Date: Mon, 24 Apr 2023 09:06:43 GMT
- Title: KInITVeraAI at SemEval-2023 Task 3: Simple yet Powerful Multilingual
Fine-Tuning for Persuasion Techniques Detection
- Authors: Timo Hromadka, Timotej Smolen, Tomas Remis, Branislav Pecher, Ivan
Srba
- Abstract summary: This paper presents the best-performing solution to Subtask 3 of SemEval-2023 Task 3, dedicated to persuasion techniques detection.
Due to the highly multilingual character of the input data and the large number of 23 predicted labels, we opted for fine-tuning pre-trained transformer-based language models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents the best-performing solution to Subtask 3 of SemEval-2023 Task 3, dedicated to persuasion techniques detection. Due to the highly multilingual character of the input data and the large number of 23 predicted labels (causing a lack of labelled data for some language-label combinations), we opted for fine-tuning pre-trained transformer-based language models. Conducting multiple experiments, we find the best configuration, which consists of a large multilingual model (XLM-RoBERTa large) trained jointly on all input data, with carefully calibrated confidence thresholds for seen and surprise languages separately. Our final system performed best on 6 out of 9 languages (including two surprise languages) and achieved highly competitive results on the remaining three languages.
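For illustration, the following is a minimal sketch of the described setup, assuming the Hugging Face Transformers and PyTorch libraries: XLM-RoBERTa large configured as a multi-label classifier over the 23 persuasion-technique labels, with separate confidence thresholds for seen and surprise languages. The threshold values and the helper function are hypothetical, and the joint fine-tuning loop over all input languages is omitted.

```python
# Sketch only: model setup and thresholded multi-label prediction.
# Threshold values and helper names are illustrative assumptions,
# not details taken from the paper.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_LABELS = 23  # persuasion-technique labels in Subtask 3

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-large",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # sigmoid outputs + BCE loss
)
model.eval()

# Hypothetical per-group thresholds; the paper calibrates such thresholds
# separately for languages seen in training and for surprise languages.
THRESHOLDS = {"seen": 0.35, "surprise": 0.25}


def predict_techniques(texts, language_group="seen"):
    """Return, for each text, the label indices whose probability clears the threshold."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (batch, NUM_LABELS)
    probs = torch.sigmoid(logits)       # independent probability per label
    threshold = THRESHOLDS[language_group]
    return [(row >= threshold).nonzero(as_tuple=True)[0].tolist() for row in probs]


# Example call (the classification head is untrained here, so outputs are arbitrary).
print(predict_techniques(["An example news sentence.", "Another one."], "surprise"))
```

In the actual system, the thresholds would be tuned on development data for each language group rather than fixed by hand.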
Related papers
- KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection [0.0]
SemEval-2024 Task 8 is focused on multigenerator, multidomain, and multilingual black-box machine-generated text detection.
Our submitted method achieved competitive results, ranking in fourth place, just under 1 percentage point behind the winner.
arXiv Detail & Related papers (2024-02-21T10:09:56Z)
- Towards a Deep Understanding of Multilingual End-to-End Speech Translation [52.26739715012842]
We analyze representations learnt in a multilingual end-to-end speech translation model trained over 22 languages.
We derive three major findings from our analysis.
arXiv Detail & Related papers (2023-10-31T13:50:55Z)
- Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [59.056367787688146]
This paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
arXiv Detail & Related papers (2023-10-31T08:09:20Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- MarsEclipse at SemEval-2023 Task 3: Multi-Lingual and Multi-Label Framing Detection with Contrastive Learning [21.616089539381996]
This paper describes our system for SemEval-2023 Task 3 Subtask 2 on Framing Detection.
We used a multi-label contrastive loss for fine-tuning large pre-trained language models in a multi-lingual setting (a sketch of such a loss appears after this list).
Our system was ranked first on the official test set and on the official shared task leaderboard for five of the six languages.
arXiv Detail & Related papers (2023-04-20T18:42:23Z)
- Team QUST at SemEval-2023 Task 3: A Comprehensive Study of Monolingual and Multilingual Approaches for Detecting Online News Genre, Framing and Persuasion Techniques [0.030458514384586396]
This paper describes the participation of team QUST in SemEval-2023 Task 3.
The monolingual models are first evaluated with under-sampling of the majority classes.
The pre-trained multilingual model is fine-tuned with a combination of the class weights and the sample weights.
arXiv Detail & Related papers (2023-04-09T08:14:01Z)
- Hitachi at SemEval-2023 Task 3: Exploring Cross-lingual Multi-task Strategies for Genre and Framing Detection in Online News [10.435874177179764]
This paper describes the participation of team Hitachi in SemEval-2023 Task 3, "Detecting the genre, the framing, and the persuasion techniques in online news in a multi-lingual setup".
We investigated different cross-lingual and multi-task strategies for training the pretrained language models.
We constructed ensemble models from the results and achieved the highest macro-averaged F1 scores in Italian and Russian genre categorization subtasks.
arXiv Detail & Related papers (2023-03-03T09:12:55Z)
- Enhancing Model Performance in Multilingual Information Retrieval with Comprehensive Data Engineering Techniques [10.57012904999091]
We fine-tune pre-trained multilingual transformer-based models with the MIRACL dataset.
Our model improvement is mainly achieved through diverse data engineering techniques.
We secure 2nd place in the Surprise-Languages track with a score of 0.835 and 3rd place in the Known-Languages track with an average nDCG@10 score of 0.716 across the 16 known languages on the final leaderboard.
arXiv Detail & Related papers (2023-02-14T12:37:32Z)
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [61.88012735215636]
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs.
However, UNMT can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time.
In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
arXiv Detail & Related papers (2020-04-21T17:26:16Z)
- Balancing Training for Multilingual Neural Machine Translation [130.54253367251738]
Multilingual machine translation (MT) models can translate to/from multiple languages.
Standard practice is to up-sample less resourced languages to increase representation.
We propose a method that instead automatically learns how to weight training data through a data scorer.
arXiv Detail & Related papers (2020-04-14T18:23:28Z)
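The MarsEclipse entry above mentions a multi-label contrastive loss. The sketch below, assuming PyTorch, shows one generic way to formulate such a loss by weighting pairs of examples by the Jaccard overlap of their label sets; it illustrates the general idea, not the exact objective used by that system.

```python
# Generic multi-label contrastive loss sketch (not the MarsEclipse formulation).
import torch
import torch.nn.functional as F


def multilabel_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pull together examples whose label sets overlap, push apart the rest.

    embeddings: (batch, dim) sentence representations from the encoder
    labels:     (batch, num_labels) multi-hot label matrix
    """
    labels = labels.float()
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature  # pairwise cosine similarities
    # Positive weight for each pair = Jaccard overlap of the two label sets.
    inter = labels @ labels.T
    union = labels.sum(1, keepdim=True) + labels.sum(1) - inter
    pos = (inter / union.clamp(min=1)).fill_diagonal_(0)
    # Log-softmax over all other examples in the batch (self-pairs excluded).
    diag = torch.eye(len(z), dtype=torch.bool, device=z.device)
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(diag, float("-inf")), dim=1, keepdim=True
    )
    # Average the positives' log-probabilities, weighted by label overlap.
    loss = -(pos * log_prob).sum(1) / pos.sum(1).clamp(min=1e-6)
    return loss.mean()


# Example with random data: a batch of 8 embeddings and 23 possible labels.
emb = torch.randn(8, 768)
lab = (torch.rand(8, 23) > 0.8).long()
print(multilabel_contrastive_loss(emb, lab))
```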
This list is automatically generated from the titles and abstracts of the papers on this site.