Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive
Language Identification using Pre-trained Language Models
- URL: http://arxiv.org/abs/2010.03542v1
- Date: Wed, 7 Oct 2020 17:40:19 GMT
- Authors: Shuohuan Wang, Jiaxiang Liu, Xuan Ouyang, Yu Sun
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes Galileo's performance in SemEval-2020 Task 12 on
detecting and categorizing offensive language in social media. For Offensive
Language Identification, we proposed a multi-lingual method using Pre-trained
Language Models, ERNIE and XLM-R. For offensive language categorization, we
proposed a knowledge distillation method trained on soft labels generated by
several supervised models. Our team participated in all three sub-tasks. In
Sub-task A - Offensive Language Identification, we ranked first in terms of
average F1 scores in all languages. We are also the only team that ranked
among the top three across all languages. We also took first place in
Sub-task B - Automatic Categorization of Offense Types and Sub-task C -
Offense Target Identification.
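The abstract's categorization method trains a student on soft labels, i.e. averaged class distributions produced by several supervised teacher models. The paper does not include code, so the following is only a minimal sketch of that idea; the function names, the two-teacher setup, and the temperature value are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_labels(teacher_logits_list, temperature=2.0):
    """Average the teachers' softened class distributions into soft labels."""
    probs = [softmax(logits, temperature) for logits in teacher_logits_list]
    return np.mean(probs, axis=0)

def distillation_loss(student_logits, soft_targets, temperature=2.0):
    """Cross-entropy of the student's softened prediction
    against the teachers' soft labels."""
    p = softmax(student_logits, temperature)
    return -np.sum(soft_targets * np.log(p + 1e-12), axis=-1).mean()

# Hypothetical example: two teachers scoring one example over 3 classes.
teachers = [np.array([[2.0, 0.5, -1.0]]), np.array([[1.5, 1.0, -0.5]])]
targets = soft_labels(teachers)
loss = distillation_loss(np.array([[1.0, 0.0, 0.0]]), targets)
```

A temperature above 1 softens the teachers' distributions so the student also learns from the relative probabilities of the non-argmax classes, which is the usual motivation for distilling from soft rather than hard labels.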
Related papers
- Wav2Gloss: Generating Interlinear Glossed Text from Speech [78.64412090339044]
We propose Wav2Gloss, a task in which four linguistic annotation components are extracted automatically from speech.
We provide various baselines to lay the groundwork for future research on Interlinear Glossed Text generation from speech.
(arXiv 2024-03-19)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
(arXiv 2023-06-13)
- Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages [0.0]
The task covers monolingual sentiment classification (sub-task A) for 12 African languages, multilingual sentiment classification (sub-task B), and zero-shot sentiment classification (sub-task C).
Our findings suggest that using pre-trained Afro-centric language models improves performance for low-resource African languages.
We also ran experiments using adapters for zero-shot tasks, and the results suggest that we can obtain promising results by using adapters with a limited amount of resources.
(arXiv 2023-04-13)
- SheffieldVeraAI at SemEval-2023 Task 3: Mono and multilingual approaches for news genre, topic and persuasion technique classification [3.503844033591702]
This paper describes our approach for SemEval-2023 Task 3: Detecting the category, the framing, and the persuasion techniques in online news in a multi-lingual setup.
(arXiv 2023-03-16)
- Bridging Cross-Lingual Gaps During Leveraging the Multilingual Sequence-to-Sequence Pretraining for Text Generation [80.16548523140025]
We extend the vanilla pretrain-finetune pipeline with extra code-switching restore task to bridge the gap between the pretrain and finetune stages.
Our approach could narrow the cross-lingual sentence representation distance and improve low-frequency word translation with trivial computational cost.
(arXiv 2022-04-16)
- Intent Classification Using Pre-Trained Embeddings For Low Resource Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language specific Automatic Speech Recognition is an important yet less explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages: English, Sinhala, and Tamil, each with a different data size to simulate high-, medium-, and low-resource scenarios.
(arXiv 2021-10-18)
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
(arXiv 2021-09-28)
- Garain at SemEval-2020 Task 12: Sequence based Deep Learning for Categorizing Offensive Language in Social Media [3.236217153362305]
SemEval-2020 Task 12 was OffenseEval: Multilingual Offensive Language Identification in Social Media.
Trained on 25% of the whole dataset, my system achieved a macro-averaged F1 score of 47.763%.
(arXiv 2020-09-02)
- ANDES at SemEval-2020 Task 12: A jointly-trained BERT multilingual model for offensive language detection [0.6445605125467572]
We jointly trained a single model by fine-tuning Multilingual BERT to tackle the task across all the proposed languages.
Our single model had competitive results, with a performance close to top-performing systems.
(arXiv 2020-08-13)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
(arXiv 2020-04-28)
This list is automatically generated from the titles and abstracts of the papers in this site.