BRUMS at SemEval-2020 Task 12 : Transformer based Multilingual Offensive
Language Identification in Social Media
- URL: http://arxiv.org/abs/2010.06278v1
- Date: Tue, 13 Oct 2020 10:39:14 GMT
- Title: BRUMS at SemEval-2020 Task 12 : Transformer based Multilingual Offensive
Language Identification in Social Media
- Authors: Tharindu Ranasinghe, Hansi Hettiarachchi
- Abstract summary: We present a multilingual deep learning model to identify offensive language in social media.
The approach achieves acceptable evaluation scores, while maintaining flexibility between languages.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we describe the BRUMS team's entry to OffensEval 2:
Multilingual Offensive Language Identification in Social Media in SemEval-2020.
The OffensEval organizers provided participants with annotated datasets
containing posts from social media in Arabic, Danish, English, Greek and
Turkish. We present a multilingual deep learning model to identify offensive
language in social media. Overall, the approach achieves acceptable evaluation
scores, while maintaining flexibility between languages.
Related papers
- Wav2Gloss: Generating Interlinear Glossed Text from Speech [78.64412090339044]
We propose Wav2Gloss, a task in which four linguistic annotation components are extracted automatically from speech.
We provide various baselines to lay the groundwork for future research on Interlinear Glossed Text generation from speech.
arXiv Detail & Related papers (2024-03-19T21:45:29Z)
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z)
- Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts [55.41644538483948]
We propose the use of a Transformer-based model for word-level language identification in code-mixed Kannada-English texts.
The proposed model on the CoLI-Kenglish dataset achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
arXiv Detail & Related papers (2022-11-26T02:39:19Z)
- Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
arXiv Detail & Related papers (2021-09-28T04:43:11Z)
- WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language Identification in Code-switched YouTube Comments [16.938836887702923]
This paper describes the WLV-RIT entry to the Hate Speech and Offensive Content Identification in Indo-European languages task 2020.
The HASOC 2020 organizers provided participants with datasets containing code-mixed social media posts in Dravidian languages (Malayalam-English and Tamil-English).
Our system achieved a weighted average F1 score of 0.89 on the test set and ranked 5th out of 12 participants.
arXiv Detail & Related papers (2020-11-01T16:52:08Z)
- UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection on Social Media by Fine-tuning a Variety of BERT-based Models [0.0]
This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages.
These solutions were employed in Subtask A of the OffensEval 2020 shared task.
arXiv Detail & Related papers (2020-10-26T14:28:29Z)
- Looking for Clues of Language in Multilingual BERT to Improve Cross-lingual Generalization [56.87201892585477]
Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information.
We control the output languages of multilingual BERT by manipulating the token embeddings.
arXiv Detail & Related papers (2020-10-20T05:41:35Z)
- WOLI at SemEval-2020 Task 12: Arabic Offensive Language Identification on Different Twitter Datasets [0.0]
A key to fighting offensive language on social media is the existence of an automatic offensive language detection system.
In this paper, we describe the system submitted by WideBot AI Lab for the shared task which ranked 10th out of 52 participants with Macro-F1 86.9%.
We also introduced a neural network approach that enhanced the predictive ability of our system that includes CNN, highway network, Bi-LSTM, and attention layers.
arXiv Detail & Related papers (2020-09-11T14:10:03Z)
- NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection with Cross-lingual Transfer [10.007363787391952]
This paper describes our approach to the task of identifying offensive languages in a multilingual setting.
We investigate two data augmentation strategies: using additional semi-supervised labels with different thresholds and cross-lingual transfer with data selection.
Our multilingual systems achieved competitive results in Greek, Danish, and Turkish at OffensEval 2020.
arXiv Detail & Related papers (2020-08-04T06:20:50Z)
- LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for Multilingual Offensive Language Identification [19.23116755449024]
We adapt and fine-tune the BERT and Multilingual BERT models made available by Google AI for English and non-English languages, respectively.
For the English language, we use a combination of two fine-tuned BERT models.
For other languages we propose a cross-lingual augmentation approach in order to enrich training data and we use Multilingual BERT to obtain sentence representations.
arXiv Detail & Related papers (2020-05-07T18:45:48Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)