LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for
Multilingual Offensive Language Identification
- URL: http://arxiv.org/abs/2005.03695v2
- Date: Fri, 17 Jul 2020 11:55:25 GMT
- Title: LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for
Multilingual Offensive Language Identification
- Authors: Erfan Ghadery, Marie-Francine Moens
- Abstract summary: We adapt and fine-tune the BERT and Multilingual BERT models made available by Google AI for English and non-English languages, respectively.
For the English language, we use a combination of two fine-tuned BERT models.
For the other languages, we propose a cross-lingual augmentation approach to enrich the training data, and we use Multilingual BERT to obtain sentence representations.
- Score: 19.23116755449024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents our system, entitled 'LIIR', for SemEval-2020 Task 12 on
Multilingual Offensive Language Identification in Social Media (OffensEval 2).
We have participated in sub-task A for English, Danish, Greek, Arabic, and
Turkish languages. We adapt and fine-tune the BERT and Multilingual BERT models
made available by Google AI for English and non-English languages, respectively.
For the English language, we use a combination of two fine-tuned BERT models.
For the other languages, we propose a cross-lingual augmentation approach to
enrich the training data, and we use Multilingual BERT to obtain sentence
representations. LIIR achieved rank 14/38, 18/47, 24/86, 24/54, and 25/40 in
Greek, Turkish, English, Arabic, and Danish languages, respectively.
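The abstract describes two components at a high level: an ensemble of two fine-tuned BERT models for English, and a cross-lingual augmentation step that enriches the non-English training sets before they are encoded with Multilingual BERT. The paper's exact pipeline is not reproduced here; the snippet below is a minimal sketch of the general idea, assuming the HuggingFace transformers library, a binary offensive/not-offensive label set, and a hypothetical translate() helper that stands in for whatever machine-translation system moves examples between languages.

```python
# Minimal sketch of the two ideas in the abstract: cross-lingual augmentation of
# the training data, and fine-tuning Multilingual BERT for binary offensive
# language classification. Assumptions (not from the paper): the HuggingFace
# `transformers` library, a hypothetical `translate()` helper, and the listed
# hyperparameters.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertForSequenceClassification, BertTokenizer


def translate(text, src_lang, tgt_lang):
    """Hypothetical machine-translation helper; plug in any MT system here."""
    raise NotImplementedError


def augment_cross_lingual(target_data, other_language_data, tgt_lang):
    """Enrich the target-language training set with examples translated from
    the other languages' training sets (assumed augmentation direction)."""
    augmented = list(target_data)
    for src_lang, examples in other_language_data.items():
        for text, label in examples:
            augmented.append((translate(text, src_lang, tgt_lang), label))
    return augmented


def fine_tune_mbert(train_data, epochs=3, lr=2e-5, batch_size=32):
    """Fine-tune Multilingual BERT as an offensive/not-offensive classifier."""
    tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=2)
    texts, labels = zip(*train_data)
    enc = tokenizer(list(texts), padding=True, truncation=True,
                    max_length=128, return_tensors="pt")
    dataset = TensorDataset(enc["input_ids"], enc["attention_mask"],
                            torch.tensor(labels))
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for input_ids, attention_mask, y in loader:
            optimizer.zero_grad()
            out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
            out.loss.backward()
            optimizer.step()
    return tokenizer, model
```

The English-language ensemble mentioned in the abstract would sit on top of this: two such fine-tuned models whose predicted probabilities are combined (for example, averaged) at inference time; the exact combination rule is not specified in the abstract.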
Related papers
- ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text
Translation [79.66359274050885]
We present ComSL, a speech-language model built atop a composite architecture of public pretrained speech-only and language-only models.
Our approach has demonstrated effectiveness in end-to-end speech-to-text translation tasks.
arXiv Detail & Related papers (2023-05-24T07:42:15Z) - Exploring Teacher-Student Learning Approach for Multi-lingual
Speech-to-Intent Classification [73.5497360800395]
We develop an end-to-end system that supports multiple languages.
We exploit knowledge from a pre-trained multi-lingual natural language processing model.
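The teacher-student approach is only summarized above; a common way to exploit a pre-trained multilingual NLP model is to distill its intent predictions into a speech-based student. The sketch below shows that generic distillation step, not the paper's actual recipe; the teacher and student modules, temperature, and loss weights are assumptions.

```python
# Generic teacher-student (knowledge distillation) training step in PyTorch.
# The teacher is assumed to be a pre-trained multilingual text model that
# produces intent logits from encoded transcripts; the student consumes speech
# features. All module choices, the temperature, and the loss weights are
# illustrative assumptions.
import torch
import torch.nn.functional as F


def distillation_step(student, teacher, speech_feats, transcripts_enc, labels,
                      optimizer, temperature=2.0, alpha=0.5):
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(**transcripts_enc).logits  # soft targets from text
    student_logits = student(speech_feats)                  # predictions from speech

    # Soft-label loss: match the teacher's temperature-smoothed distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean") * (temperature ** 2)

    # Hard-label loss on the gold intent labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```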
arXiv Detail & Related papers (2021-09-28T04:43:11Z) - Multilingual and code-switching ASR challenges for low resource Indian
languages [59.2906853285309]
We focus on building multilingual and code-switching ASR systems through two different subtasks related to a total of seven Indian languages.
We provide a total of 600 hours of transcribed speech data, comprising train and test sets, in these languages.
We also provide a baseline recipe for both the tasks with a WER of 30.73% and 32.45% on the test sets of multilingual and code-switching subtasks, respectively.
arXiv Detail & Related papers (2021-04-01T03:37:01Z) - UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection
on Social Media by Fine-tuning a Variety of BERT-based Models [0.0]
This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages.
These solutions were employed in Subtask A of the OffensEval 2020 shared task.
arXiv Detail & Related papers (2020-10-26T14:28:29Z) - Looking for Clues of Language in Multilingual BERT to Improve
Cross-lingual Generalization [56.87201892585477]
Token embeddings in multilingual BERT (m-BERT) contain both language and semantic information.
We control the output languages of multilingual BERT by manipulating the token embeddings.
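The summary mentions controlling the output language of m-BERT by manipulating token embeddings. One generic way such a manipulation can be realized is to shift representations by the difference between per-language mean embedding vectors; the sketch below illustrates only that idea, and the layer at which the shift is applied and the scaling factor beta are assumptions rather than details taken from the paper.

```python
# Sketch: steer m-BERT token representations toward a target language by adding
# the difference between the target-language and source-language mean embedding
# vectors. Which layer to modify and the scaling factor `beta` are assumptions.
import torch


def language_mean(embeddings_list):
    """Mean token embedding over a corpus of one language.
    `embeddings_list` is a list of (seq_len, hidden) tensors."""
    return torch.cat(embeddings_list, dim=0).mean(dim=0)


def shift_to_language(token_embeddings, mean_src, mean_tgt, beta=1.0):
    """Move token embeddings from the source-language region of the space
    toward the target-language region."""
    return token_embeddings + beta * (mean_tgt - mean_src)
```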
arXiv Detail & Related papers (2020-10-20T05:41:35Z) - BRUMS at SemEval-2020 Task 12 : Transformer based Multilingual Offensive
Language Identification in Social Media [9.710464466895521]
We present a multilingual deep learning model to identify offensive language in social media.
The approach achieves acceptable evaluation scores, while maintaining flexibility between languages.
arXiv Detail & Related papers (2020-10-13T10:39:14Z) - ANDES at SemEval-2020 Task 12: A jointly-trained BERT multilingual model
for offensive language detection [0.6445605125467572]
We jointly trained a single model by fine-tuning Multilingual BERT to tackle the task across all the proposed languages.
Our single model had competitive results, with a performance close to top-performing systems.
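Joint training of the kind described above typically amounts to pooling the per-language training sets and fine-tuning a single Multilingual BERT classifier on the mixture. A minimal sketch, reusing the hypothetical fine_tune_mbert() helper from the earlier snippet on this page:

```python
# Pool all languages into one training set and fine-tune one shared model.
# `datasets_by_language` maps a language code to a list of (text, label) pairs;
# `fine_tune_mbert` is the hypothetical helper sketched earlier on this page.
def train_joint_model(datasets_by_language):
    pooled = [pair for examples in datasets_by_language.values() for pair in examples]
    return fine_tune_mbert(pooled)
```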
arXiv Detail & Related papers (2020-08-13T16:07:00Z) - LT@Helsinki at SemEval-2020 Task 12: Multilingual or language-specific
BERT? [0.42056926734482064]
This paper presents the different models submitted by the LT@Helsinki team for the SemEval 2020 Shared Task 12.
Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively.
In both cases we used the so-called Bidirectional Encoder Representations from Transformers (BERT), a model pre-trained by Google and fine-tuned by us on the OLID and SOLID datasets.
arXiv Detail & Related papers (2020-08-03T12:03:17Z) - SemEval-2020 Task 12: Multilingual Offensive Language Identification in
Social Media (OffensEval 2020) [33.66689662526814]
We present the results and main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
OffensEval 2020 was one of the most popular tasks at SemEval-2020, attracting a large number of participants across all subtasks and languages.
A total of 528 teams signed up to participate in the task, 145 teams submitted systems during the evaluation period, and 70 submitted system description papers.
arXiv Detail & Related papers (2020-06-12T14:39:40Z) - Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for
Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
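Multi-task learning with BERT, as mentioned above, usually shares a single encoder across the OffensEval sub-tasks and attaches one classification head per sub-task. The sketch below shows only that generic structure; the head dimensions and the equal loss weighting are illustrative assumptions, not the authors' configuration.

```python
# Generic multi-task BERT sketch: one shared encoder, one head per sub-task.
# Head dimensions and the equal loss weighting are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertModel


class MultiTaskOffense(nn.Module):
    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.head_a = nn.Linear(hidden, 2)  # sub-task A: offensive vs. not
        self.head_b = nn.Linear(hidden, 2)  # sub-task B: targeted vs. untargeted
        self.head_c = nn.Linear(hidden, 3)  # sub-task C: individual/group/other

    def forward(self, input_ids, attention_mask):
        pooled = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).pooler_output
        return self.head_a(pooled), self.head_b(pooled), self.head_c(pooled)


def multi_task_loss(logits, labels):
    """Sum of per-task cross-entropy losses (equal weights assumed)."""
    ce = nn.functional.cross_entropy
    return sum(ce(l, y) for l, y in zip(logits, labels))
```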
arXiv Detail & Related papers (2020-04-28T11:27:24Z) - A Study of Cross-Lingual Ability and Language-specific Information in
Multilingual BERT [60.9051207862378]
Multilingual BERT works remarkably well on cross-lingual transfer tasks.
Data size and context window size are crucial factors for transferability.
There is a computationally cheap but effective approach to improve the cross-lingual ability of multilingual BERT.
arXiv Detail & Related papers (2020-04-20T11:13:16Z)