KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech
Identification in Social Media
- URL: http://arxiv.org/abs/2007.13184v1
- Date: Sun, 26 Jul 2020 17:26:20 GMT
- Title: KUISAIL at SemEval-2020 Task 12: BERT-CNN for Offensive Speech
Identification in Social Media
- Authors: Ali Safaya, Moutasem Abdullatif, Deniz Yuret
- Abstract summary: We show that combining CNN with BERT is better than using BERT on its own.
We present ArabicBERT, a set of pre-trained transformer language models for Arabic.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we describe our approach to utilize pre-trained BERT models
with Convolutional Neural Networks for sub-task A of the Multilingual Offensive
Language Identification shared task (OffensEval 2020), which is a part of the
SemEval 2020. We show that combining CNN with BERT is better than using BERT on
its own, and we emphasize the importance of utilizing pre-trained language
models for downstream tasks. Our system ranked 4th with a macro-averaged
F1-score of 0.897 in Arabic, 4th with a score of 0.843 in Greek, and 3rd with a
score of 0.814 in Turkish. Additionally, we present ArabicBERT, a set of
pre-trained transformer language models for Arabic that we share with the
community.
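
As a rough illustration of the BERT-CNN combination described above, the sketch below feeds BERT's final hidden states into parallel 1-D convolutions followed by global max pooling and a linear classifier. The checkpoint name, kernel sizes, and filter count are illustrative assumptions, not the authors' exact configuration; the same wrapper could be pointed at an ArabicBERT checkpoint for the Arabic sub-task.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertCNN(nn.Module):
    """Minimal sketch of a BERT-CNN offensive-language classifier.
    Assumptions (not from the paper): the multilingual checkpoint, kernel sizes,
    filter count, and the use of final hidden states with global max pooling."""
    def __init__(self, model_name="bert-base-multilingual-cased",
                 num_filters=128, kernel_sizes=(2, 3, 4), num_labels=2):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # one 1-D convolution per kernel size, applied over the token dimension
        self.convs = nn.ModuleList(
            [nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes])
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) -> (batch, hidden, seq_len) for Conv1d
        hidden_states = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        x = hidden_states.transpose(1, 2)
        # convolve, apply ReLU, then global max-pool each feature map
        pooled = [torch.relu(conv(x)).max(dim=-1).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=-1))

# usage: tokenize a tweet and score it
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertCNN()
batch = tokenizer(["example tweet"], return_tensors="pt", padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```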
Related papers
- E-TSL: A Continuous Educational Turkish Sign Language Dataset with Baseline Methods [2.0257616108612373]
This study introduces the continuous Educational Turkish Sign Language dataset, collected from online Turkish language lessons for 5th, 6th, and 8th grades.
The dataset comprises 1,410 videos totaling nearly 24 hours and includes performances from 11 signers.
Turkish, an agglutinative language, poses unique challenges for sign language translation, particularly given a vocabulary in which 64% of words are singletons and 85% are rare words appearing fewer than five times.
arXiv Detail & Related papers (2024-05-05T16:07:23Z) - From English to More Languages: Parameter-Efficient Model Reprogramming
for Cross-Lingual Speech Recognition [50.93943755401025]
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z) - BJTU-WeChat's Systems for the WMT22 Chat Translation Task [66.81525961469494]
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Based on the Transformer, we apply several effective variants.
Our systems achieve 0.810 and 0.946 COMET scores.
arXiv Detail & Related papers (2022-11-28T02:35:04Z) - Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation
System for the WMT22 Translation Task [49.916963624249355]
This paper describes Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) Low-Resource Translation systems for the WMT22 shared task.
We participate in the general translation task on English$\Leftrightarrow$Livonian.
Our system is based on M2M100 with novel techniques that adapt it to the target language pair.
arXiv Detail & Related papers (2022-10-17T04:34:09Z) - UPB at SemEval-2020 Task 12: Multilingual Offensive Language Detection
on Social Media by Fine-tuning a Variety of BERT-based Models [0.0]
This paper describes our Transformer-based solutions for identifying offensive language on Twitter in five languages.
The systems were used in Sub-task A of the OffensEval 2020 shared task.
arXiv Detail & Related papers (2020-10-26T14:28:29Z) - PUM at SemEval-2020 Task 12: Aggregation of Transformer-based models'
features for offensive language recognition [0.0]
Our team ranked 7th out of 40 in Sub-task C (offense target identification) with a macro F1-score of 64.727%, and 64th out of 85 in Sub-task A (offensive language identification) with an F1-score of 89.726%.
arXiv Detail & Related papers (2020-10-05T10:25:29Z) - ANDES at SemEval-2020 Task 12: A jointly-trained BERT multilingual model
for offensive language detection [0.6445605125467572]
We jointly trained a single model by fine-tuning Multilingual BERT to tackle the task across all the proposed languages.
Our single model had competitive results, with a performance close to top-performing systems.
arXiv Detail & Related papers (2020-08-13T16:07:00Z) - ConvBERT: Improving BERT with Span-based Dynamic Convolution [144.25748617961082]
BERT relies heavily on the global self-attention block and thus suffers from a large memory footprint and computation cost.
We propose a novel span-based dynamic convolution to replace some of these self-attention heads and directly model local dependencies.
The new convolution heads, together with the remaining self-attention heads, form a mixed attention block that is more efficient at both global and local context learning (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-08-06T07:43:19Z) - DeBERTa: Decoding-enhanced BERT with Disentangled Attention [119.77305080520718]
We propose a new model architecture DeBERTa that improves the BERT and RoBERTa models using two novel techniques.
We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural language generation (NLG) downstream tasks.
arXiv Detail & Related papers (2020-06-05T19:54:34Z) - LIIR at SemEval-2020 Task 12: A Cross-Lingual Augmentation Approach for
Multilingual Offensive Language Identification [19.23116755449024]
We adapt and fine-tune the BERT and Multilingual BERT models made available by Google AI for English and non-English languages respectively.
For the English language, we use a combination of two fine-tuned BERT models.
For other languages we propose a cross-lingual augmentation approach in order to enrich training data and we use Multilingual BERT to obtain sentence representations.
arXiv Detail & Related papers (2020-05-07T18:45:48Z) - Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for
Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
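
To make the span-based dynamic convolution mentioned in the ConvBERT entry above more concrete, here is a rough, self-contained sketch of a single convolution head. It is a simplification under stated assumptions (the class name, kernel size, and the omission of grouped projections and the companion self-attention heads are illustrative), not ConvBERT's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanBasedDynamicConv(nn.Module):
    """Rough sketch of one span-based dynamic convolution head (ConvBERT-style).
    A per-position kernel is generated from a local span of the input and then
    used to mix the values in that position's local window."""
    def __init__(self, dim: int, kernel_size: int = 9):
        super().__init__()
        self.kernel_size = kernel_size
        # depthwise conv gathers a local span around each token (the "span-aware query")
        self.span_query = nn.Conv1d(dim, dim, kernel_size,
                                    padding=kernel_size // 2, groups=dim)
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        # generates one kernel (kernel_size taps) per position from span-query * key
        self.kernel_gen = nn.Linear(dim, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, seq, dim)
        sq = self.span_query(x.transpose(1, 2)).transpose(1, 2)      # (B, T, D)
        k, v = self.key(x), self.value(x)                            # (B, T, D)
        kernels = F.softmax(self.kernel_gen(sq * k), dim=-1)         # (B, T, K)
        pad = self.kernel_size // 2
        v_windows = F.pad(v.transpose(1, 2), (pad, pad)) \
                      .unfold(-1, self.kernel_size, 1)               # (B, D, T, K)
        # each position mixes its local window of values with its own generated kernel
        return torch.einsum('bdtk,btk->btd', v_windows, kernels)     # (B, T, D)

# usage: apply to a batch of token representations
layer = SpanBasedDynamicConv(dim=768)
out = layer(torch.randn(2, 16, 768))   # -> torch.Size([2, 16, 768])
```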