PUM at SemEval-2020 Task 12: Aggregation of Transformer-based models'
features for offensive language recognition
- URL: http://arxiv.org/abs/2010.01897v1
- Date: Mon, 5 Oct 2020 10:25:29 GMT
- Title: PUM at SemEval-2020 Task 12: Aggregation of Transformer-based models' features for offensive language recognition
- Authors: Piotr Janiszewski, Mateusz Skiba, Urszula Walińska
- Abstract summary: Our team was ranked 7th out of 40 in Sub-task C (offense target identification) with a 64.727% macro F1-score, and 64th out of 85 in Sub-task A (offensive language identification) with an 89.726% F1-score.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we describe the PUM team's entry to SemEval-2020 Task 12.
Creating our solution involved leveraging two well-known pretrained models used
in natural language processing: BERT and XLNet, which achieve state-of-the-art
results in multiple NLP tasks. The models were fine-tuned for each subtask
separately and features taken from their hidden layers were combined and fed
into a fully connected neural network. The model using aggregated Transformer features can serve as a powerful tool for the offensive language identification problem. Our team was ranked 7th out of 40 in Sub-task C (offense target identification) with a 64.727% macro F1-score, and 64th out of 85 in Sub-task A (offensive language identification) with an 89.726% F1-score.
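A minimal PyTorch sketch of this aggregation idea follows; the specific checkpoints, the choice of the last four hidden layers, mean-pooling, and the head sizes are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: combine hidden-layer features from two pretrained Transformers
# (BERT and XLNet) and classify with a fully connected network.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class AggregatedClassifier(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
        self.xlnet = AutoModel.from_pretrained("xlnet-base-cased", output_hidden_states=True)
        # Four hidden layers per model, mean-pooled over tokens (assumption).
        feat_dim = 4 * self.bert.config.hidden_size + 4 * self.xlnet.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    @staticmethod
    def _pool(hidden_states):
        # Mean-pool each of the last four hidden layers over the token axis.
        return torch.cat([h.mean(dim=1) for h in hidden_states[-4:]], dim=-1)

    def forward(self, bert_inputs, xlnet_inputs):
        bert_feats = self._pool(self.bert(**bert_inputs).hidden_states)
        xlnet_feats = self._pool(self.xlnet(**xlnet_inputs).hidden_states)
        return self.head(torch.cat([bert_feats, xlnet_feats], dim=-1))

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
xlnet_tok = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AggregatedClassifier(num_classes=2)
text = ["you are horrible"]
logits = model(bert_tok(text, return_tensors="pt", padding=True),
               xlnet_tok(text, return_tensors="pt", padding=True))
```

Per the abstract, the two models were fine-tuned separately for each subtask before their hidden-layer features were combined and classified.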
Related papers
- Team QUST at SemEval-2024 Task 8: A Comprehensive Study of Monolingual and Multilingual Approaches for Detecting AI-generated Text
This paper presents team QUST's participation in Task 8 of SemEval-2024.
We first performed data augmentation and cleaning on the dataset to enhance model training efficiency and accuracy.
In the monolingual task, we evaluated traditional deep-learning methods, the multiscale positive-unlabeled (MPU) framework, fine-tuning, adapters, and ensemble methods.
arXiv Detail & Related papers (2024-02-19T08:22:51Z)
- SwinFace: A Multi-task Transformer for Face Recognition, Expression Recognition, Age Estimation and Attribute Estimation
This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis subnet.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
arXiv Detail & Related papers (2023-08-22T15:38:39Z)
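The MLCA module itself is not specified in the summary above; as a rough illustration, the core of many channel-attention designs is a squeeze-and-excitation-style gate like the sketch below (an assumption, not SwinFace's actual module):

```python
# Illustrative channel-attention block (squeeze-and-excitation style);
# SwinFace's MLCA module is more elaborate, this shows only the core idea.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                       # squeeze: global spatial average
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                  # per-channel weights in (0, 1)
        )

    def forward(self, x):                                  # x: [B, C, H, W]
        return x * self.gate(x)                            # reweight channels per task

feats = torch.randn(2, 64, 14, 14)
out = ChannelAttention(64)(feats)
```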
- From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition
We propose a new parameter-efficient learning framework based on neural model reprogramming for cross-lingual speech recognition.
We design different auxiliary neural architectures focusing on learnable pre-trained feature enhancement.
Our methods outperform existing ASR tuning architectures and their extension with self-supervised losses.
arXiv Detail & Related papers (2023-01-19T02:37:56Z)
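A minimal sketch of the reprogramming idea above, assuming a frozen pretrained encoder and a small trainable front-end (the stand-in encoder and dimensions are illustrative):

```python
# Sketch of model reprogramming: keep the pretrained ASR encoder frozen and
# train only a small input transformation ("feature enhancement") module.
import torch
import torch.nn as nn

class ReprogrammedEncoder(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int = 80):
        super().__init__()
        self.enhance = nn.Sequential(          # trainable, few parameters
            nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(feat_dim, feat_dim, kernel_size=3, padding=1),
        )
        self.encoder = encoder
        for p in self.encoder.parameters():    # freeze the pretrained model
            p.requires_grad = False

    def forward(self, feats):                  # feats: [B, T, feat_dim]
        delta = self.enhance(feats.transpose(1, 2)).transpose(1, 2)
        return self.encoder(feats + delta)     # additive input reprogramming

encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU())  # stand-in for a frozen ASR encoder
model = ReprogrammedEncoder(encoder, feat_dim=80)
out = model(torch.randn(4, 100, 80))                    # [4, 100, 256]
```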
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its powerful language understanding capability, outperforms various strong pre-trained language models.
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
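The replaced token detection objective can be sketched as follows; the toy corruption step and the small discriminator stand in for GanLM's actual generator and encoder-decoder (both are assumptions here):

```python
# Sketch of the replaced-token-detection objective (ELECTRA-style), one of
# GanLM's two pre-training objectives; shapes and models are illustrative.
import torch
import torch.nn as nn

vocab, hidden, B, T = 1000, 64, 8, 32
tokens = torch.randint(0, vocab, (B, T))

# Corrupt 15% of positions with randomly sampled replacements (toy generator).
mask = torch.rand(B, T) < 0.15
corrupted = torch.where(mask, torch.randint(0, vocab, (B, T)), tokens)

# Discriminator: predict, per token, whether it was replaced.
discriminator = nn.Sequential(
    nn.Embedding(vocab, hidden),
    nn.Linear(hidden, hidden), nn.ReLU(),
    nn.Linear(hidden, 1),
)
logits = discriminator(corrupted).squeeze(-1)           # [B, T]
labels = (corrupted != tokens).float()                  # 1 = replaced
rtd_loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
```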
- Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE
This report describes our JDExplore d-team's Vega v2 submission on the SuperGLUE leaderboard.
SuperGLUE is more challenging than the widely used general language understanding evaluation (GLUE) benchmark; it contains eight difficult language understanding tasks.
arXiv Detail & Related papers (2022-12-04T15:36:18Z)
- UPB at SemEval-2021 Task 7: Adversarial Multi-Task Learning for Detecting and Rating Humor and Offense
We describe our adversarial multi-task network, AMTL-Humor, used to detect and rate humor and offensive texts.
Our best model consists of an ensemble of all tested configurations, and achieves a 95.66% F1-score and 94.70% accuracy for Task 1a.
arXiv Detail & Related papers (2021-04-13T09:59:05Z)
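Adversarial multi-task setups like the one above often pair a shared encoder with a gradient reversal layer; the sketch below shows that generic pattern and is not necessarily AMTL-Humor's exact design:

```python
# Generic adversarial multi-task sketch: a shared encoder feeds task heads,
# while a task discriminator behind a gradient reversal layer pushes the
# shared features to be task-invariant.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad                      # flip gradients on the way back

shared = nn.Sequential(nn.Linear(768, 256), nn.ReLU())
humor_head = nn.Linear(256, 2)            # task 1: humor detection
offense_head = nn.Linear(256, 1)          # task 2: offense rating
task_disc = nn.Linear(256, 2)             # which task did a sample come from?

x = torch.randn(4, 768)                   # e.g. pooled Transformer features
h = shared(x)
humor_logits = humor_head(h)
offense_score = offense_head(h)
disc_logits = task_disc(GradReverse.apply(h))
# During training, the task losses and the discriminator loss are combined.
```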
- An Attention Ensemble Approach for Efficient Text Classification of Indian Languages
This paper focuses on the coarse-grained technical domain identification of short text documents in Marathi, a Devanagari script-based Indian language.
A hybrid CNN-BiLSTM attention ensemble model is proposed that competently combines the intermediate sentence representations generated by the convolutional neural network and the bidirectional long short-term memory, leading to efficient text classification.
Experimental results show that the proposed model outperforms various baseline machine learning and deep learning models on the given task, achieving the best validation accuracy of 89.57% and an F1-score of 0.8875.
arXiv Detail & Related papers (2021-02-20T07:31:38Z)
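A minimal sketch of such a hybrid CNN-BiLSTM model with attention, with dimensions and the combination scheme chosen for illustration rather than taken from the paper:

```python
# Sketch: combine a CNN sentence view with an attention-pooled BiLSTM view.
import torch
import torch.nn as nn

class CnnBiLstmAttention(nn.Module):
    def __init__(self, vocab=5000, emb=100, hid=64, classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, hid, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)          # score each timestep
        self.out = nn.Linear(hid + 2 * hid, classes)

    def forward(self, ids):                        # ids: [B, T]
        e = self.embed(ids)
        c = torch.relu(self.conv(e.transpose(1, 2))).max(dim=2).values  # CNN view [B, hid]
        h, _ = self.lstm(e)                        # BiLSTM states [B, T, 2*hid]
        w = torch.softmax(self.attn(h), dim=1)     # attention weights [B, T, 1]
        r = (w * h).sum(dim=1)                     # attended view [B, 2*hid]
        return self.out(torch.cat([c, r], dim=-1))

logits = CnnBiLstmAttention()(torch.randint(0, 5000, (2, 30)))
```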
- Ghmerti at SemEval-2019 Task 6: A Deep Word- and Character-based Approach to Offensive Language Identification
OffensEval addresses the problem of identifying and categorizing offensive language in social media.
The proposed approach includes a character-level Convolutional Neural Network, a word-level Recurrent Neural Network, and some preprocessing.
The performance achieved by the proposed model for subtask A is 77.93% macro-averaged F1-score.
arXiv Detail & Related papers (2020-09-22T20:13:48Z)
- GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for Offensive Language Detection
The OffensEval 2020 task includes three English sub-tasks: identifying the presence of offensive language (Sub-task A), identifying the presence of a target in offensive language (Sub-task B), and identifying the category of the target (Sub-task C).
Our submissions achieve F1 scores of 91.7% in Sub-task A, 66.5% in Sub-task B, and 63.2% in Sub-task C.
arXiv Detail & Related papers (2020-07-28T20:45:43Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves a 91.51% F1-score in English Sub-task A, which is comparable to the first-place result.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
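The multi-task pattern above can be sketched as a shared BERT encoder with one head per OffensEval sub-task; head sizes and the [CLS] pooling are assumptions:

```python
# Sketch of multi-task learning over a shared BERT encoder, with one
# classification head per OffensEval sub-task.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiTaskOffense(nn.Module):
    def __init__(self):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        h = self.bert.config.hidden_size
        self.head_a = nn.Linear(h, 2)   # Sub-task A: offensive vs. not
        self.head_b = nn.Linear(h, 2)   # Sub-task B: targeted vs. untargeted
        self.head_c = nn.Linear(h, 3)   # Sub-task C: individual/group/other

    def forward(self, **inputs):
        cls = self.bert(**inputs).last_hidden_state[:, 0]   # [CLS] vector
        return self.head_a(cls), self.head_b(cls), self.head_c(cls)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
a, b, c = MultiTaskOffense()(**tok(["you are awful"], return_tensors="pt"))
```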
- Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the code-switching problem.
We use the language identities to bias the model to predict the code-switching points.
This encourages the model to learn the language identity information directly from the transcription, so no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)
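The transcript-side idea above, deriving language identities from the transcription itself, can be sketched as tagging tokens wherever the language changes; the tag symbols and the Unicode-range heuristic are illustrative assumptions:

```python
# Sketch: insert language-identity tags at code-switch points so a transducer
# can learn switch points from the text stream alone (no separate LID model).
def tag_code_switches(tokens):
    """Insert <zh>/<en> tags wherever the language changes."""
    def lang(tok):
        # Crude heuristic: CJK Unified Ideographs => Mandarin, else English.
        return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in tok) else "en"
    tagged, prev = [], None
    for tok in tokens:
        cur = lang(tok)
        if cur != prev:
            tagged.append(f"<{cur}>")
            prev = cur
        tagged.append(tok)
    return tagged

print(tag_code_switches(["我", "想", "买", "iPhone", "的", "充电器"]))
# ['<zh>', '我', '想', '买', '<en>', 'iPhone', '<zh>', '的', '充电器']
```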
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.