XD at SemEval-2020 Task 12: Ensemble Approach to Offensive Language
Identification in Social Media Using Transformer Encoders
- URL: http://arxiv.org/abs/2007.10945v1
- Date: Tue, 21 Jul 2020 17:03:00 GMT
- Authors: Xiangjue Dong and Jinho D. Choi
- Abstract summary: This paper presents six document classification models using the latest transformer encoders and a high-performing ensemble model for a task of offensive language identification in social media.
Our analysis shows that although the ensemble model significantly improves the accuracy on the development set, the improvement is not as evident on the test set.
- Score: 17.14709845342071
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents six document classification models using the latest
transformer encoders and a high-performing ensemble model for a task of
offensive language identification in social media. For the individual models,
deep transformer layers are applied to perform multi-head attention. For the
ensemble model, the utterance representations taken from those individual
models are concatenated and fed into a linear decoder to make the final
decisions. Our ensemble model outperforms the individual models, with up to
8.6% improvement on the development set. On the test
set, it achieves a macro-F1 of 90.9%, placing it among the high-performing
systems out of 85 participants in sub-task A of this shared task. Our
analysis shows that although the ensemble model significantly improves the
accuracy on the development set, the improvement is not as evident on the test
set.
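The ensemble step described in the abstract (concatenate the utterance representations from the individual models, then apply a linear decoder) can be sketched roughly as follows. This is only an illustration and not the authors' implementation: the function names, vector sizes, weight values, and the "NOT"/"OFF" label names are all assumptions, and in a real system the decoder weights would be learned during training rather than written by hand.

```python
# Minimal sketch of a concatenation-based ensemble (illustrative, not the authors' code).
# All sizes, weights, and names below are hypothetical.

def concat_representations(reps):
    """Join the per-model utterance vectors into one flat feature vector."""
    return [value for rep in reps for value in rep]


def linear_decoder(features, weights, bias):
    """Compute one logit per class: weights[c] . features + bias[c]."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def predict(reps, weights, bias):
    """Return the index of the highest-scoring class."""
    logits = linear_decoder(concat_representations(reps), weights, bias)
    return max(range(len(logits)), key=lambda c: logits[c])


# Toy usage: three stand-in "models", each emitting a 2-dimensional vector.
reps = [[0.2, -0.1], [0.4, 0.3], [0.0, 0.5]]
# One weight row per class over the 6 concatenated features (made-up values).
weights = [
    [0.5, 0.1, -0.3, 0.2, 0.0, -0.4],
    [-0.5, 0.2, 0.6, 0.1, 0.3, 0.4],
]
bias = [0.0, 0.0]
labels = ["NOT", "OFF"]  # assumed label names for a binary task
print(labels[predict(reps, weights, bias)])
```

The decoder's input width is the sum of the individual representation sizes, so adding a model to the ensemble only widens the weight rows and leaves the rest of the pipeline unchanged.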
Related papers
- FusionBench: A Comprehensive Benchmark of Deep Model Fusion [78.80920533793595]
Deep model fusion is a technique that unifies the predictions or parameters of several deep neural networks into a single model.
FusionBench is the first comprehensive benchmark dedicated to deep model fusion.
arXiv Detail & Related papers (2024-06-05T13:54:28Z)
- Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion [54.33764537135906]
VideoQA Transformer models demonstrate competitive performance on standard benchmarks.
Do these models capture the rich multimodal structures and dynamics from video and text jointly?
Are they achieving high scores by exploiting biases and spurious features?
arXiv Detail & Related papers (2023-06-15T06:45:46Z)
- Domain Adaptation of Transformer-Based Models using Unlabeled Data for
Relevance and Polarity Classification of German Customer Feedback [1.2999413717930817]
This work explores how efficient transformer-based models are when working with a German customer feedback dataset.
The experimental results show that transformer-based models can reach significant improvements compared to a fastText baseline.
arXiv Detail & Related papers (2022-12-12T08:32:28Z)
- Model ensemble instead of prompt fusion: a sample-specific knowledge
transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose the Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z)
- Composing Ensembles of Pre-trained Models via Iterative Consensus [95.10641301155232]
We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
arXiv Detail & Related papers (2022-10-20T18:46:31Z)
- Multi-Source Transformer Architectures for Audiovisual Scene
Classification [14.160670979300628]
The systems we submitted for subtask 1B of the DCASE 2021 challenge, regarding audiovisual scene classification, are described in detail.
They are essentially multi-source transformers employing a combination of auditory and visual features to make predictions.
arXiv Detail & Related papers (2022-10-18T23:42:42Z)
- A Study on Transformer Configuration and Training Objective [33.7272660870026]
We propose Bamboo, an idea of using deeper and narrower transformer configurations for masked autoencoder training.
On ImageNet, with such a simple change in configuration, the re-designed model achieves 87.1% top-1 accuracy.
On language tasks, the re-designed model outperforms BERT with the default setting by 1.1 points on average.
arXiv Detail & Related papers (2022-05-21T05:17:11Z)
- CAMERO: Consistency Regularized Ensemble of Perturbed Language Models
with Weight Sharing [83.63107444454938]
We propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.
Specifically, we share the weights of bottom layers across all models and apply different perturbations to the hidden representations for different models, which can effectively promote the model diversity.
Our experiments using large language models demonstrate that CAMERO significantly improves the generalization performance of the ensemble model.
arXiv Detail & Related papers (2022-04-13T19:54:51Z)
- Model soups: averaging weights of multiple fine-tuned models improves
accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness.
We show that the model soup approach extends to multiple image classification and natural language processing tasks.
arXiv Detail & Related papers (2022-03-10T17:03:49Z)
- FiSSA at SemEval-2020 Task 9: Fine-tuned For Feelings [2.362412515574206]
In this paper, we present our approach for sentiment classification on Spanish-English code-mixed social media data.
We explore both monolingual and multilingual models with the standard fine-tuning method.
Although two-step fine-tuning improves sentiment classification performance over the base model, the large multilingual XLM-RoBERTa model achieves the best weighted F1-score.
arXiv Detail & Related papers (2020-07-24T14:48:27Z)
- Gestalt: a Stacking Ensemble for SQuAD2.0 [0.0]
We propose a deep-learning system that finds, or indicates the lack of, a correct answer to a question in a context paragraph.
Our goal is to learn an ensemble of heterogeneous SQuAD2.0 models that outperforms the best model in the ensemble per se.
arXiv Detail & Related papers (2020-04-02T08:09:22Z)