Sexism Prediction in Spanish and English Tweets Using Monolingual and
Multilingual BERT and Ensemble Models
- URL: http://arxiv.org/abs/2111.04551v1
- Date: Mon, 8 Nov 2021 15:01:06 GMT
- Title: Sexism Prediction in Spanish and English Tweets Using Monolingual and
Multilingual BERT and Ensemble Models
- Authors: Angel Felipe Magnossão de Paula, Roberto Fray da Silva, and Ipek
Baris Schlicht
- Abstract summary: This work proposes a system that combines multilingual and monolingual BERT models, data-point translation, and ensemble strategies for sexism identification and classification in English and Spanish.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The popularity of social media has created problems such as hate speech and
sexism. The identification and classification of sexism in social media are
very relevant tasks, as they would allow building a healthier social
environment. Nevertheless, these tasks are considerably challenging. This work
proposes a system that combines multilingual and monolingual BERT models,
data-point translation, and ensemble strategies for sexism identification and
classification in English and Spanish. It was conducted in the context of the
sEXism Identification in Social neTworks 2021 (EXIST 2021) shared task,
proposed by the Iberian Languages Evaluation Forum (IberLEF). The proposed
system and its main components are described, and an in-depth hyperparameter
analysis is conducted. The main results observed were: (i) the system obtained
better results than the baseline model (multilingual BERT); (ii) ensemble
models obtained better results than monolingual models; and (iii) an ensemble
model considering all individual models and the best standardized values
obtained the best accuracies and F1-scores for both tasks. This work obtained
first place in both tasks at EXIST, with the highest accuracies (0.780 for task
1 and 0.658 for task 2) and F1-scores (F1-binary of 0.780 for task 1 and
F1-macro of 0.579 for task 2).
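
The core mechanism the abstract describes is an ensemble that standardizes each model's output scores before averaging them. Below is a minimal sketch of that idea, assuming the binary task 1 setup. The checkpoint names are off-the-shelf stand-ins (the actual system fine-tunes its models on the EXIST 2021 data first), the data-point translation step is omitted, and the threshold-at-zero decision rule is an illustrative assumption rather than the paper's exact procedure.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoints: the real system fine-tunes each model on the
# EXIST 2021 training data first; these base models merely stand in for them.
CHECKPOINTS = [
    "bert-base-multilingual-cased",           # multilingual BERT (the baseline)
    "bert-base-cased",                        # English monolingual BERT
    "dccuchile/bert-base-spanish-wwm-cased",  # Spanish monolingual BERT (BETO)
]

def sexism_scores(checkpoint: str, texts: list[str]) -> torch.Tensor:
    """P(sexist) for each text under a single (assumed fine-tuned) model."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return torch.softmax(logits, dim=-1)[:, 1]  # probability of class "sexist"

def standardize(scores: torch.Tensor) -> torch.Tensor:
    """Z-score one model's outputs so all models vote on a comparable scale."""
    return (scores - scores.mean()) / (scores.std() + 1e-8)

def ensemble_predict(texts: list[str]) -> torch.Tensor:
    """Average the standardized scores of all models; threshold at the mean."""
    stacked = torch.stack(
        [standardize(sexism_scores(cp, texts)) for cp in CHECKPOINTS])
    return (stacked.mean(dim=0) > 0).long()  # 1 = sexist, 0 = not sexist

print(ensemble_predict(["example tweet in English",
                        "ejemplo de tuit en español"]))
```

Standardizing per model keeps an over-confident model from dominating the average, which is one plausible reading of the "best standardized values" mentioned in the abstract.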
Related papers
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z) - AI-UPV at EXIST 2023 -- Sexism Characterization Using Large Language
Models Under The Learning with Disagreements Regime [2.4261434441245897]
This paper describes AI-UPV team's participation in the EXIST (sEXism Identification in Social neTworks) Lab at CLEF 2023.
The proposed approach aims at addressing the task of sexism identification and characterization under the learning with disagreements paradigm.
The proposed system uses large language models (i.e., mBERT and XLM-RoBERTa) and ensemble strategies for sexism identification and classification in English and Spanish.
arXiv Detail & Related papers (2023-07-07T04:49:26Z) - HausaNLP at SemEval-2023 Task 10: Transfer Learning, Synthetic Data and
Side-Information for Multi-Level Sexism Classification [0.007696728525672149]
We present the findings of our participation in the SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS) task.
We investigated the effects of transfer learning from two language models, XLM-T (trained for sentiment classification) and HateBERT (pre-trained on the same domain, Reddit), for multi-level classification into Sexist or not Sexist.
arXiv Detail & Related papers (2023-04-28T20:03:46Z) - OneAligner: Zero-shot Cross-lingual Transfer with One Rich-Resource
Language Pair for Low-Resource Sentence Retrieval [91.76575626229824]
We present OneAligner, an alignment model specially designed for sentence retrieval tasks.
When trained with all language pairs of a large-scale parallel multilingual corpus (OPUS-100), this model achieves the state-of-the-art result.
We conclude through empirical results and analyses that the performance of the sentence alignment task depends mostly on the monolingual and parallel data size.
arXiv Detail & Related papers (2022-05-17T19:52:42Z) - AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in
Immigration-Related Web News Comments Using Transformers and Statistical
Models [0.0]
We implement an accurate model to detect xenophobia in comments about web news articles.
We obtained 3rd place in the Task 1 official ranking with an F1-score of 0.5996 and 6th place in the Task 2 official ranking with a CEM of 0.7142.
Our results suggest that: (i) BERT models obtain better results than statistical models for toxicity detection in text comments; and (ii) monolingual BERT models outperform multilingual BERT models on comments written in their pre-training language.
arXiv Detail & Related papers (2021-11-08T14:24:21Z) - Automatic Sexism Detection with Multilingual Transformer Models [0.0]
This paper presents the contribution of the AIT_FHSTP team at the EXIST 2021 benchmark for two sEXism Identification in Social neTworks tasks.
To solve the tasks we applied two multilingual transformer models, one based on multilingual BERT and one based on XLM-R.
Our approach uses two different strategies to adapt the transformers to the detection of sexist content: first, unsupervised pre-training with additional data and second, supervised fine-tuning with additional and augmented data.
For both tasks, our best model is XLM-R with unsupervised pre-training on the EXIST data and additional datasets.
arXiv Detail & Related papers (2021-06-09T08:45:51Z) - How True is GPT-2? An Empirical Analysis of Intersectional Occupational
Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2.
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z) - Mixed-Lingual Pre-training for Cross-lingual Summarization [54.4823498438831]
Cross-lingual Summarization aims at producing a summary in the target language for an article in the source language.
We propose a solution based on mixed-lingual pre-training that leverages both cross-lingual tasks like translation and monolingual tasks like masked language models.
Our model achieves an improvement of 2.82 (English to Chinese) and 1.15 (Chinese to English) ROUGE-1 scores over state-of-the-art results.
arXiv Detail & Related papers (2020-10-18T00:21:53Z) - NEMO: Frequentist Inference Approach to Constrained Linguistic Typology
Feature Prediction in SIGTYP 2020 Shared Task [83.43738174234053]
We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features.
Our best configuration achieved the micro-averaged accuracy score of 0.66 on 149 test languages.
arXiv Detail & Related papers (2020-10-12T19:25:43Z) - FiSSA at SemEval-2020 Task 9: Fine-tuned For Feelings [2.362412515574206]
In this paper, we present our approach for sentiment classification on Spanish-English code-mixed social media data.
We explore both monolingual and multilingual models with the standard fine-tuning method.
Although two-step fine-tuning improves sentiment classification performance over the base model, the large multilingual XLM-RoBERTa model achieves the best weighted F1-score.
arXiv Detail & Related papers (2020-07-24T14:48:27Z) - Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for
Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)