Automatic Sexism Detection with Multilingual Transformer Models
- URL: http://arxiv.org/abs/2106.04908v1
- Date: Wed, 9 Jun 2021 08:45:51 GMT
- Title: Automatic Sexism Detection with Multilingual Transformer Models
- Authors: Mina Schütz, Jaqueline Boeck, Daria Liakhovets, Djordje
Slijepčević, Armin Kirchknopf, Manuel Hecht, Johannes Bogensperger, Sven
Schlarb, Alexander Schindler, Matthias Zeppelzauer
- Abstract summary: This paper presents the contribution of the AIT_FHSTP team to the EXIST 2021 benchmark, which comprises two sEXism Identification in Social neTworks tasks.
To solve the tasks we applied two multilingual transformer models, one based on multilingual BERT and one based on XLM-R.
Our approach uses two different strategies to adapt the transformers to the detection of sexist content: first, unsupervised pre-training with additional data and second, supervised fine-tuning with additional and augmented data.
For both tasks our best model is XLM-R with unsupervised pre-training on the EXIST data and additional datasets, followed by fine-tuning on the provided dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sexism has become an increasingly serious problem on social
networks in recent years. The first shared task on sEXism Identification in
Social neTworks (EXIST) at IberLEF 2021 is an international competition in the
field of Natural Language Processing (NLP) that aims to automatically identify
sexism in social media content by applying machine learning methods. Sexism
detection is formulated both as a coarse (binary) classification problem and
as a fine-grained classification task that distinguishes multiple types of
sexist content (e.g., dominance, stereotyping, and objectification). This paper
presents the contribution of the AIT_FHSTP team to the EXIST 2021 benchmark for
both tasks. To solve the tasks, we applied two multilingual transformer models,
one based on multilingual BERT and one based on XLM-R. Our approach uses two
different strategies to adapt the transformers to the detection of sexist
content: first, unsupervised pre-training with additional data and second,
supervised fine-tuning with additional and augmented data. For both tasks our
best model is XLM-R with unsupervised pre-training on the EXIST data and
additional datasets and fine-tuning on the provided dataset. The best run for
the binary classification (task 1) achieves a macro F1-score of 0.7752 and
ranks 5th in the benchmark; for the multiclass classification (task 2), our
best submission ranks 6th with a macro F1-score of 0.5589.
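
The two-stage adaptation strategy described in the abstract maps directly onto
the Hugging Face transformers API. The sketch below is a minimal illustration
of that pipeline, not the authors' released code: the file names
(unlabelled_tweets.txt, exist_train.csv) and all hyperparameters are
hypothetical placeholders.

```python
# Sketch: domain-adaptive MLM pre-training of XLM-R, then supervised
# fine-tuning for binary sexism detection. File names and hyperparameters
# are illustrative assumptions, not the paper's actual configuration.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

# Stage 1: unsupervised (masked language model) pre-training on unlabelled
# in-domain text, e.g. the EXIST tweets plus additional corpora.
unlabelled = load_dataset("text", data_files={"train": "unlabelled_tweets.txt"})
mlm_data = unlabelled["train"].map(tokenize, batched=True, remove_columns=["text"])
mlm_trainer = Trainer(
    model=AutoModelForMaskedLM.from_pretrained("xlm-roberta-base"),
    args=TrainingArguments(output_dir="xlmr-domain-adapted", num_train_epochs=3),
    train_dataset=mlm_data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
mlm_trainer.train()
mlm_trainer.save_model("xlmr-domain-adapted")

# Stage 2: supervised fine-tuning on the labelled EXIST data (assumed to be a
# CSV with "text" and "label" columns), from the domain-adapted checkpoint.
labelled = load_dataset("csv", data_files={"train": "exist_train.csv"})
clf_data = labelled["train"].map(tokenize, batched=True)
clf_trainer = Trainer(
    model=AutoModelForSequenceClassification.from_pretrained(
        "xlmr-domain-adapted", num_labels=2  # more labels for the fine-grained task
    ),
    args=TrainingArguments(output_dir="xlmr-sexism", num_train_epochs=3),
    train_dataset=clf_data,
    data_collator=DataCollatorWithPadding(tokenizer),
)
clf_trainer.train()
```

The benchmark ranks submissions by macro F1, which can be computed on held-out
predictions with sklearn.metrics.f1_score(y_true, y_pred, average="macro").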
Related papers
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- LCT-1 at SemEval-2023 Task 10: Pre-training and Multi-task Learning for
Sexism Detection and Classification [0.0]
SemEval-2023 Task 10 on Explainable Detection of Online Sexism aims at increasing the explainability of sexism detection.
Our system is based on further domain-adaptive pre-training.
In experiments, multi-task learning performs on par with standard fine-tuning for sexism detection.
arXiv Detail & Related papers (2023-06-08T09:56:57Z)
- HausaNLP at SemEval-2023 Task 10: Transfer Learning, Synthetic Data and
Side-Information for Multi-Level Sexism Classification [0.007696728525672149]
We present the findings of our participation in the SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS) task.
We investigated the effects of transferring two language models, XLM-T (sentiment classification) and HateBERT (same domain: Reddit), for multi-level classification into Sexist or Not Sexist.
arXiv Detail & Related papers (2023-04-28T20:03:46Z)
- Attention at SemEval-2023 Task 10: Explainable Detection of Online Sexism
(EDOS) [15.52876591707497]
This work addresses interpretability, trust, and understanding of the decisions made by models on classification tasks.
The first task is binary sexism detection.
The second task identifies the category of sexism.
The third task assigns a more fine-grained category of sexism.
arXiv Detail & Related papers (2023-04-10T14:24:52Z)
- Change is Hard: A Closer Look at Subpopulation Shift [48.0369745740936]
We propose a unified framework that dissects and explains common shifts in subgroups.
We then establish a benchmark of 20 state-of-the-art algorithms evaluated on 12 real-world datasets in vision, language, and healthcare domains.
arXiv Detail & Related papers (2023-02-23T18:59:56Z)
- Rethinking the Two-Stage Framework for Grounded Situation Recognition [61.93345308377144]
Grounded Situation Recognition is an essential step towards "human-like" event understanding.
Existing GSR methods resort to a two-stage framework: predicting the verb in the first stage and detecting the semantic roles in the second stage.
We propose a novel SituFormer for GSR, which consists of a Coarse-to-Fine Verb Model (CFVM) and a Transformer-based Noun Model (TNM).
arXiv Detail & Related papers (2021-12-10T08:10:56Z)
- Sexism Prediction in Spanish and English Tweets Using Monolingual and
Multilingual BERT and Ensemble Models [0.0]
This work proposes a system that uses multilingual and monolingual BERT, translation of data points, and ensemble strategies for sexism identification and classification in English and Spanish.
arXiv Detail & Related papers (2021-11-08T15:01:06Z)
- X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented
Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z)
- Phonemer at WNUT-2020 Task 2: Sequence Classification Using COVID Twitter
BERT and Bagging Ensemble Technique based on Plurality Voting [0.0]
We develop a system that automatically identifies whether an English Tweet related to the novel coronavirus (COVID-19) is informative or not.
Our final approach achieved an F1-score of 0.9037, ranking sixth overall with F1-score as the evaluation criterion.
arXiv Detail & Related papers (2020-10-01T10:54:54Z)
- Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed
Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
- Deep F-measure Maximization for End-to-End Speech Understanding [52.36496114728355]
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation (a minimal sketch of such a soft F-measure loss follows after this list).
We perform experiments on two standard fairness datasets (Adult and Communities and Crime), as well as on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset.
In all four tasks, the F-measure objective improves micro-F1 scores, with absolute gains of up to 8% over models trained with the cross-entropy loss.
arXiv Detail & Related papers (2020-08-08T03:02:27Z)
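
The differentiable F-measure idea from the last entry admits a compact
expression: replace hard prediction counts with predicted probabilities, so
that true/false positives and false negatives become smooth functions of the
model output. The binary PyTorch sketch below is an illustrative
simplification under that assumption, not the paper's exact formulation.

```python
# Minimal "soft F1" loss: probabilities stand in for hard decisions, making
# the F-measure differentiable so it can be maximised by backpropagation.
import torch

def soft_f1_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    probs = torch.sigmoid(logits)         # P(class = 1) per example
    tp = (probs * targets).sum()          # soft true positives
    fp = (probs * (1.0 - targets)).sum()  # soft false positives
    fn = ((1.0 - probs) * targets).sum()  # soft false negatives
    f1 = 2.0 * tp / (2.0 * tp + fp + fn + 1e-8)
    return 1.0 - f1                       # minimising this maximises soft F1

# Usage with any binary classifier producing one logit per example:
# loss = soft_f1_loss(model(x).squeeze(-1), y.float()); loss.backward()
```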