AI-UPV at EXIST 2023 -- Sexism Characterization Using Large Language
Models Under The Learning with Disagreements Regime
- URL: http://arxiv.org/abs/2307.03385v1
- Date: Fri, 7 Jul 2023 04:49:26 GMT
- Title: AI-UPV at EXIST 2023 -- Sexism Characterization Using Large Language
Models Under The Learning with Disagreements Regime
- Authors: Angel Felipe Magnossão de Paula, Giulia Rizzi, Elisabetta Fersini,
Damiano Spina
- Abstract summary: This paper describes AI-UPV team's participation in the EXIST (sEXism Identification in Social neTworks) Lab at CLEF 2023.
The proposed approach addresses the task of sexism identification and characterization under the learning with disagreements paradigm.
The proposed system uses large language models (i.e., mBERT and XLM-RoBERTa) and ensemble strategies for sexism identification and classification in English and Spanish.
- Score: 2.4261434441245897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the increasing influence of social media platforms, it has become
crucial to develop automated systems capable of detecting instances of sexism
and other disrespectful and hateful behaviors to promote a more inclusive and
respectful online environment. Nevertheless, these tasks are considerably challenging given the variety of hate categories and the authors' intentions,
especially under the learning with disagreements regime. This paper describes
AI-UPV team's participation in the EXIST (sEXism Identification in Social
neTworks) Lab at CLEF 2023. The proposed approach addresses the task of sexism identification and characterization under the learning with disagreements paradigm by training directly on the data with disagreements, without using any aggregated label. Nevertheless, performance under both soft and hard evaluations is reported. The proposed system uses large language models
(i.e., mBERT and XLM-RoBERTa) and ensemble strategies for sexism identification
and classification in English and Spanish. In particular, our system is
articulated in three different pipelines. The ensemble approach outperformed the individual large language models, obtaining the best performance under both soft- and hard-label evaluation. This work covers participation in all three EXIST tasks; under the soft evaluation, our system obtained fourth place in Task 2 and first place in Task 3, with the highest ICM-Soft of -2.32 and a normalized ICM-Soft of 0.79. The source code of
our approaches is publicly available at
https://github.com/AngelFelipeMP/Sexism-LLM-Learning-With-Disagreement.
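The abstract above names the technique but not an implementation; the following is a minimal sketch, assuming a Hugging Face transformers setup, of its two central ideas: fine-tuning directly on disagreement-derived soft labels (no aggregated hard label) and averaging the class probabilities of the ensemble members. The placeholder data, hyperparameters, and training loop are illustrative assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of learning with disagreements: fine-tune on soft labels
# (annotator vote shares) without ever forming an aggregated hard label,
# then average ensemble probabilities. Data and hyperparameters are
# illustrative assumptions, not the authors' actual configuration.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer

texts = ["example tweet one", "example tweet two"]    # placeholder inputs
soft_labels = torch.tensor([[0.8, 0.2], [0.3, 0.7]])  # annotator vote shares

def finetune(checkpoint, epochs=3, lr=2e-5):
    tok = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    model.train()
    for _ in range(epochs):
        logits = model(**batch).logits
        # Soft cross-entropy against the disagreement distribution.
        loss = -(soft_labels * F.log_softmax(logits, dim=-1)).sum(-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return tok, model.eval()

def predict_proba(tok, model, new_texts):
    batch = tok(new_texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        return F.softmax(model(**batch).logits, dim=-1)

# Ensemble the two multilingual models by averaging class probabilities.
members = [finetune("bert-base-multilingual-cased"),
           finetune("xlm-roberta-base")]
probs = torch.stack([predict_proba(t, m, ["a new post"]) for t, m in members])
soft_pred = probs.mean(dim=0)         # used for the soft (ICM-Soft) evaluation
hard_pred = soft_pred.argmax(dim=-1)  # used for the hard evaluation
```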
Related papers
- PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis [74.41260927676747]
This paper bridges the gaps by introducing a multimodal conversational Aspect-based Sentiment Analysis (ABSA) task.
To benchmark the tasks, we construct PanoSent, a dataset annotated both manually and automatically, featuring high quality, large scale, multimodality, multilingualism, multi-scenarios, and covering both implicit and explicit sentiment elements.
To effectively address the tasks, we devise a novel Chain-of-Sentiment reasoning framework, together with a novel multimodal large language model (namely Sentica) and a paraphrase-based verification mechanism.
arXiv Detail & Related papers (2024-08-18T13:51:01Z)
- Bilingual Sexism Classification: Fine-Tuned XLM-RoBERTa and GPT-3.5 Few-Shot Learning [0.8192907805418581]
This study aims to improve sexism identification in bilingual contexts (English and Spanish) by leveraging natural language processing models.
We fine-tuned the XLM-RoBERTa model and separately used GPT-3.5 with few-shot learning prompts to classify sexist content.
arXiv Detail & Related papers (2024-06-11T14:15:33Z)
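A minimal sketch of the few-shot setup this entry describes, assuming the OpenAI chat-completions API; the system instruction, label strings, and in-context examples are invented for illustration and are not taken from the paper.

```python
# Sketch of few-shot sexism classification with GPT-3.5. The prompt wording
# and examples are illustrative assumptions, not the paper's actual prompts.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

FEW_SHOT = [
    ("Women belong in the kitchen, not in boardrooms.", "sexist"),
    ("The conference lineup features three keynote speakers.", "not sexist"),
]

def classify(text: str) -> str:
    messages = [{"role": "system",
                 "content": "Label each tweet as 'sexist' or 'not sexist'."}]
    for example, label in FEW_SHOT:
        messages.append({"role": "user", "content": example})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": text})
    resp = client.chat.completions.create(model="gpt-3.5-turbo",
                                          messages=messages, temperature=0)
    return resp.choices[0].message.content.strip()

print(classify("Girls just can't handle technical jobs."))
```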
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- HausaNLP at SemEval-2023 Task 10: Transfer Learning, Synthetic Data and Side-Information for Multi-Level Sexism Classification [0.007696728525672149]
We present the findings of our participation in the SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS) task.
We investigated the effects of transferring two language models: XLM-T (sentiment classification) and HateBERT (same domain -- Reddit) for multi-level classification into Sexist or not Sexist.
arXiv Detail & Related papers (2023-04-28T20:03:46Z)
- Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z)
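The core of this method is enumerating identity markers inside otherwise fixed prompts; a minimal sketch of that enumeration step follows. The marker lists and the prompt template are illustrative assumptions, not the paper's exact ones.

```python
# Sketch of the prompt-enumeration idea: hold the template fixed and vary
# gender/ethnicity markers, then compare the image sets each prompt yields.
# Marker lists and template are illustrative, not the paper's exact ones.
from itertools import product

genders = ["woman", "man", "non-binary person"]
ethnicities = ["Black", "White", "East Asian", "Hispanic"]
professions = ["doctor", "janitor", "CEO"]

prompts = [f"Photo portrait of a {e} {g} working as a {p}"
           for g, e, p in product(genders, ethnicities, professions)]

for prompt in prompts[:3]:
    print(prompt)
# Each prompt would be sent to a text-to-image system; the variation across
# the generated image sets is then characterized against, e.g., US labor
# demographics.
```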
- Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE [93.98660272309974]
This report briefly describes our submission Vega v1 on the General Language Understanding Evaluation leaderboard.
GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
With our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state of the art on 4 of the 9 tasks, achieving the best average score of 91.3.
arXiv Detail & Related papers (2023-02-18T09:26:35Z)
- MaPLe: Multi-modal Prompt Learning [54.96069171726668]
We propose Multi-modal Prompt Learning (MaPLe) for both vision and language branches to improve alignment between the vision and language representations.
Compared with the state-of-the-art method Co-CoOp, MaPLe exhibits favorable performance and achieves an absolute gain of 3.45% on novel classes.
arXiv Detail & Related papers (2022-10-06T17:59:56Z)
- Deep Multi-Task Models for Misogyny Identification and Categorization on Arabic Social Media [6.6410040715586005]
In this paper, we present the systems submitted to the first Arabic Misogyny Identification shared task.
We investigate three multi-task learning models as well as their single-task counterparts.
In order to encode the input text, our models rely on the pre-trained MARBERT language model.
arXiv Detail & Related papers (2022-06-16T18:54:37Z)
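A minimal sketch of the multi-task design this entry describes: one shared pre-trained encoder with a separate classification head per task. The MARBERT checkpoint name is the public Hugging Face Hub identifier; the head sizes and label sets are illustrative assumptions.

```python
# Sketch: shared encoder with one head per task (misogyny identification
# vs. categorization). Head sizes and label sets are illustrative.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiTaskModel(nn.Module):
    def __init__(self, checkpoint="UBC-NLP/MARBERT", n_cats=7):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        hidden = self.encoder.config.hidden_size
        self.identify = nn.Linear(hidden, 2)         # misogynistic or not
        self.categorize = nn.Linear(hidden, n_cats)  # misogyny category

    def forward(self, **batch):
        # Use the [CLS] token representation for both heads.
        cls = self.encoder(**batch).last_hidden_state[:, 0]
        return self.identify(cls), self.categorize(cls)

tok = AutoTokenizer.from_pretrained("UBC-NLP/MARBERT")
model = MultiTaskModel()
batch = tok(["مثال نص"], return_tensors="pt")  # example Arabic input
id_logits, cat_logits = model(**batch)
# Training would sum one cross-entropy loss per task over these two outputs.
```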
- Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models [0.0]
This work proposes a system that uses multilingual and monolingual BERT models, data-point translation, and ensemble strategies for sexism identification and classification in English and Spanish.
arXiv Detail & Related papers (2021-11-08T15:01:06Z)
- Automatic Sexism Detection with Multilingual Transformer Models [0.0]
This paper presents the contribution of the AIT_FHSTP team at the EXIST 2021 benchmark for two sEXism Identification in Social neTworks tasks.
To solve the tasks we applied two multilingual transformer models, one based on multilingual BERT and one based on XLM-R.
Our approach uses two different strategies to adapt the transformers to the detection of sexist content: first, unsupervised pre-training with additional data and second, supervised fine-tuning with additional and augmented data.
For both tasks, our best model is XLM-R with unsupervised pre-training on the EXIST data and additional datasets.
arXiv Detail & Related papers (2021-06-09T08:45:51Z)
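A minimal sketch of the first adaptation strategy, unsupervised masked-language-model pre-training on in-domain text before supervised fine-tuning, assuming the Hugging Face Trainer; the data file, output path, and hyperparameters are illustrative assumptions.

```python
# Sketch: continued (domain-adaptive) MLM pre-training of XLM-R on unlabeled
# in-domain text, before supervised fine-tuning. "tweets.txt" is a
# hypothetical file with one unlabeled post per line.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

raw = load_dataset("text", data_files={"train": "tweets.txt"})
tokenized = raw.map(lambda b: tok(b["text"], truncation=True, max_length=128),
                    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tok, mlm_probability=0.15)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-domain", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
trainer.save_model("xlmr-domain")  # starting point for supervised fine-tuning
```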
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.