Sexism Identification in Tweets and Gabs using Deep Neural Networks
- URL: http://arxiv.org/abs/2111.03612v1
- Date: Fri, 5 Nov 2021 16:57:08 GMT
- Title: Sexism Identification in Tweets and Gabs using Deep Neural Networks
- Authors: Amikul Kalra, Arkaitz Zubiaga
- Abstract summary: This paper explores the classification of sexism in text using a variety of deep neural network model architectures.
It performs binary and multiclass sexism classification on the dataset of tweets and gabs from the sEXism Identification in Social neTworks (EXIST) task in IberLEF 2021.
The models are seen to perform comparably to those from the competition, with the best performances seen using BERT and a multi-filter CNN model.
- Score: 6.531659195805749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Through anonymisation and accessibility, social media platforms have
facilitated the proliferation of hate speech, prompting increased research in
developing automatic methods to identify these texts. This paper explores the
classification of sexism in text using a variety of deep neural network model
architectures such as Long Short-Term Memory (LSTM) networks and Convolutional Neural
Networks (CNNs). These networks are used in conjunction with transfer learning
in the form of Bidirectional Encoder Representations from Transformers (BERT)
and DistilBERT models, along with data augmentation, to perform binary and
multiclass sexism classification on the dataset of tweets and gabs from the
sEXism Identification in Social neTworks (EXIST) task in IberLEF 2021. The
models are seen to perform comparably to those from the competition, with
the best performances seen using BERT and a multi-filter CNN model. Data
augmentation further improves these results for the multi-class classification
task. This paper also explores the errors made by the models and discusses the
difficulty in automatically classifying sexism due to the subjectivity of the
labels and the complexity of natural language used in social media.
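The multi-filter CNN named in the abstract convolves several kernel widths over the embedded token sequence and max-pools each feature map over time before classification. A minimal NumPy sketch of that idea follows; all sizes, weights, and names here are hypothetical illustrations, not the paper's actual architecture or hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not taken from the paper).
VOCAB, EMB_DIM, SEQ_LEN, N_CLASSES = 1000, 50, 20, 2
FILTER_SIZES, N_FILTERS = (2, 3, 4), 8  # "multi-filter": several kernel widths

emb = rng.normal(size=(VOCAB, EMB_DIM))
filters = {k: rng.normal(size=(N_FILTERS, k, EMB_DIM)) for k in FILTER_SIZES}
W_out = rng.normal(size=(N_FILTERS * len(FILTER_SIZES), N_CLASSES))

def text_cnn_logits(token_ids):
    """Convolve each kernel width over the embedded sequence, ReLU,
    max-pool each feature map over time, concatenate, and classify."""
    x = emb[token_ids]                      # (SEQ_LEN, EMB_DIM)
    pooled = []
    for k, W in filters.items():
        # "valid" 1-D convolution over time for all N_FILTERS kernels at once
        maps = np.stack([
            np.tensordot(x[t:t + k], W, axes=([0, 1], [1, 2]))
            for t in range(SEQ_LEN - k + 1)
        ])                                  # (SEQ_LEN - k + 1, N_FILTERS)
        pooled.append(np.maximum(maps, 0).max(axis=0))  # max over time
    features = np.concatenate(pooled)       # (N_FILTERS * len(FILTER_SIZES),)
    return features @ W_out                 # (N_CLASSES,)

logits = text_cnn_logits(rng.integers(0, VOCAB, size=SEQ_LEN))
```

In the paper's setup the embedding layer would come from BERT or DistilBERT rather than a random matrix, and the output layer would give the binary or multiclass sexism labels.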
Related papers
- Convolutional Neural Networks for Sentiment Analysis on Weibo Data: A Natural Language Processing Approach [0.228438857884398]
This study addresses the complex task of sentiment analysis on a dataset of 119,988 original posts from Weibo using a Convolutional Neural Network (CNN).
A CNN-based model was utilized, leveraging word embeddings for feature extraction, and trained to perform sentiment classification.
The model achieved a macro-average F1-score of approximately 0.73 on the test set, showing balanced performance across positive, neutral, and negative sentiments.
arXiv Detail & Related papers (2023-07-13T03:02:56Z)
- Initial Study into Application of Feature Density and Linguistically-backed Embedding to Improve Machine Learning-based Cyberbullying Detection [54.83707803301847]
The research was conducted on a Formspring dataset provided in a Kaggle competition on automatic cyberbullying detection.
The study confirmed the effectiveness of Neural Networks in cyberbullying detection and the correlation between classifier performance and Feature Density.
arXiv Detail & Related papers (2022-06-04T03:17:15Z)
- Identifying and Categorizing Offensive Language in Social Media [0.0]
This study provides a description of a classification system built for SemEval 2019 Task 6: OffensEval.
We trained machine learning and deep learning models along with data preprocessing and sampling techniques to come up with the best results.
arXiv Detail & Related papers (2021-04-10T22:53:43Z)
- On the benefits of robust models in modulation recognition [53.391095789289736]
Deep Neural Networks (DNNs) using convolutional layers are state-of-the-art in many tasks in communications.
In other domains, like image classification, DNNs have been shown to be vulnerable to adversarial perturbations.
We propose a novel framework to test the robustness of current state-of-the-art models.
arXiv Detail & Related papers (2021-03-27T19:58:06Z)
- Leveraging Multi-domain, Heterogeneous Data using Deep Multitask Learning for Hate Speech Detection [21.410160004193916]
We propose Convolutional Neural Network based multi-task learning models (MTLs) to leverage information from multiple sources.
Empirical analysis performed on three benchmark datasets shows the efficacy of the proposed approach.
arXiv Detail & Related papers (2021-03-23T09:31:01Z)
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- Incremental Embedding Learning via Zero-Shot Translation [65.94349068508863]
Current state-of-the-art incremental learning methods tackle catastrophic forgetting problem in traditional classification networks.
We propose a novel class-incremental method for embedding networks, named the zero-shot translation class-incremental method (ZSTCI).
In addition, ZSTCI can easily be combined with existing regularization-based incremental learning methods to further improve performance of embedding networks.
arXiv Detail & Related papers (2020-12-31T08:21:37Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- PIN: A Novel Parallel Interactive Network for Spoken Language Understanding [68.53121591998483]
In the existing RNN-based approaches, ID and SF tasks are often jointly modeled to utilize the correlation information between them.
The experiments on two benchmark datasets, i.e., SNIPS and ATIS, demonstrate the effectiveness of our approach.
More encouragingly, by using the feature embedding of the utterance generated by the pre-trained language model BERT, our method achieves the state-of-the-art among all comparison approaches.
arXiv Detail & Related papers (2020-09-28T15:59:31Z)
- Identity-Based Patterns in Deep Convolutional Networks: Generative Adversarial Phonology and Reduplication [0.0]
We use the ciwGAN architecture (Beguš), in which learning of meaningful representations in speech emerges from a requirement that the CNNs generate informative data.
We propose a technique to wug-test CNNs trained on speech and, based on four generative tests, argue that the network learns to represent an identity-based pattern in its latent space.
arXiv Detail & Related papers (2020-09-13T23:12:49Z)
- Stochastic encoding of graphs in deep learning allows for complex analysis of gender classification in resting-state and task functional brain networks from the UK Biobank [0.13706331473063876]
We introduce a stochastic encoding method in an ensemble of CNNs to classify functional connectomes by gender.
We measure the salience of three brain networks involved in task- and resting-states, and their interaction.
arXiv Detail & Related papers (2020-02-25T15:10:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.