Gender prediction using limited Twitter Data
- URL: http://arxiv.org/abs/2010.02005v1
- Date: Tue, 29 Sep 2020 11:46:07 GMT
- Title: Gender prediction using limited Twitter Data
- Authors: Maaike Burghoorn and Maaike H.T. de Boer and Stephan Raaijmakers
- Abstract summary: This paper explores the usability of BERT (a Transformer model for word embedding) for gender prediction on social media.
A Dutch BERT model is fine-tuned on different samples of a Dutch Twitter dataset labeled for gender, varying in the number of tweets used per person.
Results show that even with relatively small amounts of data, BERT can be fine-tuned to help accurately predict the gender of Twitter users.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer models have shown impressive performance on a variety of NLP
tasks. Off-the-shelf, pre-trained models can be fine-tuned for specific NLP
classification tasks, reducing the need for large amounts of additional
training data. However, little research has addressed how much data is required
to accurately fine-tune such pre-trained transformer models, and how much data
is needed for accurate prediction. This paper explores the usability of BERT (a
Transformer model for word embedding) for gender prediction on social media.
Forensic applications include detecting gender obfuscation, e.g. males posing
as females in chat rooms. A Dutch BERT model is fine-tuned on different samples
of a Dutch Twitter dataset labeled for gender, varying in the number of tweets
used per person. The results show that fine-tuning BERT yields good gender
classification performance (80% F1) with only 200 tweets per person, and that
performance degrades only moderately (to 70% F1) when just 20 tweets per
person are used. These results show that even with relatively small amounts
of data, BERT can be fine-tuned to help accurately predict the gender of
Twitter users and, consequently, that it is
possible to determine gender on the basis of just a low volume of tweets. This
opens up an operational perspective on the swift detection of gender.
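As an illustration of this setup, here is a minimal fine-tuning sketch in the spirit of the paper, using the Hugging Face transformers library and BERTje (GroNLP/bert-base-dutch-cased), a publicly available Dutch BERT. The checkpoint choice and the load_user_tweets loader are illustrative assumptions, not details taken from the paper, whose Twitter dataset is not described as public.

```python
# Minimal sketch (not the authors' code): fine-tune a Dutch BERT for binary
# gender classification on concatenated tweets. `load_user_tweets` is a
# hypothetical data loader standing in for the paper's labeled Twitter data.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "GroNLP/bert-base-dutch-cased"  # BERTje, a public Dutch BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

class TweetDataset(torch.utils.data.Dataset):
    """Concatenates the first N tweets of each user into one example,
    mirroring the paper's 'N tweets per person' conditions (e.g. 20 or 200)."""
    def __init__(self, tweet_lists, labels, tweets_per_user=20):
        texts = [" ".join(tweets[:tweets_per_user]) for tweets in tweet_lists]
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=512)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# tweet_lists, labels = load_user_tweets()  # hypothetical loader
# train_ds = TweetDataset(tweet_lists, labels, tweets_per_user=200)
# trainer = Trainer(model=model,
#                   args=TrainingArguments(output_dir="out", num_train_epochs=3),
#                   train_dataset=train_ds)
# trainer.train()
```

Varying tweets_per_user between, say, 20 and 200 then mirrors the paper's central experiment: classification performance as a function of how much text per user is available.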
Related papers
- Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts
Recent studies show that traditional fairytales are rife with harmful gender biases.
This work aims to assess learned biases of language models by evaluating their robustness against gender perturbations.
arXiv Detail & Related papers (2023-10-16T22:25:09Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it by reporting gender representation in the WMT training and development data for the News task, confirming that current data is skewed towards masculine representation; a toy sketch of this lexicon-counting step appears at the end of this page.
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
- Exploring Gender Bias in Retrieval Models
Mitigating gender bias in information retrieval is important to avoid propagating stereotypes.
We employ a dataset consisting of two components: (1) relevance of a document to a query and (2) "gender" of a document.
We show that pre-trained models for IR do not perform well on zero-shot retrieval tasks when the large pre-trained BERT encoder is fully fine-tuned.
We also illustrate that pre-trained models have gender biases that result in retrieved articles more often being male than female.
arXiv Detail & Related papers (2022-08-02T21:12:05Z)
- Twitter-Based Gender Recognition Using Transformers
We propose a model based on transformers to predict the user's gender from their images and tweets.
We fine-tune another model, based on Bidirectional Encoder Representations from Transformers (BERT), to recognize the user's gender from their tweets.
The combination model improves the accuracy of image and text classification models by 6.98% and 4.43%, respectively.
arXiv Detail & Related papers (2022-04-24T19:58:42Z)
- Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z)
- Investigating Gender Bias in BERT
We analyse the gender bias that BERT induces in five downstream tasks related to emotion and sentiment intensity prediction.
We propose an algorithm that finds fine-grained gender directions, i.e., one primary direction for each BERT layer.
Experiments show that removing embedding components along such directions substantially reduces BERT-induced bias in the downstream tasks; a generic sketch of this projection step appears at the end of this page.
arXiv Detail & Related papers (2020-09-10T17:38:32Z)
- Mitigating Gender Bias in Captioning Systems
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z)
- Multi-Dimensional Gender Bias Classification
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
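As referenced in the Gender-GAP entry above, here is a toy sketch of the lexicon-counting idea that the pipeline applies at scale. The tiny English LEXICON and the gender_counts function are hypothetical stand-ins for the pipeline's curated multilingual lexicon of gendered person-nouns and its actual API.

```python
# Toy sketch of lexicon-based gender-representation counting, as done at
# scale by the Gender-GAP pipeline. LEXICON below is a hypothetical stand-in
# for the pipeline's curated multilingual lexicon of gendered person-nouns.
import re
from collections import Counter

LEXICON = {"man": "masculine", "men": "masculine", "he": "masculine",
           "woman": "feminine", "women": "feminine", "she": "feminine"}

def gender_counts(text: str) -> Counter:
    """Count lexicon hits per gender class in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(LEXICON[t] for t in tokens if t in LEXICON)

print(gender_counts("The man said he saw two women."))
# Counter({'masculine': 2, 'feminine': 1})
```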
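And, as referenced in the "Investigating Gender Bias in BERT" entry, a generic sketch of the component-removal step: projecting an embedding orthogonally to a gender direction. The direction g below is a toy placeholder; the paper's contribution is an algorithm for finding one such direction per BERT layer, which is not reproduced here.

```python
# Generic sketch of removing a gender direction from an embedding via
# orthogonal projection. `g` is a hypothetical placeholder for a learned
# per-layer gender direction, not the paper's method for finding it.
import numpy as np

def remove_component(v: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Return v with its component along direction g projected out."""
    g = g / np.linalg.norm(g)
    return v - (v @ g) * g

v = np.array([2.0, 1.0, 0.0])  # toy embedding
g = np.array([1.0, 0.0, 0.0])  # toy gender direction
print(remove_component(v, g))  # [0. 1. 0.] -- no component left along g
```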