Gender prediction using limited Twitter Data
- URL: http://arxiv.org/abs/2010.02005v1
- Date: Tue, 29 Sep 2020 11:46:07 GMT
- Title: Gender prediction using limited Twitter Data
- Authors: Maaike Burghoorn and Maaike H.T. de Boer and Stephan Raaijmakers
- Abstract summary: This paper explores the usability of BERT (a Transformer model for word embedding) for gender prediction on social media.
A Dutch BERT model is fine-tuned on different samples of a Dutch Twitter dataset labeled for gender, varying in the number of tweets used per person.
Results show that even with relatively small amounts of data, BERT can be fine-tuned to help accurately predict the gender of Twitter users.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer models have shown impressive performance on a variety of NLP
tasks. Off-the-shelf, pre-trained models can be fine-tuned for specific NLP
classification tasks, reducing the need for large amounts of additional
training data. However, little research has addressed how much data is required
to accurately fine-tune such pre-trained transformer models, and how much data
is needed for accurate prediction. This paper explores the usability of BERT (a
Transformer model for word embedding) for gender prediction on social media.
Forensic applications include detecting gender obfuscation, e.g. males posing
as females in chat rooms. A Dutch BERT model is fine-tuned on different samples
of a Dutch Twitter dataset labeled for gender, varying in the number of tweets
used per person. The results show that fine-tuning BERT yields good gender
classification performance (80% F1) with only 200 tweets per person, and that
performance degrades only moderately (to 70% F1) when just 20 tweets per
person are used. These results show that even with relatively small amounts
of data, BERT can be fine-tuned to help accurately predict the gender of
Twitter users and, consequently, that it is
possible to determine gender on the basis of just a low volume of tweets. This
opens up an operational perspective on the swift detection of gender.
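As an illustration of this setup, here is a minimal fine-tuning sketch in the spirit of the paper, using the Hugging Face transformers library and BERTje (GroNLP/bert-base-dutch-cased), a publicly available Dutch BERT. The checkpoint choice and the load_user_tweets loader are illustrative assumptions, not details taken from the paper, whose Twitter dataset is not described as public.

```python
# Minimal sketch (not the authors' code): fine-tune a Dutch BERT for binary
# gender classification on concatenated tweets. `load_user_tweets` is a
# hypothetical data loader standing in for the paper's labeled Twitter data.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "GroNLP/bert-base-dutch-cased"  # BERTje, a public Dutch BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

class TweetDataset(torch.utils.data.Dataset):
    """Concatenates the first N tweets of each user into one example,
    mirroring the paper's 'N tweets per person' conditions (e.g. 20 or 200)."""
    def __init__(self, tweet_lists, labels, tweets_per_user=20):
        texts = [" ".join(tweets[:tweets_per_user]) for tweets in tweet_lists]
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=512)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

# tweet_lists, labels = load_user_tweets()  # hypothetical loader
# train_ds = TweetDataset(tweet_lists, labels, tweets_per_user=200)
# trainer = Trainer(model=model,
#                   args=TrainingArguments(output_dir="out", num_train_epochs=3),
#                   train_dataset=train_ds)
# trainer.train()
```

Varying tweets_per_user between, say, 20 and 200 then mirrors the paper's central experiment: classification performance as a function of how much text per user is available.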
Related papers
- Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts
Recent studies show that traditional fairytales are rife with harmful gender biases.
This work aims to assess learned biases of language models by evaluating their robustness against gender perturbations.
arXiv Detail & Related papers (2023-10-16T22:25:09Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it by reporting gender representation in the WMT training and development data for the News task, confirming that current data is skewed towards masculine representation; a toy sketch of this lexicon-counting step appears at the end of this page.
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
- Exploring Gender Bias in Retrieval Models
Mitigating gender bias in information retrieval is important to avoid propagating stereotypes.
We employ a dataset consisting of two components: (1) relevance of a document to a query and (2) "gender" of a document.
We show that pre-trained models for IR do not perform well on zero-shot retrieval tasks when the large pre-trained BERT encoder is fully fine-tuned.
We also illustrate that pre-trained models have gender biases that result in retrieved articles more often being male than female.
arXiv Detail & Related papers (2022-08-02T21:12:05Z)
- Twitter-Based Gender Recognition Using Transformers
We propose a model based on transformers to predict the user's gender from their images and tweets.
We fine-tune another model, based on Bidirectional Encoder Representations from Transformers (BERT), to recognize the user's gender from their tweets.
The combination model improves the accuracy of image and text classification models by 6.98% and 4.43%, respectively.
arXiv Detail & Related papers (2022-04-24T19:58:42Z)
- Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting
Forgetting information in the original training data may damage the model's downstream performance by a large margin.
We propose GEnder Equality Prompt (GEEP) to improve gender fairness of pre-trained models with less forgetting.
arXiv Detail & Related papers (2021-10-11T15:52:16Z)
- Investigating Gender Bias in BERT
We analyse the gender bias that BERT induces in five downstream tasks related to emotion and sentiment intensity prediction.
We propose an algorithm that finds fine-grained gender directions, i.e., one primary direction for each BERT layer.
Experiments show that removing embedding components along such directions substantially reduces BERT-induced bias in the downstream tasks; a generic sketch of this projection step appears at the end of this page.
arXiv Detail & Related papers (2020-09-10T17:38:32Z)
- Mitigating Gender Bias in Captioning Systems
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z)
- Multi-Dimensional Gender Bias Classification
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
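As referenced in the Gender-GAP entry above, here is a toy sketch of the lexicon-counting idea that the pipeline applies at scale. The tiny English LEXICON and the gender_counts function are hypothetical stand-ins for the pipeline's curated multilingual lexicon of gendered person-nouns and its actual API.

```python
# Toy sketch of lexicon-based gender-representation counting, as done at
# scale by the Gender-GAP pipeline. LEXICON below is a hypothetical stand-in
# for the pipeline's curated multilingual lexicon of gendered person-nouns.
import re
from collections import Counter

LEXICON = {"man": "masculine", "men": "masculine", "he": "masculine",
           "woman": "feminine", "women": "feminine", "she": "feminine"}

def gender_counts(text: str) -> Counter:
    """Count lexicon hits per gender class in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(LEXICON[t] for t in tokens if t in LEXICON)

print(gender_counts("The man said he saw two women."))
# Counter({'masculine': 2, 'feminine': 1})
```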
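And, as referenced in the "Investigating Gender Bias in BERT" entry, a generic sketch of the component-removal step: projecting an embedding orthogonally to a gender direction. The direction g below is a toy placeholder; the paper's contribution is an algorithm for finding one such direction per BERT layer, which is not reproduced here.

```python
# Generic sketch of removing a gender direction from an embedding via
# orthogonal projection. `g` is a hypothetical placeholder for a learned
# per-layer gender direction, not the paper's method for finding it.
import numpy as np

def remove_component(v: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Return v with its component along direction g projected out."""
    g = g / np.linalg.norm(g)
    return v - (v @ g) * g

v = np.array([2.0, 1.0, 0.0])  # toy embedding
g = np.array([1.0, 0.0, 0.0])  # toy gender direction
print(remove_component(v, g))  # [0. 1. 0.] -- no component left along g
```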