Twitter-Based Gender Recognition Using Transformers
- URL: http://arxiv.org/abs/2205.06801v1
- Date: Sun, 24 Apr 2022 19:58:42 GMT
- Title: Twitter-Based Gender Recognition Using Transformers
- Authors: Zahra Movahedi Nia, Ali Ahmadi, Bruce Mellado, Jianhong Wu, James
Orbinski, Ali Agary, Jude Dzevela Kong
- Abstract summary: We propose a model based on transformers to predict the user's gender from their images and tweets.
We fine-tune another model based on Bidirectional Encoder Representations from Transformers (BERT) to recognize the user's gender by their tweets.
The combination model improves the accuracy of image and text classification models by 6.98% and 4.43%, respectively.
- Score: 2.539920413471809
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Social media contains useful information about people and the society that
could help advance research in many different areas (e.g. by applying opinion
mining, emotion/sentiment analysis, and statistical analysis) such as business
and finance, health, socio-economic inequality and gender vulnerability. User
demographics provide rich information that could help study the subject
further. However, user demographics such as gender are considered private and
are not freely available. In this study, we propose a model based on
transformers to predict the user's gender from their images and tweets. We
fine-tune a model based on Vision Transformers (ViT) to stratify female and
male images. Next, we fine-tune another model based on Bidirectional Encoder
Representations from Transformers (BERT) to recognize the user's gender by
their tweets. This is highly beneficial, because not all users provide an image
that indicates their gender. The gender of such users could be detected from
their tweets. The combination model improves the accuracy of image and text
classification models by 6.98% and 4.43%, respectively. This shows that the
image and text classification models are capable of complementing each other by
providing additional information to one another. We apply our method to the
PAN-2018 dataset, and obtain an accuracy of 85.52%.
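The abstract does not say how the image and text classifiers are combined, so the following is only a minimal sketch of one common option, late fusion by weighted averaging of class probabilities. The function name `fuse_predictions`, the `image_weight` parameter, and the label order are hypothetical, and the probabilities passed in are placeholders standing in for the outputs of fine-tuned ViT and BERT models, not results from the paper. The sketch also falls back to the text model alone when a user has no usable image, which mirrors the motivation given in the abstract.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed label order; the paper's actual encoding is not given in the abstract.
LABELS = ("female", "male")


def fuse_predictions(
    text_probs: Sequence[float],
    image_probs: Optional[Sequence[float]] = None,
    image_weight: float = 0.5,
) -> Tuple[str, List[float]]:
    """Combine per-class probabilities from a text and an image classifier.

    This is an illustrative weighted average (late fusion), not the
    authors' method. If image_probs is None, only the text model is used.
    """
    if len(text_probs) != len(LABELS):
        raise ValueError("text_probs must have one entry per label")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must be in [0, 1]")

    if image_probs is None:
        # No usable profile image: rely on the tweets alone.
        fused = [float(p) for p in text_probs]
    else:
        if len(image_probs) != len(LABELS):
            raise ValueError("image_probs must have one entry per label")
        fused = [
            image_weight * pi + (1.0 - image_weight) * pt
            for pi, pt in zip(image_probs, text_probs)
        ]

    total = sum(fused)
    if total <= 0.0:
        raise ValueError("probabilities must sum to a positive value")
    fused = [p / total for p in fused]  # renormalize to sum to 1

    best = max(range(len(LABELS)), key=lambda i: fused[i])
    return LABELS[best], fused
```

With equal weights, a confident image prediction can override a weakly opposed text prediction, and vice versa; in practice `image_weight` would be tuned on held-out data.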
Related papers
- The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender
Characterisation in 55 Languages [51.2321117760104]
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun
resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Stereotypes and Smut: The (Mis)representation of Non-cisgender
Identities by Text-to-Image Models [6.92043136971035]
We investigate how multimodal models handle diverse gender identities.
We find certain non-cisgender identities are consistently (mis)represented as less human, more stereotyped and more sexualised.
These improvements could pave the way for a future where change is led by the affected community.
arXiv Detail & Related papers (2023-05-26T16:28:49Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- Gender Artifacts in Visual Datasets [34.74191865400569]
We investigate what $\textit{gender artifacts}$ exist within large-scale visual datasets.
We find that gender artifacts are ubiquitous in the COCO and OpenImages datasets.
We claim that attempts to remove gender artifacts from such datasets are largely infeasible.
arXiv Detail & Related papers (2022-06-18T12:09:19Z)
- DALL-Eval: Probing the Reasoning Skills and Social Biases of
Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
- Gender prediction using limited Twitter Data [0.0]
This paper explores the usability of BERT (a Transformer model for word embedding) for gender prediction on social media.
A Dutch BERT model is fine-tuned on different samples of a Dutch Twitter dataset labeled for gender, varying in the number of tweets used per person.
Results show that even with relatively small amounts of data, BERT can be fine-tuned to accurately help predict the gender of Twitter users.
arXiv Detail & Related papers (2020-09-29T11:46:07Z)
- Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
- Large-scale Gender/Age Prediction of Tumblr Users [5.063421139422184]
We propose graph based and deep learning models for age and gender predictions.
For graph based models, we come up with two approaches, network embedding and label propagation, to generate connection features.
For deep learning models, we leverage a convolutional neural network (CNN) and a multilayer perceptron (MLP) to predict users' age and gender.
arXiv Detail & Related papers (2020-01-02T19:01:45Z)