Large-scale Gender/Age Prediction of Tumblr Users
- URL: http://arxiv.org/abs/2001.00594v1
- Date: Thu, 2 Jan 2020 19:01:45 GMT
- Title: Large-scale Gender/Age Prediction of Tumblr Users
- Authors: Yao Zhan, Changwei Hu, Yifan Hu, Tejaswi Kasturi, Shanmugam Ramasamy,
Matt Gillingham, Keith Yamamoto
- Abstract summary: We propose graph-based and deep learning models for age and gender prediction.
For graph-based models, we propose two approaches, network embedding and label propagation, to generate connection features.
For deep learning models, we leverage a convolutional neural network (CNN) and a multilayer perceptron (MLP) to predict users' age and gender.
- Score: 5.063421139422184
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tumblr, as a leading content provider and social media platform, attracts 371 million
monthly visits and hosts 280 million blogs and 53.3 million daily posts. The popularity
of Tumblr provides great opportunities for advertisers to promote their
products through sponsored posts. However, it is challenging to target
specific demographic groups for ads, since Tumblr does not require user
information such as gender and age at registration. Hence, to facilitate
ad targeting, it is essential to predict users' demographics using rich content
such as posts, images and social connections. In this paper, we propose graph-based
and deep learning models for age and gender prediction, which take into
account user activities and content features. For graph-based models, we propose
two approaches, network embedding and label propagation, to generate
connection features as well as to directly infer users' demographics. For deep
learning models, we leverage a convolutional neural network (CNN) and a multilayer
perceptron (MLP) to predict users' age and gender. Experimental results on
a real Tumblr daily dataset, with hundreds of millions of active users and
billions of following relations, demonstrate that our approaches significantly
outperform the baseline model, improving accuracy by a relative 81% for
age, and AUC and accuracy by 5% for gender.
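The abstract names label propagation as one way to infer demographics directly from following relations. A minimal sketch of the general idea follows; it is a generic textbook-style propagation on a toy undirected graph, not the authors' implementation, and the function name `propagate_labels`, the class labels `"F"`/`"M"`, and the iteration count are all assumptions made for illustration.

```python
from collections import defaultdict


def propagate_labels(edges, seed_labels, num_iters=10):
    """Spread known labels to unlabeled nodes over an undirected graph.

    edges: iterable of (u, v) pairs (hypothetical follow relations).
    seed_labels: dict mapping a node to its known class label.
    Returns a dict mapping every node to a predicted label, or None
    when no labeled node is reachable from it.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    classes = sorted(set(seed_labels.values()))
    nodes = set(neighbors) | set(seed_labels)

    # Each node holds a score per class; seeds start with a one-hot score.
    scores = {n: {c: 0.0 for c in classes} for n in nodes}
    for node, label in seed_labels.items():
        scores[node][label] = 1.0

    for _ in range(num_iters):
        updated = {}
        for node in nodes:
            nbrs = neighbors.get(node, ())
            if node in seed_labels or not nbrs:
                # Seed nodes stay clamped; isolated nodes keep their scores.
                updated[node] = scores[node]
                continue
            # An unlabeled node takes the average of its neighbors' scores.
            updated[node] = {
                c: sum(scores[m][c] for m in nbrs) / len(nbrs) for c in classes
            }
        scores = updated

    predictions = {}
    for node, class_scores in scores.items():
        if any(class_scores.values()):
            predictions[node] = max(classes, key=lambda c: class_scores[c])
        else:
            predictions[node] = None
    return predictions


# Toy example: two labeled users, each with a short chain of followers,
# plus one pair of users with no labeled user in reach.
toy_edges = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f"), ("x", "y")]
toy_seeds = {"a": "F", "d": "M"}
toy_predictions = propagate_labels(toy_edges, toy_seeds)
```

In this sketch, seed nodes are clamped to their known label on every iteration, so a label can only flow outward from users whose demographics are already known. At the scale described in the abstract, a distributed graph-processing framework would be needed rather than in-memory dictionaries.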
Related papers
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods [20.590525489367955]
We construct, validate, and release publicly the representative DADIT dataset of 30M tweets of 20k Italian Twitter users.
DADIT enables us to train and compare the performance of various state-of-the-art models for the prediction of the gender and age of social media users.
arXiv Detail & Related papers (2024-03-08T22:18:13Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- Twitter-Based Gender Recognition Using Transformers [2.539920413471809]
We propose a model based on transformers to predict the user's gender from their images and tweets.
We fine-tune another model based on Bidirectional Encoder Representations from Transformers (BERT) to recognize the user's gender by their tweets.
The combination model improves the accuracy of image and text classification models by 6.98% and 4.43%, respectively.
arXiv Detail & Related papers (2022-04-24T19:58:42Z)
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
- What's in a Name? -- Gender Classification of Names with Character Based Machine Learning Models [6.805167389805055]
We consider the problem of predicting the gender of registered users based on their declared name.
By analyzing the first names of 100M+ users, we found that genders can be very effectively classified using the composition of the name strings.
arXiv Detail & Related papers (2021-02-07T01:01:32Z)
- Gender prediction using limited Twitter Data [0.0]
This paper explores the usability of BERT (a Transformer model for word embedding) for gender prediction on social media.
A Dutch BERT model is fine-tuned on different samples of a Dutch Twitter dataset labeled for gender, varying in the number of tweets used per person.
Results show that even with relatively small amounts of data, BERT can be fine-tuned to accurately help predict the gender of Twitter users.
arXiv Detail & Related papers (2020-09-29T11:46:07Z)
- Disentangled Graph Collaborative Filtering [100.26835145396782]
Disentangled Graph Collaborative Filtering (DGCF) is a new model for learning informative representations of users and items from interaction data.
By modeling a distribution over intents for each user-item interaction, we iteratively refine the intent-aware interaction graphs and representations.
DGCF achieves significant improvements over several state-of-the-art models like NGCF, DisenGCN, and MacridVAE.
arXiv Detail & Related papers (2020-07-03T15:37:25Z)
- Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z)
- Investigating Bias in Deep Face Analysis: The KANFace Dataset and Empirical Study [67.3961439193994]
We introduce the most comprehensive, large-scale dataset of facial images and videos to date.
The data are manually annotated in terms of identity, exact age, gender and kinship.
A method to debias network embeddings is introduced and tested on the proposed benchmarks.
arXiv Detail & Related papers (2020-05-15T00:14:39Z)
- PANDORA Talks: Personality and Demographics on Reddit [2.4149105714758545]
We present PANDORA, the first large-scale dataset of Reddit comments labeled with three personality models and demographics for more than 10k users.
We showcase the usefulness of this dataset on three experiments, where we leverage the more readily available data to predict the Big 5 traits.
We present benchmark prediction models for all personality and demographic variables.
arXiv Detail & Related papers (2020-04-09T10:08:05Z)