Related papers: Large-scale Gender/Age Prediction of Tumblr Users

URL: http://arxiv.org/abs/2001.00594v1
Date: Thu, 2 Jan 2020 19:01:45 GMT
Title: Large-scale Gender/Age Prediction of Tumblr Users
Authors: Yao Zhan, Changwei Hu, Yifan Hu, Tejaswi Kasturi, Shanmugam Ramasamy, Matt Gillingham, Keith Yamamoto
Abstract summary: We propose graph-based and deep learning models for age and gender prediction. For graph-based models, we come up with two approaches, network embedding and label propagation, to generate connection features. For deep learning models, we leverage a convolutional neural network (CNN) and a multilayer perceptron (MLP) to predict users' age and gender.
Score: 5.063421139422184
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Tumblr, a leading content provider and social media platform, attracts 371 million monthly visits, 280 million blogs and 53.3 million daily posts. The popularity of Tumblr provides great opportunities for advertisers to promote their products through sponsored posts. However, it is challenging to target specific demographic groups for ads, since Tumblr does not require users to provide information such as gender and age during registration. Hence, to improve ad targeting, it is essential to predict users' demographics using rich content such as posts, images and social connections. In this paper, we propose graph-based and deep learning models for age and gender prediction, which take into account user activities and content features. For graph-based models, we come up with two approaches, network embedding and label propagation, to generate connection features as well as to directly infer users' demographics. For deep learning models, we leverage a convolutional neural network (CNN) and a multilayer perceptron (MLP) to predict users' age and gender. Experimental results on a real Tumblr daily dataset, with hundreds of millions of active users and billions of following relations, demonstrate that our approaches significantly outperform the baseline model, improving accuracy by a relative 81% for age, and AUC and accuracy by 5% for gender.

Related papers

Adultification Bias in LLMs and Text-to-Image Models [55.02903075972816]
We study bias along axes of race and gender in young girls. We focus on "adultification bias," a phenomenon in which Black girls are presumed to be more defiant, sexually intimate, and culpable than their White peers.
arXiv Detail & Related papers (2025-06-08T21:02:33Z)
On the Inference of Sociodemographics on Reddit [5.524795406792588]
We use a novel data set of more than 850k self-declarations on age, gender, and partisan affiliation from Reddit comments. We do so on two tasks: (i) predicting binary labels (classification); and (ii) predicting the prevalence of a demographic class among a set of users.
arXiv Detail & Related papers (2025-02-07T16:11:39Z)
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) [82.57490175399693]
We study gender bias in 22 popular image-to-text vision-language assistants (VLAs). Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances. To eliminate the gender bias in these models, we find that fine-tuning-based debiasing methods achieve the best trade-off between debiasing and retaining performance.
arXiv Detail & Related papers (2024-10-25T05:59:44Z)
GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models. Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs. Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
DADIT: A Dataset for Demographic Classification of Italian Twitter Users and a Comparison of Prediction Methods [20.590525489367955]
We construct, validate, and release publicly the representative DADIT dataset of 30M tweets of 20k Italian Twitter users. DADIT enables us to train and compare the performance of various state-of-the-art models for the prediction of the gender and age of social media users.
arXiv Detail & Related papers (2024-03-08T22:18:13Z)
Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models. By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes. We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
Twitter-Based Gender Recognition Using Transformers [2.539920413471809]
We propose a model based on transformers to predict the user's gender from their images and tweets. We fine-tune another model based on Bidirectional Encoder Representations from Transformers (BERT) to recognize the user's gender by their tweets. The combination model improves the accuracy of image and text classification models by 6.98% and 4.43%, respectively.
arXiv Detail & Related papers (2022-04-24T19:58:42Z)
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models. First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding. Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
What's in a Name? -- Gender Classification of Names with Character Based Machine Learning Models [6.805167389805055]
We consider the problem of predicting the gender of registered users based on their declared name. By analyzing the first names of 100M+ users, we found that genders can be very effectively classified using the composition of the name strings.
arXiv Detail & Related papers (2021-02-07T01:01:32Z)
Gender prediction using limited Twitter Data [0.0]
This paper explores the usability of BERT (a Transformer model for word embedding) for gender prediction on social media. A Dutch BERT model is fine-tuned on different samples of a Dutch Twitter dataset labeled for gender, varying in the number of tweets used per person. Results show that even with relatively small amounts of data, BERT can be fine-tuned to accurately help predict the gender of Twitter users.
arXiv Detail & Related papers (2020-09-29T11:46:07Z)
Disentangled Graph Collaborative Filtering [100.26835145396782]
Disentangled Graph Collaborative Filtering (DGCF) is a new model for learning informative representations of users and items from interaction data. By modeling a distribution over intents for each user-item interaction, we iteratively refine the intent-aware interaction graphs and representations. DGCF achieves significant improvements over several state-of-the-art models like NGCF, DisenGCN, and MacridVAE.
arXiv Detail & Related papers (2020-07-03T15:37:25Z)
Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women. We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z)
Investigating Bias in Deep Face Analysis: The KANFace Dataset and Empirical Study [67.3961439193994]
We introduce the most comprehensive, large-scale dataset of facial images and videos to date. The data are manually annotated in terms of identity, exact age, gender and kinship. A method to debias network embeddings is introduced and tested on the proposed benchmarks.
arXiv Detail & Related papers (2020-05-15T00:14:39Z)
PANDORA Talks: Personality and Demographics on Reddit [2.4149105714758545]
We present PANDORA, the first large-scale dataset of Reddit comments labeled with three personality models and demographics for more than 10k users. We showcase the usefulness of this dataset on three experiments, where we leverage the more readily available data to predict the Big 5 traits. We present benchmark prediction models for all personality and demographic variables.
arXiv Detail & Related papers (2020-04-09T10:08:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.