Related papers: PANDORA Talks: Personality and Demographics on Reddit

URL: http://arxiv.org/abs/2004.04460v3
Date: Tue, 8 Jun 2021 13:22:41 GMT
Title: PANDORA Talks: Personality and Demographics on Reddit
Authors: Matej Gjurković, Mladen Karan, Iva Vukojević, Mihaela Bošnjak, Jan Šnajder
Abstract summary: We present PANDORA, the first large-scale dataset of Reddit comments labeled with three personality models and demographics for more than 10k users. We showcase the usefulness of this dataset on three experiments, where we leverage the more readily available data to predict the Big 5 traits. We present benchmark prediction models for all personality and demographic variables.
Score: 2.4149105714758545
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Personality and demographics are important variables in social sciences, while in NLP they can aid in interpretability and removal of societal biases. However, datasets with both personality and demographic labels are scarce. To address this, we present PANDORA, the first large-scale dataset of Reddit comments labeled with three personality models (including the well-established Big 5 model) and demographics (age, gender, and location) for more than 10k users. We showcase the usefulness of this dataset on three experiments, where we leverage the more readily available data from other personality models to predict the Big 5 traits, analyze gender classification biases arising from psycho-demographic variables, and carry out a confirmatory and exploratory analysis based on psychological theories. Finally, we present benchmark prediction models for all personality and demographic variables.

Related papers

Investigating Gender Bias in LLM-Generated Stories via Psychological Stereotypes [8.091664636677637]
We investigate gender bias in Large Language Models (LLMs) using gender stereotypes studied in psychology. We introduce a novel dataset called StereoBias-Stories containing short stories either unconditioned or conditioned on (one, two, or six) random attributes from 25 psychological stereotypes. We analyze how the gender contribution in the overall story changes in response to these attributes and present three key findings.
arXiv Detail & Related papers (2025-08-05T10:10:26Z)
Surface Fairness, Deep Bias: A Comparative Study of Bias in Language Models [49.41113560646115]
We investigate various proxy measures of bias in large language models (LLMs). We find that evaluating models with pre-prompted personae on a multi-subject benchmark (MMLU) leads to negligible and mostly random differences in scores. With the recent trend for LLM assistant memory and personalization, these problems open up from a different angle.
arXiv Detail & Related papers (2025-06-12T08:47:40Z)
Big5PersonalityEssays: Introducing a Novel Synthetic Generated Dataset Consisting of Short State-of-Consciousness Essays Annotated Based on the Five Factor Model of Personality [0.0]
Psychology has been, in recent years, poorly approached using novel computational tools. This study introduces a synthetic database of short essays labeled based on the five factor model (FFM) of personality traits.
arXiv Detail & Related papers (2024-05-22T10:10:20Z)
Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks [15.015148115215315]
We conduct experiments on four popular large language models (LLMs) to investigate their capability to understand group differences and potential biases in their predictions for politeness and offensiveness. We find that for both tasks, model predictions are closer to the labels from White and female participants. More specifically, when being prompted to respond from the perspective of "Black" and "Asian" individuals, models show lower performance in predicting both overall scores as well as the scores from corresponding groups.
arXiv Detail & Related papers (2023-11-16T10:02:24Z)
Aligning Large Language Models with Human Opinions through Persona Selection and Value--Belief--Norm Reasoning [67.33899440998175]
Chain-of-Opinion (COO) is a simple four-step solution modeling which personae to reason with and how. COO distinguishes between explicit personae (demographics and ideology) and implicit personae (historical opinions). COO efficiently achieves new state-of-the-art opinion prediction via prompting with only 5 inference calls, improving prior techniques by up to 4%.
arXiv Detail & Related papers (2023-11-14T18:48:27Z)
On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented. Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
Editing Personality for Large Language Models [73.59001811199823]
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs). We construct PersonalityEdit, a new benchmark dataset to address this task.
arXiv Detail & Related papers (2023-10-03T16:02:36Z)
Personality Profiling: How informative are social media profiles in predicting personal information? [0.04096453902709291]
We explore the extent to which people's online digital footprints can be used to profile their Myers-Briggs personality type. We compare four models: logistic regression, naive Bayes, support vector machines (SVMs) and random forests. An SVM model achieves the best accuracy of 20.95% for predicting a complete personality type.
arXiv Detail & Related papers (2023-09-15T03:09:43Z)
Large Language Models Can Infer Psychological Dispositions of Social Media Users [1.0923877073891446]
We test whether GPT-3.5 and GPT-4 can derive the Big Five personality traits from users' Facebook status updates in a zero-shot learning scenario. Our results show an average correlation of r = .29 (range = [.22, .33]) between LLM-inferred and self-reported trait scores. Predictions were found to be more accurate for women and younger individuals on several traits, suggesting a potential bias stemming from the underlying training data or differences in online self-expression.
arXiv Detail & Related papers (2023-09-13T01:27:48Z)
Gender Stereotyping Impact in Facial Expression Recognition [1.5340540198612824]
In recent years, machine learning-based models have become the most popular approach to Facial Expression Recognition (FER). In publicly available FER datasets, apparent gender representation is usually mostly balanced, but the representation within individual labels is not. We generate derivative datasets with different amounts of stereotypical bias by altering the gender proportions of certain labels. We observe a discrepancy in the recognition of certain emotions between genders of up to 29% under the worst bias conditions.
arXiv Detail & Related papers (2022-10-11T10:52:23Z)
Two-Faced Humans on Twitter and Facebook: Harvesting Social Multimedia for Human Personality Profiling [74.83957286553924]
We infer the Myers-Briggs Personality Type indicators by applying a novel multi-view fusion framework, called "PERS". Our experimental results demonstrate the ability of PERS to learn from multi-view data for personality profiling by efficiently leveraging the significantly different data arriving from diverse social multimedia sources.
arXiv Detail & Related papers (2021-06-20T10:48:49Z)
My tweets bring all the traits to the yard: Predicting personality and relational traits in Online Social Networks [4.095574580512599]
This study aims to provide a prediction model for holistic personality profiling in Online Social Networks (OSNs). We first designed a feature engineering methodology that extracts a wide range of features from OSN accounts of users. Then, we designed a machine learning model that predicts scores for the psychological traits of the users based on the extracted features.
arXiv Detail & Related papers (2020-09-22T20:30:56Z)
Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset for Personality Assessment [50.15466026089435]
We present a novel peer-to-peer Hindi conversation dataset, Vyaktitv. It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation. The dataset also contains a rich set of socio-demographic features, such as income and cultural orientation, amongst several others, for all the participants.
arXiv Detail & Related papers (2020-08-31T17:44:28Z)
REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset. It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.