PANDORA Talks: Personality and Demographics on Reddit
- URL: http://arxiv.org/abs/2004.04460v3
- Date: Tue, 8 Jun 2021 13:22:41 GMT
- Title: PANDORA Talks: Personality and Demographics on Reddit
- Authors: Matej Gjurkovi\'c, Mladen Karan, Iva Vukojevi\'c, Mihaela Bo\v{s}njak,
Jan \v{S}najder
- Abstract summary: We present PANDORA, the first large-scale dataset of Reddit comments labeled with three personality models and demographics for more than 10k users.
We showcase the usefulness of this dataset on three experiments, where we leverage the more readily available data to predict the Big 5 traits.
We present benchmark prediction models for all personality and demographic variables.
- Score: 2.4149105714758545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personality and demographics are important variables in social sciences,
while in NLP they can aid in interpretability and removal of societal biases.
However, datasets with both personality and demographic labels are scarce. To
address this, we present PANDORA, the first large-scale dataset of Reddit
comments labeled with three personality models (including the well-established
Big 5 model) and demographics (age, gender, and location) for more than 10k
users. We showcase the usefulness of this dataset on three experiments, where
we leverage the more readily available data from other personality models to
predict the Big 5 traits, analyze gender classification biases arising from
psycho-demographic variables, and carry out a confirmatory and exploratory
analysis based on psychological theories. Finally, we present benchmark
prediction models for all personality and demographic variables.
Related papers
- Aligning with Whom? Large Language Models Have Gender and Racial Biases
in Subjective NLP Tasks [15.015148115215315]
We conduct experiments on four popular large language models (LLMs) to investigate their capability to understand group differences and potential biases in their predictions for politeness and offensiveness.
We find that for both tasks, model predictions are closer to the labels from White and female participants.
More specifically, when being prompted to respond from the perspective of "Black" and "Asian" individuals, models show lower performance in predicting both overall scores as well as the scores from corresponding groups.
arXiv Detail & Related papers (2023-11-16T10:02:24Z) - On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z) - Editing Personality for Large Language Models [73.59001811199823]
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs)
We construct a new benchmark dataset PersonalityEdit to address this task.
arXiv Detail & Related papers (2023-10-03T16:02:36Z) - Personality Profiling: How informative are social media profiles in
predicting personal information? [0.046040036610482664]
Personality profiling has been utilised by companies for targeted advertising, political campaigns and vaccine campaigns.
We aim to explore the extent to which peoples' online digital footprints can be used to profile their Myers-Briggs personality type.
arXiv Detail & Related papers (2023-09-15T03:09:43Z) - Large Language Models Can Infer Psychological Dispositions of Social Media Users [1.0923877073891446]
We test whether GPT-3.5 and GPT-4 can derive the Big Five personality traits from users' Facebook status updates in a zero-shot learning scenario.
Our results show an average correlation of r =.29 (range = [.22,.33]) between LLM-inferred and self-reported trait scores.
predictions were found to be more accurate for women and younger individuals on several traits, suggesting a potential bias stemming from the underlying training data or differences in online self-expression.
arXiv Detail & Related papers (2023-09-13T01:27:48Z) - Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z) - Gender Stereotyping Impact in Facial Expression Recognition [1.5340540198612824]
In recent years, machine learning-based models have become the most popular approach to Facial Expression Recognition (FER)
In publicly available FER datasets, apparent gender representation is usually mostly balanced, but their representation in the individual label is not.
We generate derivative datasets with different amounts of stereotypical bias by altering the gender proportions of certain labels.
We observe a discrepancy in the recognition of certain emotions between genders of up to $29 %$ under the worst bias conditions.
arXiv Detail & Related papers (2022-10-11T10:52:23Z) - Two-Faced Humans on Twitter and Facebook: Harvesting Social Multimedia
for Human Personality Profiling [74.83957286553924]
We infer the Myers-Briggs Personality Type indicators by applying a novel multi-view fusion framework, called "PERS"
Our experimental results demonstrate the PERS's ability to learn from multi-view data for personality profiling by efficiently leveraging on the significantly different data arriving from diverse social multimedia sources.
arXiv Detail & Related papers (2021-06-20T10:48:49Z) - My tweets bring all the traits to the yard: Predicting personality and
relational traits in Online Social Networks [4.095574580512599]
This study aims to provide a prediction model for a holistic personality profiling in Online Social Networks (OSNs)
We first designed a feature engineering methodology that extracts a wide range of features from OSN accounts of users.
Then, we designed a machine learning model that predicts scores for the psychological traits of the users based on the extracted features.
arXiv Detail & Related papers (2020-09-22T20:30:56Z) - Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset
for Personality Assessment [50.15466026089435]
We present a novel peer-to-peer Hindi conversation dataset- Vyaktitv.
It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation.
The dataset also contains a rich set of socio-demographic features, like income, cultural orientation, amongst several others, for all the participants.
arXiv Detail & Related papers (2020-08-31T17:44:28Z) - REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfacing potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.