A Computational Framework for Interpretable Text-Based Personality Assessment from Social Media
- URL: http://arxiv.org/abs/2510.02811v1
- Date: Fri, 03 Oct 2025 08:36:36 GMT
- Title: A Computational Framework for Interpretable Text-Based Personality Assessment from Social Media
- Authors: Matej Gjurković
- Abstract summary: This thesis presents two datasets -- MBTI9k and PANDORA -- collected from Reddit. The PANDORA dataset contains 17 million comments from over 10,000 users. In response, the SIMPA framework was developed - a computational framework for interpretable personality assessment.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Personality refers to individual differences in behavior, thinking, and feeling. With the growing availability of digital footprints, especially from social media, automated methods for personality assessment have become increasingly important. Natural language processing (NLP) enables the analysis of unstructured text data to identify personality indicators. However, two main challenges remain central to this thesis: the scarcity of large, personality-labeled datasets and the disconnect between personality psychology and NLP, which restricts model validity and interpretability. To address these challenges, this thesis presents two datasets -- MBTI9k and PANDORA -- collected from Reddit, a platform known for user anonymity and diverse discussions. The PANDORA dataset contains 17 million comments from over 10,000 users and integrates the MBTI and Big Five personality models with demographic information, overcoming limitations in data size, quality, and label coverage. Experiments on these datasets show that demographic variables influence model validity. In response, the SIMPA (Statement-to-Item Matching Personality Assessment) framework was developed - a computational framework for interpretable personality assessment that matches user-generated statements with validated questionnaire items. By using machine learning and semantic similarity, SIMPA delivers personality assessments comparable to human evaluations while maintaining high interpretability and efficiency. Although focused on personality assessment, SIMPA's versatility extends beyond this domain. Its model-agnostic design, layered cue detection, and scalability make it suitable for various research and practical applications involving complex label taxonomies and variable cue associations with target concepts.
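The statement-to-item matching idea described in the abstract can be illustrated with a minimal sketch. This is not SIMPA's implementation: a bag-of-words cosine similarity stands in for whatever semantic similarity model the thesis uses, and the questionnaire items, the `match_statement` function, and the `threshold` value are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical questionnaire items, written for illustration only;
# they are NOT taken from a validated instrument or from the thesis.
ITEMS = {
    "extraversion": "I enjoy talking to lots of people at parties",
    "conscientiousness": "I always finish my work on time",
}


def bow(text):
    """Return a lowercased bag-of-words vector as a Counter."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_statement(statement, items=ITEMS, threshold=0.3):
    """Match a user statement to its most similar questionnaire item.

    Returns (trait, item, score) for the best match, or None when no
    item reaches the (assumed) similarity threshold.
    """
    vec = bow(statement)
    scored = [(cosine(vec, bow(item)), trait, item) for trait, item in items.items()]
    score, trait, item = max(scored)
    if score < threshold:
        return None
    return trait, item, score


if __name__ == "__main__":
    print(match_statement("I love talking to people at parties"))
    print(match_statement("The weather is cloudy today"))
```

Because every match points back to a concrete questionnaire item, the output stays inspectable: a reader can see which statement triggered which item. A real system would likely replace the word-overlap similarity with sentence embeddings, since overlap alone misses paraphrases.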
Related papers
- Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts [4.618735978506653]
ROME is a novel framework that explicitly injects psychological knowledge into personality detection. We show that ROME consistently outperforms state-of-the-art baselines in experiments on two real-world datasets.
arXiv Detail & Related papers (2025-12-09T17:07:54Z)
- Exploring a Gamified Personality Assessment Method through Interaction with LLM Agents Embodying Different Personalities [45.56431615835303]
This study explores an interactive approach for personality assessment, focusing on the multiplicity of personality representation. We propose a framework of Gamified Personality Assessment through Multi-Personality Representations (Multi-PR GPA).
arXiv Detail & Related papers (2025-07-05T11:17:20Z)
- A Chinese Multi-label Affective Computing Dataset Based on Social Media Network Users [2.0209172586699173]
This study collected data from the major social media platform Weibo, screening 11,338 valid users from over 50,000 individuals with diverse MBTI personality labels.
We compiled a multi-label Chinese affective computing dataset that integrates the same user's personality traits with six emotions and micro-emotions, each annotated with intensity levels.
This dataset is designed to advance machine recognition of complex human emotions and provide data support for research in psychology, education, marketing, finance, and politics.
arXiv Detail & Related papers (2024-11-13T05:38:55Z)
- Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction [54.23208041792073]
Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review.
A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods.
We propose a self-training framework with a pseudo-label scorer, wherein a scorer assesses the match between reviews and their pseudo-labels.
arXiv Detail & Related papers (2024-06-26T05:30:21Z)
- Evaluating Large Language Models with Psychometrics [59.821829073478376]
This paper offers a comprehensive benchmark for quantifying psychological constructs of Large Language Models (LLMs). Our work identifies five key psychological constructs -- personality, values, emotional intelligence, theory of mind, and self-efficacy -- assessed through a suite of 13 datasets. We uncover significant discrepancies between LLMs' self-reported traits and their response patterns in real-world scenarios, revealing complexities in their behaviors.
arXiv Detail & Related papers (2024-06-25T16:09:08Z)
- Can ChatGPT Read Who You Are? [10.577227353680994]
We report the results of a comprehensive user study featuring texts written in Czech by a representative population sample of 155 participants.
We compare the personality trait estimations made by ChatGPT against those by human raters and report ChatGPT's competitive performance in inferring personality traits from text.
arXiv Detail & Related papers (2023-12-26T14:43:04Z)
- PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z)
- Editing Personality for Large Language Models [73.59001811199823]
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs).
We construct PersonalityEdit, a new benchmark dataset to address this task.
arXiv Detail & Related papers (2023-10-03T16:02:36Z)
- Two-Faced Humans on Twitter and Facebook: Harvesting Social Multimedia for Human Personality Profiling [74.83957286553924]
We infer the Myers-Briggs Personality Type indicators by applying a novel multi-view fusion framework, called "PERS".
Our experimental results demonstrate PERS's ability to learn from multi-view data for personality profiling by efficiently leveraging the significantly different data arriving from diverse social multimedia sources.
arXiv Detail & Related papers (2021-06-20T10:48:49Z)
- Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset for Personality Assessment [50.15466026089435]
We present a novel peer-to-peer Hindi conversation dataset, Vyaktitv.
It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation.
The dataset also contains a rich set of socio-demographic features, such as income and cultural orientation, among several others, for all the participants.
arXiv Detail & Related papers (2020-08-31T17:44:28Z)
- Representation Learning on Variable Length and Incomplete Wearable-Sensory Time Series [29.061466414756925]
HeartSpace encodes time series data with variable-length and missing values via the integration of a time series encoding module and a pattern aggregation network.
HeartSpace implements a Siamese-triplet network to optimize representations by jointly capturing intra- and inter-series correlations.
The empirical evaluation on two different real-world datasets shows significant performance gains over state-of-the-art baselines in a variety of applications.
arXiv Detail & Related papers (2020-02-10T08:20:44Z)