Investigating Large Language Models in Inferring Personality Traits from User Conversations
- URL: http://arxiv.org/abs/2501.07532v1
- Date: Mon, 13 Jan 2025 18:09:58 GMT
- Title: Investigating Large Language Models in Inferring Personality Traits from User Conversations
- Authors: Jianfeng Zhu, Ruoming Jin, Karin G. Coifman,
- Abstract summary: Large Language Models (LLMs) are demonstrating remarkable human like capabilities across diverse domains.
This study evaluates whether GPT-4o and GPT-4o mini, can infer Big Five personality traits and generate BFI-10 item scores from user conversations.
- Score: 5.705775078773656
- License:
- Abstract: Large Language Models (LLMs) are demonstrating remarkable human like capabilities across diverse domains, including psychological assessment. This study evaluates whether LLMs, specifically GPT-4o and GPT-4o mini, can infer Big Five personality traits and generate Big Five Inventory-10 (BFI-10) item scores from user conversations under zero-shot prompting conditions. Our findings reveal that incorporating an intermediate step--prompting for BFI-10 item scores before calculating traits--enhances accuracy and aligns more closely with the gold standard than direct trait inference. This structured approach underscores the importance of leveraging psychological frameworks in improving predictive precision. Additionally, a group comparison based on depressive symptom presence revealed differential model performance. Participants were categorized into two groups: those experiencing at least one depressive symptom and those without symptoms. GPT-4o mini demonstrated heightened sensitivity to depression-related shifts in traits such as Neuroticism and Conscientiousness within the symptom-present group, whereas GPT-4o exhibited strengths in nuanced interpretation across groups. These findings underscore the potential of LLMs to analyze real-world psychological data effectively, offering a valuable foundation for interdisciplinary research at the intersection of artificial intelligence and psychology.
Related papers
- LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.
We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.
Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z) - GPT-4 on Clinic Depression Assessment: An LLM-Based Pilot Study [0.6999740786886538]
We explore the use of GPT-4 for clinical depression assessment based on transcript analysis.
We examine the model's ability to classify patient interviews into binary categories: depressed and not depressed.
Results indicate that GPT-4 exhibits considerable variability in accuracy and F1-Score across configurations.
arXiv Detail & Related papers (2024-12-31T00:32:43Z) - Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models [57.518784855080334]
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants.
This paper presents a framework for investigating psychology dimension in LLMs, including psychological identification, assessment dataset curation, and assessment with results validation.
We introduce a comprehensive psychometrics benchmark for LLMs that covers six psychological dimensions: personality, values, emotion, theory of mind, motivation, and intelligence.
arXiv Detail & Related papers (2024-06-25T16:09:08Z) - Large Language Models Can Infer Personality from Free-Form User Interactions [0.0]
GPT-4 can infer personality with moderate accuracy, outperforming previous approaches.
Results show that the direct focus on personality assessment did not result in a less positive user experience.
Preliminary analyses suggest that the accuracy of personality inferences varies only marginally across different socio-demographic subgroups.
arXiv Detail & Related papers (2024-05-19T20:33:36Z) - Comparing the Efficacy of GPT-4 and Chat-GPT in Mental Health Care: A Blind Assessment of Large Language Models for Psychological Support [0.0]
Two large language models, GPT-4 and Chat-GPT, were tested in responding to a set of 18 psychological prompts.
GPT-4 achieved an average rating of 8.29 out of 10, while Chat-GPT received an average rating of 6.52.
arXiv Detail & Related papers (2024-05-15T12:44:54Z) - Illuminate: A novel approach for depression detection with explainable
analysis and proactive therapy using prompt engineering [0.0]
This paper introduces a novel paradigm for depression detection and treatment using advanced Large Language Models (LLMs): Generative Pre-trained Transformer 4 (GPT-4), Llama 2 chat, and Gemini.
LLMs are fine-tuned with specialized prompts to diagnose, explain, and suggest therapeutic interventions for depression.
arXiv Detail & Related papers (2024-02-05T06:08:06Z) - PsyCoT: Psychological Questionnaire as Powerful Chain-of-Thought for
Personality Detection [50.66968526809069]
We propose a novel personality detection method, called PsyCoT, which mimics the way individuals complete psychological questionnaires in a multi-turn dialogue manner.
Our experiments demonstrate that PsyCoT significantly improves the performance and robustness of GPT-3.5 in personality detection.
arXiv Detail & Related papers (2023-10-31T08:23:33Z) - Large Language Models Can Infer Psychological Dispositions of Social Media Users [1.0923877073891446]
We test whether GPT-3.5 and GPT-4 can derive the Big Five personality traits from users' Facebook status updates in a zero-shot learning scenario.
Our results show an average correlation of r =.29 (range = [.22,.33]) between LLM-inferred and self-reported trait scores.
predictions were found to be more accurate for women and younger individuals on several traits, suggesting a potential bias stemming from the underlying training data or differences in online self-expression.
arXiv Detail & Related papers (2023-09-13T01:27:48Z) - Revisiting the Reliability of Psychological Scales on Large Language Models [62.57981196992073]
This study aims to determine the reliability of applying personality assessments to Large Language Models.
Analysis of 2,500 settings per model, including GPT-3.5, GPT-4, Gemini-Pro, and LLaMA-3.1, reveals that various LLMs show consistency in responses to the Big Five Inventory.
arXiv Detail & Related papers (2023-05-31T15:03:28Z) - Handwriting and Drawing for Depression Detection: A Preliminary Study [53.11777541341063]
Short-term covid effects on mental health were a significant increase in anxiety and depressive symptoms.
The aim of this study is to use a new tool, the online handwriting and drawing analysis, to discriminate between healthy individuals and depressed patients.
arXiv Detail & Related papers (2023-02-05T22:33:49Z) - Evaluating Psychological Safety of Large Language Models [72.88260608425949]
We designed unbiased prompts to evaluate the psychological safety of large language models (LLMs)
We tested five different LLMs by using two personality tests: Short Dark Triad (SD-3) and Big Five Inventory (BFI)
Despite being instruction fine-tuned with safety metrics to reduce toxicity, InstructGPT, GPT-3.5, and GPT-4 still showed dark personality patterns.
Fine-tuning Llama-2-chat-7B with responses from BFI using direct preference optimization could effectively reduce the psychological toxicity of the model.
arXiv Detail & Related papers (2022-12-20T18:45:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.