The Efficacy of Conversational Artificial Intelligence in Rectifying the Theory of Mind and Autonomy Biases: Comparative Analysis
- URL: http://arxiv.org/abs/2406.13813v5
- Date: Tue, 23 Jul 2024 12:33:32 GMT
- Title: The Efficacy of Conversational Artificial Intelligence in Rectifying the Theory of Mind and Autonomy Biases: Comparative Analysis
- Authors: Marcin Rządeczka, Anna Sterna, Julia Stolińska, Paulina Kaczyńska, Marcin Moskalewicz,
- Abstract summary: The increasing deployment of Conversational Artificial Intelligence (CAI) in mental health interventions necessitates an evaluation of their efficacy in rectifying cognitive biases and recognizing affect in human-AI interactions.
This study aimed to assess the effectiveness of therapeutic chatbots versus general-purpose language models (GPT-3.5, GPT-4, Gemini Pro) in identifying and rectifying cognitive biases and recognizing affect in user interactions.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Background: The increasing deployment of Conversational Artificial Intelligence (CAI) in mental health interventions necessitates an evaluation of their efficacy in rectifying cognitive biases and recognizing affect in human-AI interactions. These biases, including theory of mind and autonomy biases, can exacerbate mental health conditions such as depression and anxiety. Objective: This study aimed to assess the effectiveness of therapeutic chatbots (Wysa, Youper) versus general-purpose language models (GPT-3.5, GPT-4, Gemini Pro) in identifying and rectifying cognitive biases and recognizing affect in user interactions. Methods: The study employed virtual case scenarios simulating typical user-bot interactions. Cognitive biases assessed included theory of mind biases (anthropomorphism, overtrust, attribution) and autonomy biases (illusion of control, fundamental attribution error, just-world hypothesis). Responses were evaluated on accuracy, therapeutic quality, and adherence to Cognitive Behavioral Therapy (CBT) principles, using an ordinal scale. The evaluation involved double review by cognitive scientists and a clinical psychologist. Results: The study revealed that general-purpose chatbots outperformed therapeutic chatbots in rectifying cognitive biases, particularly in overtrust bias, fundamental attribution error, and just-world hypothesis. GPT-4 achieved the highest scores across all biases, while therapeutic bots like Wysa scored the lowest. Affect recognition showed similar trends, with general-purpose bots outperforming therapeutic bots in four out of six biases. However, the results highlight the need for further refinement of therapeutic chatbots to enhance their efficacy and ensure safe, effective use in digital mental health interventions. Future research should focus on improving affective response and addressing ethical considerations in AI-based therapy.
Related papers
- CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy [67.23830698947637]
We propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance.
We include three levels of tasks in CBT-BENCH: I: Basic CBT knowledge acquisition, with the task of multiple-choice questions; II: Cognitive model understanding, with the tasks of cognitive distortion classification, primary core belief classification, and fine-grained core belief classification; III: Therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions.
Experimental results indicate that while LLMs perform well in reciting CBT knowledge, they fall short in complex real-world scenarios
arXiv Detail & Related papers (2024-10-17T04:52:57Z) - ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models [53.00812898384698]
We argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking.
We highlight how cognitive biases can conflate fluent information and truthfulness, and how cognitive uncertainty affects the reliability of rating scores such as Likert.
We propose the ConSiDERS-The-Human evaluation framework consisting of 6 pillars -- Consistency, Scoring Criteria, Differentiating, User Experience, Responsible, and Scalability.
arXiv Detail & Related papers (2024-05-28T22:45:28Z) - AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts [27.240795549935463]
We gathered data from social media and established the task of extracting cognitive pathways.
We structured a text summarization task to help psychotherapists quickly grasp the essential information.
Our experiments evaluate the performance of deep learning and large language models.
arXiv Detail & Related papers (2024-04-17T14:55:27Z) - HealMe: Harnessing Cognitive Reframing in Large Language Models for Psychotherapy [25.908522131646258]
We unveil the Helping and Empowering through Adaptive Language in Mental Enhancement (HealMe) model.
This novel cognitive reframing therapy method effectively addresses deep-rooted negative thoughts and fosters rational, balanced perspectives.
We adopt the first comprehensive and expertly crafted psychological evaluation metrics, specifically designed to rigorously assess the performance of cognitive reframing.
arXiv Detail & Related papers (2024-02-26T09:10:34Z) - PsychoGAT: A Novel Psychological Measurement Paradigm through Interactive Fiction Games with LLM Agents [68.50571379012621]
Psychological measurement is essential for mental health, self-understanding, and personal development.
PsychoGAT (Psychological Game AgenTs) achieves statistically significant excellence in psychometric metrics such as reliability, convergent validity, and discriminant validity.
arXiv Detail & Related papers (2024-02-19T18:00:30Z) - Can ChatGPT Read Who You Are? [10.577227353680994]
We report the results of a comprehensive user study featuring texts written in Czech by a representative population sample of 155 participants.
We compare the personality trait estimations made by ChatGPT against those by human raters and report ChatGPT's competitive performance in inferring personality traits from text.
arXiv Detail & Related papers (2023-12-26T14:43:04Z) - Evaluating Subjective Cognitive Appraisals of Emotions from Large
Language Models [47.890846082224066]
This work fills the gap by presenting CovidET-Appraisals, the most comprehensive dataset to-date that assesses 24 appraisal dimensions.
CovidET-Appraisals presents an ideal testbed to evaluate the ability of large language models to automatically assess and explain cognitive appraisals.
arXiv Detail & Related papers (2023-10-22T19:12:17Z) - The world seems different in a social context: a neural network analysis
of human experimental data [57.729312306803955]
We show that it is possible to replicate human behavioral data in both individual and social task settings by modifying the precision of prior and sensory signals.
An analysis of the neural activation traces of the trained networks provides evidence that information is coded in fundamentally different ways in the network in the individual and in the social conditions.
arXiv Detail & Related papers (2022-03-03T17:19:12Z) - AGENT: A Benchmark for Core Psychological Reasoning [60.35621718321559]
Intuitive psychology is the ability to reason about hidden mental variables that drive observable actions.
Despite recent interest in machine agents that reason about other agents, it is not clear if such agents learn or hold the core psychology principles that drive human reasoning.
We present a benchmark consisting of procedurally generated 3D animations, AGENT, structured around four scenarios.
arXiv Detail & Related papers (2021-02-24T14:58:23Z) - Prediction of Human Empathy based on EEG Cortical Asymmetry [0.0]
lateralization of brain oscillations at specific frequency bands is an important predictor of self-reported empathy scores.
Results could be employed in the development of brain-computer interfaces that assist people with difficulties in expressing or recognizing emotions.
arXiv Detail & Related papers (2020-05-06T13:49:56Z) - SensAI+Expanse Emotional Valence Prediction Studies with Cognition and
Memory Integration [0.0]
This work contributes with an artificial intelligent agent able to assist on cognitive science studies.
The developed artificial agent system (SensAI+Expanse) includes machine learning algorithms, empathetic algorithms, and memory.
Results of the present study show evidence of significant emotional behaviour differences between some age ranges and gender combinations.
arXiv Detail & Related papers (2020-01-03T18:17:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.