Exploring ChatGPT's Capabilities, Stability, Potential and Risks in Conducting Psychological Counseling through Simulations in School Counseling
- URL: http://arxiv.org/abs/2511.01788v1
- Date: Mon, 03 Nov 2025 17:39:57 GMT
- Title: Exploring ChatGPT's Capabilities, Stability, Potential and Risks in Conducting Psychological Counseling through Simulations in School Counseling
- Authors: Yang Ni, Yanzhuo Cao
- Abstract summary: This study examined ChatGPT's capabilities, including response stability, in conducting psychological counseling.
We prompted ChatGPT-4 with 80 real-world college-student counseling questions.
ChatGPT-4 achieved high warmth (97.5%), empathy (94.2%), and positive acceptance.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: To provide an exploratory analysis of ChatGPT-4's quantitative performance indicators in simulated school-counseling settings. Conversational artificial intelligence (AI) has shown strong capabilities in providing low-cost and timely interventions for a wide range of people and increasing well-being. Therefore, this study examined ChatGPT's capabilities, including response stability in conducting psychological counseling and its potential for providing accessible psychological interventions, especially in school settings. We prompted ChatGPT-4 with 80 real-world college-student counseling questions. Replies were quantified with APA-informed NLP tools to measure warmth, empathy, and acceptance, and run-to-run stability was assessed via Fleiss' κ and ICC(2,1). ChatGPT-4 achieved high warmth (97.5%), empathy (94.2%), and positive acceptance (mean compound score = 0.93 ± 0.19), with moderate stability (ICC(2,1) = 0.62; κ = 0.59). Occasional randomness in responses highlights risk areas requiring human oversight. As an offline, single-model text simulation without clinical validation, these results remain exploratory. Future work should involve live users, compare multiple LLMs, and incorporate mixed-methods validation to assess real-world efficacy and safety. The findings suggest ChatGPT-4 could augment low-intensity mental-health support in educational settings, guiding the design of human-in-the-loop workflows, policy regulations, and product roadmaps. This is among the first exploratory studies to apply quantitative stability metrics and NLP-based emotion detection to ChatGPT-4 in a school-counseling context and to integrate a practitioner's perspective to inform future research, product development, and policy.
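The abstract's stability figures (ICC(2,1) = 0.62; κ = 0.59) follow the standard definitions of these inter-rater metrics. As a hedged illustration only, here is a minimal reimplementation of both from their textbook formulas; the paper's actual scoring pipeline and data are not public, so the function names and the toy matrices below are purely illustrative:

```python
# Minimal sketch of the two stability metrics named in the abstract:
# ICC(2,1) (two-way random effects, absolute agreement, single measure)
# and Fleiss' kappa. Illustrative only; not the authors' code.
import numpy as np

def icc_2_1(ratings: np.ndarray) -> float:
    """ICC(2,1) for an (n_subjects, k_raters) matrix of scores,
    where 'raters' here would be repeated ChatGPT-4 runs."""
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    ss_total = ((ratings - grand) ** 2).sum()
    ss_rows = k * ((row_means - grand) ** 2).sum()   # between-subjects
    ss_cols = n * ((col_means - grand) ** 2).sum()   # between-raters (runs)
    ss_err = ss_total - ss_rows - ss_cols            # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for an (n_items, n_categories) count matrix:
    counts[i, j] = number of runs labeling item i with category j.
    Assumes the same number of runs per item."""
    n_raters = counts.sum(axis=1)[0]
    p_j = counts.sum(axis=0) / counts.sum()          # category proportions
    p_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar, p_e = p_i.mean(), (p_j ** 2).sum()
    return (p_bar - p_e) / (1 - p_e)

# Toy example: three repeated runs (columns) scored on five prompts (rows).
runs = np.array([[0.90, 0.85, 0.92],
                 [0.40, 0.55, 0.45],
                 [0.75, 0.70, 0.80],
                 [0.95, 0.90, 0.93],
                 [0.20, 0.30, 0.25]])
print(f"ICC(2,1) = {icc_2_1(runs):.2f}")
```

Both functions return 1.0 under perfect run-to-run agreement and drop toward 0 (κ can go negative) as agreement approaches chance, which is how values such as ICC(2,1) = 0.62 map onto "moderate stability".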
Related papers
- Responsible Evaluation of AI for Mental Health [72.85175110624736]
Current approaches to evaluating AI tools in mental health care are fragmented and poorly aligned with clinical practice, social context, and first-hand user experience.
This paper argues for a rethinking of responsible evaluation by introducing an interdisciplinary framework that integrates clinical soundness, social context, and equity.
arXiv Detail & Related papers (2026-01-20T12:55:10Z) - Generating Natural-Language Surgical Feedback: From Structured Representation to Domain-Grounded Evaluation [66.7752700084159]
High-quality feedback from a surgical trainer is pivotal for improving trainee performance and long-term skill acquisition.
We present a structure-aware pipeline that learns a surgical action ontology from real trainer-to-trainee transcripts.
arXiv Detail & Related papers (2025-11-19T06:19:34Z) - Context-Emotion Aware Therapeutic Dialogue Generation: A Multi-component Reinforcement Learning Approach to Language Models for Mental Health Support [3.857814030650221]
Mental health illness represents a substantial global socioeconomic burden.
This paper investigated the application of supervised fine-tuning and reinforcement learning techniques to enhance GPT-2's capacity for therapeutic dialogue generation.
arXiv Detail & Related papers (2025-11-14T21:32:10Z) - ChatThero: An LLM-Supported Chatbot for Behavior Change and Therapeutic Support in Addiction Recovery [13.866051319588465]
Substance use disorders (SUDs) affect millions of people, and relapses are common.
Access to care is limited, which contributes to the challenge of recovery support.
We present ChatThero, an innovative low-cost, multi-session, stressor-aware, and memory-persistent autonomous language agent.
arXiv Detail & Related papers (2025-08-28T16:57:33Z) - Mentalic Net: Development of RAG-based Conversational AI and Evaluation Framework for Mental Health Support [0.0]
Mentalic Net Conversational AI has a BERT Score of 0.898, with other evaluation metrics falling within satisfactory ranges.
We advocate for a human-in-the-loop approach and a long-term, responsible strategy in developing such transformative technologies.
arXiv Detail & Related papers (2025-08-27T03:44:56Z) - Reframe Your Life Story: Interactive Narrative Therapist and Innovative Moment Assessment with Large Language Models [72.36715571932696]
Narrative therapy helps individuals transform problematic life stories into empowering alternatives.
Current approaches lack realism in specialized psychotherapy and fail to capture therapeutic progression over time.
INT (Interactive Narrative Therapist) simulates expert narrative therapists by planning therapeutic stages, guiding reflection levels, and generating contextually appropriate expert-like responses.
arXiv Detail & Related papers (2025-07-27T11:52:09Z) - A Risk Ontology for Evaluating AI-Powered Psychotherapy Virtual Agents [13.721977133773192]
Large Language Models (LLMs) and Intelligent Virtual Agents acting as psychotherapists present opportunities for expanding mental healthcare access.
Their deployment has also been linked to serious adverse outcomes, including user harm and suicide.
We introduce a novel risk ontology specifically designed for the systematic evaluation of conversational AI psychotherapists.
arXiv Detail & Related papers (2025-05-21T05:01:39Z) - Ψ-Arena: Interactive Assessment and Optimization of LLM-based Psychological Counselors with Tripartite Feedback [51.26493826461026]
We propose Psi-Arena, an interactive framework for comprehensive assessment and optimization of LLM-based psychological counselors.
Psi-Arena features realistic arena interactions that simulate real-world counseling through multi-stage dialogues with psychologically profiled NPC clients.
Experiments across eight state-of-the-art LLMs show significant performance variations in different real-world scenarios and evaluation perspectives.
arXiv Detail & Related papers (2025-05-06T08:22:51Z) - Evaluating the Application of ChatGPT in Outpatient Triage Guidance: A Comparative Study [11.37622565068147]
The integration of Artificial Intelligence in healthcare presents a transformative potential for enhancing operational efficiency and health outcomes.
Large Language Models (LLMs), such as ChatGPT, have shown their capabilities in supporting medical decision-making.
This study specifically aims to evaluate the consistency of responses provided by ChatGPT in outpatient guidance.
arXiv Detail & Related papers (2024-04-27T04:12:02Z) - Analyzing Participants' Engagement during Online Meetings Using Unsupervised Remote Photoplethysmography with Behavioral Features [50.82725748981231]
Engagement measurement finds application in healthcare, education, and services.
Physiological and behavioral features are both viable, but traditional physiological measurement is impractical because it requires contact sensors.
We demonstrate the feasibility of unsupervised remote photoplethysmography (rPPG) as an alternative to contact sensors.
arXiv Detail & Related papers (2024-04-05T20:39:16Z) - PsychoGAT: A Novel Psychological Measurement Paradigm through Interactive Fiction Games with LLM Agents [68.50571379012621]
Psychological measurement is essential for mental health, self-understanding, and personal development.
PsychoGAT (Psychological Game AgenTs) achieves statistically significant excellence in psychometric metrics such as reliability, convergent validity, and discriminant validity.
arXiv Detail & Related papers (2024-02-19T18:00:30Z) - Can generative AI and ChatGPT outperform humans on cognitive-demanding problem-solving tasks in science? [1.1172147007388977]
This study compared the performance of ChatGPT and GPT-4 with that of students on the 2019 NAEP science assessments, broken down by the cognitive demand of the items.
Results showed that both ChatGPT and GPT-4 consistently outperformed most students who answered the NAEP science assessments.
arXiv Detail & Related papers (2024-01-07T12:36:31Z) - MET: Multimodal Perception of Engagement for Telehealth [52.54282887530756]
We present MET, a learning-based algorithm for perceiving a human's level of engagement from videos.
We release a new dataset, MEDICA, for mental health patient engagement detection.
arXiv Detail & Related papers (2020-11-17T15:18:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers or information and is not responsible for any consequences of their use.