Related papers: KokoroChat: A Japanese Psychological Counseling Dialogue Dataset Collected via Role-Playing by Trained Counselors

URL: http://arxiv.org/abs/2506.01357v1
Date: Mon, 02 Jun 2025 06:20:53 GMT
Title: KokoroChat: A Japanese Psychological Counseling Dialogue Dataset Collected via Role-Playing by Trained Counselors
Authors: Zhiyang Qi, Takumasa Kaneko, Keiko Takamizo, Mariko Ukiyo, Michimasa Inaba
Abstract summary: This study adopts a role-playing approach where trained counselors simulate counselor-client interactions. We construct KokoroChat, a Japanese psychological counseling dialogue dataset comprising 6,589 long-form dialogues. Experimental results demonstrate that fine-tuning open-source LLMs with KokoroChat improves both the quality of generated counseling responses and the automatic evaluation of counseling dialogues.
Score: 1.3456699275044242
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Generating psychological counseling responses with language models relies heavily on high-quality datasets. Crowdsourced data collection methods require strict worker training, and data from real-world counseling environments may raise privacy and ethical concerns. While recent studies have explored using large language models (LLMs) to augment psychological counseling dialogue datasets, the resulting data often suffers from limited diversity and authenticity. To address these limitations, this study adopts a role-playing approach where trained counselors simulate counselor-client interactions, ensuring high-quality dialogues while mitigating privacy risks. Using this method, we construct KokoroChat, a Japanese psychological counseling dialogue dataset comprising 6,589 long-form dialogues, each accompanied by comprehensive client feedback. Experimental results demonstrate that fine-tuning open-source LLMs with KokoroChat improves both the quality of generated counseling responses and the automatic evaluation of counseling dialogues. The KokoroChat dataset is available at https://github.com/UEC-InabaLab/KokoroChat.

Related papers

Psychological Counseling Cannot Be Achieved Overnight: Automated Psychological Counseling Through Multi-Session Conversations [26.422675063457827]
We introduce the Multi-Session Psychological Counseling Conversation Dataset (MusPsy-Dataset). Our MusPsy-Dataset is constructed using real client profiles from publicly available psychological case reports. We also developed our MusPsy-Model, which aims to track client progress and adapt its counseling direction over time.
arXiv Detail & Related papers (2025-06-07T02:00:45Z)
Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion. We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations. Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z)
Interactive Agents: Simulating Counselor-Client Psychological Counseling via Role-Playing LLM-to-LLM Interactions [12.455050661682051]
We propose a framework that employs two large language models (LLMs) via role-playing for simulating counselor-client interactions. Our framework involves two LLMs, one acting as a client equipped with a specific and real-life user profile and the other playing the role of an experienced counselor.
arXiv Detail & Related papers (2024-08-28T13:29:59Z)
Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory [24.937025825501998]
We create a multi-turn dialogue dataset that emulates real-life interactions using the goal-oriented and structured approach of Cognitive Behavioral Therapy (CBT). We benchmark against established psychological criteria used to evaluate real counseling sessions, ensuring alignment with expert evaluations. Experimental results demonstrate that Camel, a model trained with Cactus, outperforms other models in counseling skills, highlighting its effectiveness and potential as a counseling agent.
arXiv Detail & Related papers (2024-07-03T13:41:31Z)
Data Augmentation of Multi-turn Psychological Dialogue via Knowledge-driven Progressive Thought Prompting [46.919537239016734]
Large language models (LLMs) have simplified the implementation of multi-turn dialogues. It remains challenging to deliver satisfactory performance in low-resource domains such as psychological dialogue. We propose a knowledge-driven progressive thought prompting method to guide an LLM to generate psychology-related dialogue.
arXiv Detail & Related papers (2024-06-24T12:02:56Z)
CPsyCoun: A Report-based Multi-turn Dialogue Reconstruction and Evaluation Framework for Chinese Psychological Counseling [27.193022503592342]
We propose CPsyCoun, a report-based multi-turn dialogue reconstruction and evaluation framework for Chinese psychological counseling. To fully exploit psychological counseling reports, a two-phase approach is devised to construct high-quality dialogues. A comprehensive evaluation benchmark is developed for the effective automatic evaluation of multi-turn psychological consultations.
arXiv Detail & Related papers (2024-05-26T05:18:00Z)
Response Generation for Cognitive Behavioral Therapy with Large Language Models: Comparative Study with Socratic Questioning [6.400704401007114]
This study investigates the impact of generated responses on subjective evaluations such as mood change, cognitive change, and dialogue quality. When using GPT-4, the amount of mood change, empathy, and other dialogue qualities improve significantly.
arXiv Detail & Related papers (2024-01-29T08:53:41Z)
Does Collaborative Human-LM Dialogue Generation Help Information Extraction from Human Dialogues? [55.28340832822234]
Problem-solving human dialogues in real applications can be much more complex than existing Wizard-of-Oz collections. We introduce a human-in-the-loop dialogue generation framework capable of synthesizing realistic dialogues.
arXiv Detail & Related papers (2023-07-13T20:02:50Z)
Improving Conversational Recommendation Systems via Counterfactual Data Simulation [73.4526400381668]
Conversational recommender systems (CRSs) aim to provide recommendation services via natural language conversations. Existing CRS approaches often suffer from the issue of insufficient training due to the scarcity of training data. We propose a CounterFactual data simulation approach for CRS, named CFCRS, to alleviate the issue of data scarcity in CRSs.
arXiv Detail & Related papers (2023-06-05T12:48:56Z)
Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models [115.7508325840751]
The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs). In this paper, we embark on an investigation into the utilization of ChatGPT for conversational recommendation, revealing the inadequacy of the existing evaluation protocol. We propose an interactive evaluation approach based on LLMs, named iEvaLM, that harnesses LLM-based user simulators.
arXiv Detail & Related papers (2023-05-22T15:12:43Z)
On the Generation of Medical Dialogues for COVID-19 [60.63485429268256]
People experiencing COVID-19-related symptoms or exposed to risk factors have a pressing need to consult doctors. Because of the shortage of medical professionals, many people cannot receive online consultations in a timely manner. We aim to develop a medical dialogue system that can provide COVID-19-related consultations.
arXiv Detail & Related papers (2020-05-11T21:23:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.