Related papers: How Real Are Synthetic Therapy Conversations? Evaluating Fidelity in Prolonged Exposure Dialogues

How Real Are Synthetic Therapy Conversations? Evaluating Fidelity in Prolonged Exposure Dialogues

URL: http://arxiv.org/abs/2504.21800v4
Date: Sat, 20 Sep 2025 08:28:38 GMT
Title: How Real Are Synthetic Therapy Conversations? Evaluating Fidelity in Prolonged Exposure Dialogues
Authors: Suhas BN, Dominik Mattioli, Saeed Abdullah, Rosa I. Arriaga, Chris W. Wiese, Andrew M. Sherrill,
Abstract summary: Synthetic data adoption in healthcare is driven by privacy concerns, data access limitations, and high annotation costs.<n>We explore synthetic Prolonged Exposure (PE) therapy conversations for PTSD as a scalable alternative for training clinical models.<n>We systematically compare real and synthetic dialogues using linguistic, structural, and protocol-specific metrics like turn-taking and treatment fidelity.
Score: 14.457387337806765
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Synthetic data adoption in healthcare is driven by privacy concerns, data access limitations, and high annotation costs. We explore synthetic Prolonged Exposure (PE) therapy conversations for PTSD as a scalable alternative for training clinical models. We systematically compare real and synthetic dialogues using linguistic, structural, and protocol-specific metrics like turn-taking and treatment fidelity. We introduce and evaluate PE-specific metrics, offering a novel framework for assessing clinical fidelity beyond surface fluency. Our findings show that while synthetic data successfully mitigates data scarcity and protects privacy, capturing the most subtle therapeutic dynamics remains a complex challenge. Synthetic dialogues successfully replicate key linguistic features of real conversations, for instance, achieving a similar Readability Score (89.2 vs. 88.1), while showing differences in some key fidelity markers like distress monitoring. This comparison highlights the need for fidelity-aware metrics that go beyond surface fluency to identify clinically significant nuances. Our model-agnostic framework is a critical tool for developers and clinicians to benchmark generative model fidelity before deployment in sensitive applications. Our findings help clarify where synthetic data can effectively complement real-world datasets, while also identifying areas for future refinement.

Related papers

SHAP Distance: An Explainability-Aware Metric for Evaluating the Semantic Fidelity of Synthetic Tabular Data [7.227194225143588]
We introduce the SHapley Additive exPlanations (SHAP) Distance, a novel explainability-aware metric that is defined as the cosine distance between the global SHAP attribution vectors.<n>We analyze datasets that span clinical health records with physiological features, enterprise invoice transactions with heterogeneous scales, and telecom churn logs with mixed categorical-numerical attributes.<n>Our results show that the SHAP Distance captures feature importance shifts and underrepresented tail effects that the Kullback-Leibler divergence and Train-on-Synthetic-Test-on-Real accuracy fail to detect.
arXiv Detail & Related papers (2025-11-17T03:47:47Z)
Improving the Generation and Evaluation of Synthetic Data for Downstream Medical Causal Inference [89.5628648718851]
Causal inference is essential for developing and evaluating medical interventions.<n>Real-world medical datasets are often difficult to access due to regulatory barriers.<n>We present STEAM: a novel method for generating Synthetic data for Treatment Effect Analysis in Medicine.
arXiv Detail & Related papers (2025-10-21T16:16:00Z)
Understanding the Influence of Synthetic Data for Text Embedders [52.04771455432998]
We first reproduce and publicly release the synthetic data proposed by Wang et al.<n>We critically examine where exactly synthetic data improves model generalization.<n>Our findings highlight the limitations of current synthetic data approaches for building general-purpose embedders.
arXiv Detail & Related papers (2025-09-07T19:28:52Z)
DualAlign: Generating Clinically Grounded Synthetic Data [9.87164447021602]
We introduce DualAlign, a framework that enhances statistical fidelity and clinical plausibility through dual alignment.<n>Using Alzheimer's disease (AD) as a case study, DualAlign produces context-grounded symptom-level sentences.<n>Fine-tuning an LLaMA 3.1-8B model with a combination of DualAlign-generated and human-annotated data yields substantial performance gains.
arXiv Detail & Related papers (2025-09-05T18:04:38Z)
Why Synthetic Isn't Real Yet: A Diagnostic Framework for Contact Center Dialogue Generation [3.2960068101198963]
Synthetic transcript generation is critical in contact center domains, where privacy and data scarcity limit model training and evaluation.<n>We benchmark four language-agnostic generation strategies, from simple prompting to characteristic-aware multi-stage approaches.<n>Results reveal persistent challenges: no method excels across all traits, with notable deficits in disfluency, sentiment, and behavioral realism.
arXiv Detail & Related papers (2025-08-25T17:10:36Z)
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology [0.0]
This study explores voice cloning to generate synthetic speech replicating the unique patterns of individuals with dysarthria.<n>Using the TORGO dataset, we address data scarcity and privacy challenges in speech-language pathology.<n>We cloned voices from dysarthric and control speakers using a commercial platform, ensuring gender-matched synthetic voices.
arXiv Detail & Related papers (2025-03-03T07:44:49Z)
Medical Hallucinations in Foundation Models and Their Impact on Healthcare [53.97060824532454]
Foundation Models that are capable of processing and generating multi-modal data have transformed AI's role in medicine.<n>We define medical hallucination as any instance in which a model generates misleading medical content.<n>Our results reveal that inference techniques such as Chain-of-Thought (CoT) and Search Augmented Generation can effectively reduce hallucination rates.<n>These findings underscore the ethical and practical imperative for robust detection and mitigation strategies.
arXiv Detail & Related papers (2025-02-26T02:30:44Z)
Position Paper: Building Trust in Synthetic Data for Clinical AI [0.3937354192623676]
This paper argues that fostering trust in synthetic medical data is crucial for its clinical adoption.<n>We present empirical evidence from brain tumor segmentation to demonstrate that the quality, diversity, and proportion of synthetic data directly impact trust in clinical AI models.
arXiv Detail & Related papers (2025-02-04T07:53:23Z)
Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations [74.83732294523402]
We introduce a novel benchmark that simulates real-world diagnostic scenarios, integrating noise and difficulty levels aligned with USMLE standards.<n>We also explore dialogue-based fine-tuning, which transforms static datasets into conversational formats to better capture iterative reasoning processes.<n>Experiments show that dialogue-tuned models outperform traditional methods, with improvements of $9.64%$ in multi-round reasoning scenarios and $6.18%$ in accuracy in a noisy environment.
arXiv Detail & Related papers (2025-01-29T18:58:48Z)
LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment [75.44934940580112]
This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment.<n>We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews.<n>Our approach, tested on 236 real-world interviews, demonstrates strong correlations with clinician assessments.
arXiv Detail & Related papers (2025-01-07T08:49:04Z)
SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data [78.70620682374624]
We introduce SynFER, a novel framework for synthesizing facial expression image data based on high-level textual descriptions.<n>To ensure the quality and reliability of the synthetic data, we propose a semantic guidance technique and a pseudo-label generator.<n>Results validate the efficacy of our approach and the synthetic data.
arXiv Detail & Related papers (2024-10-13T14:58:21Z)
Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM [27.33193944412666]
Medical dialogue systems (MDS) enhance patient-physician communication, improve healthcare accessibility, and reduce costs. However, acquiring suitable data to train these systems poses significant challenges. Our approach, SynDial, uses a single LLM iteratively with zero-shot prompting and a feedback loop to generate high-quality synthetic dialogues.
arXiv Detail & Related papers (2024-08-12T16:49:22Z)
Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records [1.338174941551702]
This study assesses the capability of the Llama 2 LLM to create synthetic medical records that accurately reflect real patient information. We focus on generating synthetic narratives for the History of Present Illness section, utilising data from the MIMIC-IV dataset for comparison. Our findings suggest that this chain-of-thought prompted approach allows the zero-shot model to achieve results on par with those of fine-tuned models, based on Rouge metrics evaluation.
arXiv Detail & Related papers (2024-03-13T16:17:09Z)
Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models [0.20971479389679337]
We seek to develop a Large Language Model (LLM)-based tool capable of interrogating routinely-collected, narrative (free-text) electronic health record data. We use LLM-generated synthetic data (GPT3.5) and a Non-Maximum Suppression (NMS) algorithm to train a BERT-based span extraction model. We show it is possible to obtain good overall performance (0.70 F1 across polarity) on real clinical data on a set of as many as 20 different factors, and high performance (0.85 F1 with 0.95 precision) on a subset of important DTD
arXiv Detail & Related papers (2024-02-12T13:34:33Z)
Reliability in Semantic Segmentation: Can We Use Synthetic Data? [69.28268603137546]
We show for the first time how synthetic data can be specifically generated to assess comprehensively the real-world reliability of semantic segmentation models. This synthetic data is employed to evaluate the robustness of pretrained segmenters. We demonstrate how our approach can be utilized to enhance the calibration and OOD detection capabilities of segmenters.
arXiv Detail & Related papers (2023-12-14T18:56:07Z)
TREEMENT: Interpretable Patient-Trial Matching via Personalized Dynamic Tree-Based Memory Network [54.332862955411656]
Clinical trials are critical for drug development but often suffer from expensive and inefficient patient recruitment. In recent years, machine learning models have been proposed for speeding up patient recruitment via automatically matching patients with clinical trials. We introduce a dynamic tree-based memory network model named TREEMENT to provide accurate and interpretable patient trial matching.
arXiv Detail & Related papers (2023-07-19T12:35:09Z)
Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging. We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets. We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.