An Automated Quality Evaluation Framework of Psychotherapy Conversations
with Local Quality Estimates
- URL: http://arxiv.org/abs/2106.07922v1
- Date: Tue, 15 Jun 2021 07:18:30 GMT
- Title: An Automated Quality Evaluation Framework of Psychotherapy Conversations
with Local Quality Estimates
- Authors: Zhuohao Chen, Nikolaos Flemotomos, Karan Singla, Torrey A. Creed,
David C. Atkins, Shrikanth Narayanan
- Abstract summary: We propose a hierarchical framework to automatically evaluate the quality of a CBT interaction.
We first fine-tune BERT for predicting segment-level (local) quality scores.
We then use segment embeddings as lower-level input to a Bidirectional LSTM-based neural network to predict session-level (global) quality estimates.
- Score: 38.841853815519734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computational approaches for assessing the quality of conversation-based
psychotherapy, such as Cognitive Behavioral Therapy (CBT) and Motivational
Interviewing (MI), have been developed recently to support quality assurance
and clinical training. However, due to the long session lengths and limited
modeling resources, computational methods largely rely on frequency-based
lexical features or distribution of dialogue acts. In this work, we propose a
hierarchical framework to automatically evaluate the quality of a CBT
interaction. We divide each psychotherapy session into conversation segments
and input those into a BERT-based model to produce segment embeddings. We first
fine-tune BERT for predicting segment-level (local) quality scores and then use
segment embeddings as lower-level input to a Bidirectional LSTM-based neural
network to predict session-level (global) quality estimates. In particular, the
segment-level quality scores are initialized with the session-level scores and
we model the global quality as a function of the local quality scores to
achieve accurate segment-level quality estimates. These estimated
segment-level scores benefit the BERT fine-tuning and help learn better segment
embeddings. We evaluate the proposed framework on data drawn from real-world
CBT clinical session recordings to predict multiple session-level behavior
codes. The results indicate that our approach leads to improved evaluation
accuracy for most codes in both regression and classification tasks.
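
The data flow described in the abstract can be illustrated with a minimal sketch. It covers only the bookkeeping around the models: splitting a session into segments, initializing each segment-level (local) score with the session-level score, and recovering a global score from the local ones. The function names, the fixed segment size, the example utterances, and the use of a plain mean as the aggregation function are all assumptions made for illustration; the abstract does not specify them, and the BERT and Bidirectional LSTM components are omitted here.

```python
from statistics import fmean


def split_into_segments(utterances, segment_size):
    """Split a session (a list of utterances) into consecutive segments.

    A fixed segment size is an assumption; the last segment may be shorter.
    """
    return [
        utterances[i:i + segment_size]
        for i in range(0, len(utterances), segment_size)
    ]


def init_local_scores(num_segments, session_score):
    """Initialize every segment-level score with the session-level score."""
    return [float(session_score)] * num_segments


def global_from_local(local_scores):
    """Model the global score as a function of the local scores.

    A plain mean is used here purely as a placeholder aggregation.
    """
    return fmean(local_scores)


# Hypothetical toy session: T = therapist, C = client.
session = [
    "T: How was your week?",
    "C: Pretty stressful.",
    "T: What went through your mind?",
    "C: That I would fail.",
    "T: Let's examine that thought.",
]

segments = split_into_segments(session, segment_size=2)
local_scores = init_local_scores(len(segments), session_score=4.0)
global_score = global_from_local(local_scores)
```

In the framework itself, the local scores would then be re-estimated during training rather than left at their initial values; this sketch stops at initialization.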
Related papers
- CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy [67.23830698947637]
We propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance.
We include three levels of tasks in CBT-BENCH: I: Basic CBT knowledge acquisition, with the task of multiple-choice questions; II: Cognitive model understanding, with the tasks of cognitive distortion classification, primary core belief classification, and fine-grained core belief classification; III: Therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions.
Experimental results indicate that while LLMs perform well in reciting CBT knowledge, they fall short in complex real-world scenarios.
arXiv Detail & Related papers (2024-10-17T04:52:57Z) - Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context [7.567181073057191]
This paper introduces a novel approach where the system learns at the audio level instead of segments despite data scarcity.
It shows that the ASR-based Wav2Vec2 model brings the best results and may indicate a strong correlation between ASR and speech quality assessment.
arXiv Detail & Related papers (2024-03-29T13:59:34Z) - Hyperparameters in Continual Learning: A Reality Check [53.30082523545212]
Continual learning (CL) aims to train a model on a sequence of tasks while balancing the trade-off between plasticity (learning new tasks) and stability (retaining prior knowledge).
The dominantly adopted conventional evaluation protocol for CL algorithms selects the best hyperparameters in a given scenario and then evaluates the algorithms in the same scenario.
This protocol has significant shortcomings: it overestimates the CL capacity of algorithms and relies on unrealistic hyperparameter tuning.
We argue that the evaluation of CL algorithms should focus on assessing the generalizability of their CL capacity to unseen scenarios.
arXiv Detail & Related papers (2024-03-14T03:13:01Z) - Calibrating LLM-Based Evaluator [92.17397504834825]
We propose AutoCalibrate, a multi-stage, gradient-free approach to calibrate and align an LLM-based evaluator toward human preference.
Instead of explicitly modeling human preferences, we first implicitly encompass them within a set of human labels.
Our experiments on multiple text quality evaluation datasets illustrate a significant improvement in correlation with expert evaluation through calibration.
arXiv Detail & Related papers (2023-09-23T08:46:11Z) - Improving Generalization Capability of Deep Learning-Based Nuclei
Instance Segmentation by Non-deterministic Train Time and Deterministic Test
Time Stain Normalization [0.674572634849505]
Nuclei instance segmentation plays a fundamental role in a wide range of clinical and research applications.
Deep learning (DL)-based approaches have been shown to deliver the best performances.
We propose a novel method to improve the generalization capability of a DL-based automatic segmentation approach.
arXiv Detail & Related papers (2023-09-12T11:29:35Z) - Deep Quality Estimation: Creating Surrogate Models for Human Quality
Ratings [6.645279583701951]
We evaluate on a complex multi-class segmentation problem, specifically glioma segmentation following the BraTS annotation protocol.
The training data features quality ratings from 15 expert neuroradiologists on a scale ranging from 1 to 6 stars for various computer-generated and manual 3D annotations.
We can approximate segmentation quality within a margin of error comparable to human intra-rater reliability.
arXiv Detail & Related papers (2022-05-17T10:32:27Z) - NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level
Quality [123.97136358092585]
We develop a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation.
Experimental evaluations on the popular LJSpeech dataset show that our proposed NaturalSpeech achieves -0.01 CMOS to human recordings at the sentence level.
arXiv Detail & Related papers (2022-05-09T16:57:35Z) - Automated Quality Assessment of Cognitive Behavioral Therapy Sessions
Through Highly Contextualized Language Representations [34.670548892766625]
A BERT-based model is proposed for automatic behavioral scoring of a specific type of psychotherapy, called Cognitive Behavioral Therapy (CBT).
The model is trained in a multi-task manner in order to achieve higher interpretability.
BERT-based representations are further augmented with available therapy metadata, providing relevant non-linguistic context and leading to consistent performance improvements.
arXiv Detail & Related papers (2021-02-23T09:22:29Z) - Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy
Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning.
ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation.
Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z) - Quality-aware semi-supervised learning for CMR segmentation [2.9928692313705505]
One of the challenges in developing deep learning algorithms for medical image segmentation is the scarcity of training data.
We propose a novel scheme that uses QC of the downstream task to identify high quality outputs of CMR segmentation networks.
In essence, this provides quality-aware augmentation of training data in a variant of SSL for segmentation networks.
arXiv Detail & Related papers (2020-09-01T17:18:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.