Related papers: An Automated Quality Evaluation Framework of Psychotherapy Conversations with Local Quality Estimates

URL: http://arxiv.org/abs/2106.07922v1
Date: Tue, 15 Jun 2021 07:18:30 GMT
Title: An Automated Quality Evaluation Framework of Psychotherapy Conversations with Local Quality Estimates
Authors: Zhuohao Chen, Nikolaos Flemotomos, Karan Singla, Torrey A. Creed, David C. Atkins, Shrikanth Narayanan
Abstract summary: We propose a hierarchical framework to automatically evaluate the quality of a CBT interaction. We first fine-tune BERT for predicting segment-level (local) quality scores. We then use segment embeddings as lower-level input to a Bidirectional LSTM-based neural network to predict session-level (global) quality estimates.
Score: 38.841853815519734
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Computational approaches for assessing the quality of conversation-based psychotherapy, such as Cognitive Behavioral Therapy (CBT) and Motivational Interviewing (MI), have been developed recently to support quality assurance and clinical training. However, due to the long session lengths and limited modeling resources, computational methods largely rely on frequency-based lexical features or the distribution of dialogue acts. In this work, we propose a hierarchical framework to automatically evaluate the quality of a CBT interaction. We divide each psychotherapy session into conversation segments and input those into a BERT-based model to produce segment embeddings. We first fine-tune BERT for predicting segment-level (local) quality scores and then use segment embeddings as lower-level input to a Bidirectional LSTM-based neural network to predict session-level (global) quality estimates. In particular, the segment-level quality scores are initialized with the session-level scores, and we model the global quality as a function of the local quality scores to obtain accurate segment-level quality estimates. These estimated segment-level scores benefit BERT fine-tuning and help in learning better segment embeddings. We evaluate the proposed framework on data drawn from real-world CBT clinical session recordings to predict multiple session-level behavior codes. The results indicate that our approach leads to improved evaluation accuracy for most codes in both regression and classification tasks.
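
Illustrative sketch (not from the paper): the abstract describes splitting a session into segments, initializing each segment's local score with the session-level score, and treating the global score as a function of the local scores. The snippet below shows only that data flow in plain Python. The fixed-size segmentation, the function names, and the use of a simple mean as the aggregation function are all assumptions made for illustration; the paper's actual models (fine-tuned BERT and a Bidirectional LSTM) are not reproduced here.

```python
from statistics import mean


def split_into_segments(utterances, segment_size):
    """Split a session (a list of utterances) into consecutive segments.

    Fixed-size segmentation is an assumption; the abstract does not say
    how segment boundaries are chosen.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [
        utterances[i:i + segment_size]
        for i in range(0, len(utterances), segment_size)
    ]


def init_local_scores(segments, session_score):
    """Initialize every segment-level score with the session-level score."""
    return [float(session_score) for _ in segments]


def aggregate_global(local_scores):
    """Hypothetical aggregation: global quality as the mean of local scores."""
    return mean(local_scores)


# Toy session with five utterances and a session-level score of 4.0.
session = ["T: hello", "P: hi", "T: how was your week?", "P: hard", "T: tell me more"]
segments = split_into_segments(session, segment_size=2)
local_scores = init_local_scores(segments, session_score=4.0)
global_score = aggregate_global(local_scores)
```

With this initialization the aggregated global score equals the session-level label; in the framework described above, the local scores would then be refined during training rather than kept at their initial values.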

Related papers

QCResUNet: Joint Subject-level and Voxel-level Segmentation Quality Prediction [0.2895421284478621]
Deep learning has made significant strides in automated brain tumor segmentation from magnetic resonance imaging (MRI) scans. There is a need for quality control (QC) to screen the quality of the segmentation results. We propose QCResUNet, which produces subject-level segmentation-quality measures and voxel-level segmentation error maps for each available tissue class.
arXiv Detail & Related papers (2024-12-10T03:27:33Z)
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy [67.23830698947637]
We propose a new benchmark, CBT-BENCH, for the systematic evaluation of cognitive behavioral therapy (CBT) assistance. We include three levels of tasks in CBT-BENCH: I: Basic CBT knowledge acquisition, with the task of multiple-choice questions; II: Cognitive model understanding, with the tasks of cognitive distortion classification, primary core belief classification, and fine-grained core belief classification; III: Therapeutic response generation, with the task of generating responses to patient speech in CBT therapy sessions. Experimental results indicate that while LLMs perform well in reciting CBT knowledge, they fall short in complex real-world scenarios.
arXiv Detail & Related papers (2024-10-17T04:52:57Z)
Exploring Pathological Speech Quality Assessment with ASR-Powered Wav2Vec2 in Data-Scarce Context [7.567181073057191]
This paper introduces a novel approach where the system learns at the audio level instead of the segment level, despite data scarcity. It shows that the ASR-based Wav2Vec2 model brings the best results and may indicate a strong correlation between ASR and speech quality assessment.
arXiv Detail & Related papers (2024-03-29T13:59:34Z)
Hyperparameters in Continual Learning: A Reality Check [53.30082523545212]
Continual learning (CL) aims to train a model on a sequence of tasks while balancing the trade-off between plasticity (learning new tasks) and stability (retaining prior knowledge). The dominantly adopted conventional evaluation protocol for CL algorithms selects the best hyperparameters in a given scenario and then evaluates the algorithms in the same scenario. This protocol has significant shortcomings: it overestimates the CL capacity of algorithms and relies on unrealistic hyperparameter tuning. We argue that the evaluation of CL algorithms should focus on assessing the generalizability of their CL capacity to unseen scenarios.
arXiv Detail & Related papers (2024-03-14T03:13:01Z)
Calibrating LLM-Based Evaluator [92.17397504834825]
We propose AutoCalibrate, a multi-stage, gradient-free approach to calibrate and align an LLM-based evaluator toward human preference. Instead of explicitly modeling human preferences, we first implicitly encompass them within a set of human labels. Our experiments on multiple text quality evaluation datasets illustrate a significant improvement in correlation with expert evaluation through calibration.
arXiv Detail & Related papers (2023-09-23T08:46:11Z)
Improving Generalization Capability of Deep Learning-Based Nuclei Instance Segmentation by Non-deterministic Train Time and Deterministic Test Time Stain Normalization [0.674572634849505]
Nuclei instance segmentation plays a fundamental role in a wide range of clinical and research applications. Deep learning (DL)-based approaches have been shown to deliver the best performance. We propose a novel method to improve the generalization capability of a DL-based automatic segmentation approach.
arXiv Detail & Related papers (2023-09-12T11:29:35Z)
Learning and Evaluating Human Preferences for Conversational Head Generation [101.89332968344102]
We propose a novel learning-based evaluation metric named Preference Score (PS) for fitting human preference according to the quantitative evaluations across different dimensions. PS can serve as a quantitative evaluation without the need for human annotation.
arXiv Detail & Related papers (2023-07-20T07:04:16Z)
Deep Quality Estimation: Creating Surrogate Models for Human Quality Ratings [6.645279583701951]
We evaluate on a complex multi-class segmentation problem, specifically glioma segmentation following the BraTS annotation protocol. The training data features quality ratings from 15 expert neuroradiologists on a scale ranging from 1 to 6 stars for various computer-generated and manual 3D annotations. We can approximate segmentation quality within a margin of error comparable to human intra-rater reliability.
arXiv Detail & Related papers (2022-05-17T10:32:27Z)
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality [123.97136358092585]
We develop a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset. Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation. Experimental evaluations on the popular LJSpeech dataset show that our proposed NaturalSpeech achieves -0.01 CMOS to human recordings at the sentence level.
arXiv Detail & Related papers (2022-05-09T16:57:35Z)
Automated Quality Assessment of Cognitive Behavioral Therapy Sessions Through Highly Contextualized Language Representations [34.670548892766625]
A BERT-based model is proposed for automatic behavioral scoring of a specific type of psychotherapy, called Cognitive Behavioral Therapy (CBT). The model is trained in a multi-task manner in order to achieve higher interpretability. BERT-based representations are further augmented with available therapy metadata, providing relevant non-linguistic context and leading to consistent performance improvements.
arXiv Detail & Related papers (2021-02-23T09:22:29Z)
Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach [84.02388020258141]
We propose a new framework named ENIGMA for estimating human evaluation scores based on off-policy evaluation in reinforcement learning. ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation. Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
arXiv Detail & Related papers (2021-02-20T03:29:20Z)
Quality-aware semi-supervised learning for CMR segmentation [2.9928692313705505]
One of the challenges in developing deep learning algorithms for medical image segmentation is the scarcity of training data. We propose a novel scheme that uses QC of the downstream task to identify high quality outputs of CMR segmentation networks. In essence, this provides quality-aware augmentation of training data in a variant of SSL for segmentation networks.
arXiv Detail & Related papers (2020-09-01T17:18:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.