M3TCM: Multi-modal Multi-task Context Model for Utterance Classification in Motivational Interviews
- URL: http://arxiv.org/abs/2404.03312v1
- Date: Thu, 4 Apr 2024 09:17:22 GMT
- Authors: Sayed Muddashir Hossain, Jan Alexandersson, Philipp Müller
- Abstract summary: We present M3TCM, a Multi-modal, Multi-task Context Model for utterance classification.
Our approach is the first to employ multi-task learning to effectively model both joint and individual components of therapist and client behaviour.
With our novel approach, we outperform the state of the art for utterance classification on the recently introduced AnnoMI dataset, with a relative improvement of 20% for client and 15% for therapist utterance classification.
- Score: 1.8100046713740954
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate utterance classification in motivational interviews is crucial to automatically understand the quality and dynamics of client-therapist interaction, and it can serve as a key input for systems mediating such interactions. Motivational interviews exhibit three important characteristics. First, there are two distinct roles, namely client and therapist. Second, they are often highly emotionally charged, which can be expressed both in text and in prosody. Finally, context is of central importance to classify any given utterance. Previous works did not adequately incorporate all of these characteristics into utterance classification approaches for mental health dialogues. In contrast, we present M3TCM, a Multi-modal, Multi-task Context Model for utterance classification. Our approach is the first to employ multi-task learning to effectively model both joint and individual components of therapist and client behaviour. Furthermore, M3TCM integrates information from the text and speech modalities as well as the conversation context. With our novel approach, we outperform the state of the art for utterance classification on the recently introduced AnnoMI dataset, with a relative improvement of 20% for client and 15% for therapist utterance classification. In extensive ablation studies, we quantify the improvement resulting from each contribution.
Related papers
- Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue [71.15186328127409]
Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT)
Model takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking framework.
We utilize the Switchboard-1 corpus, including its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset.
arXiv Detail & Related papers (2023-12-23T18:14:56Z)
- Seeing and hearing what has not been said; A multimodal client behavior classifier in Motivational Interviewing with interpretable fusion [0.8192907805418583]
Motivational Interviewing (MI) is an approach to therapy that emphasizes collaboration and encourages behavioral change.
To evaluate the quality of an MI conversation, client utterances can be classified using the MISC code as either change talk, sustain talk, or follow/neutral talk.
The proportion of change talk in a MI conversation is positively correlated with therapy outcomes, making accurate classification of client utterances essential.
arXiv Detail & Related papers (2023-09-25T16:00:06Z)
- MPCHAT: Towards Multimodal Persona-Grounded Conversation [54.800425322314105]
We extend persona-based dialogue to the multimodal domain and make two main contributions.
First, we present the first multimodal persona-based dialogue dataset named MPCHAT.
Second, we empirically show that incorporating multimodal persona, as measured by three proposed multimodal persona-grounded dialogue tasks, leads to statistically significant performance improvements.
arXiv Detail & Related papers (2023-05-27T06:46:42Z)
- Mixtures of Deep Neural Experts for Automated Speech Scoring [11.860560781894458]
The paper addresses the task of automatically assessing second language proficiency from language learners' spoken responses to test prompts.
The approach relies on two separate modules: (1) an automatic speech recognition system that yields text transcripts of the spoken interactions involved, and (2) a multiple classifier system based on deep learners that ranks the transcripts into proficiency classes.
arXiv Detail & Related papers (2021-06-23T15:44:50Z)
- MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding [58.95156916558384]
We present MPC-BERT, a pre-trained model for MPC understanding.
We evaluate MPC-BERT on three downstream tasks including addressee recognition, speaker identification and response selection.
arXiv Detail & Related papers (2021-06-03T01:49:12Z)
- Dialogue History Matters! Personalized Response Selection in Multi-turn Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., personalized dialogue Corpus Ubuntu (P-Ubuntu) and personalized Weibo dataset (P-Weibo).
arXiv Detail & Related papers (2021-03-17T09:42:11Z)
- Multitask Learning for Emotion and Personality Detection [17.029426018676997]
We build on the known correlation between personality traits and emotional behaviors, and propose a novel multitask learning framework, SoGMTL.
Our more computationally efficient CNN-based multitask model achieves state-of-the-art performance across multiple well-known personality and emotion datasets.
arXiv Detail & Related papers (2021-01-07T03:09:55Z)
- Re-framing Incremental Deep Language Models for Dialogue Processing with Multi-task Learning [14.239355474794142]
We present a multi-task learning framework to enable the training of one universal incremental dialogue processing model.
We show that these tasks provide positive inductive biases to each other with the optimal contribution of each one relying on the severity of the noise from the task.
arXiv Detail & Related papers (2020-11-13T04:31:51Z)
- Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)
- Masking Orchestration: Multi-task Pretraining for Multi-role Dialogue Representation Learning [50.5572111079898]
Multi-role dialogue understanding comprises a wide range of diverse tasks such as question answering, act classification, dialogue summarization etc.
While dialogue corpora are abundantly available, labeled data for specific learning tasks can be highly scarce and expensive.
In this work, we investigate dialogue context representation learning with various types of unsupervised pretraining tasks.
arXiv Detail & Related papers (2020-02-27T04:36:52Z)