PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs
- URL: http://arxiv.org/abs/2505.14356v1
- Date: Tue, 20 May 2025 13:41:32 GMT
- Title: PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs
- Authors: Sho Inoue, Shai Wang, Haizhou Li
- Abstract summary: Personality-aware conversation agents are underexplored due to the absence of personality annotations in speech datasets. We propose a pipeline that preprocesses raw audio recordings to create a dialogue dataset annotated with timestamps, response types, and emotion/sentiment labels. We employ an automatic speech recognition (ASR) system to extract transcripts and timestamps, then generate conversation-level annotations.
- Score: 36.18860434920165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite significant progress in neural spoken dialog systems, personality-aware conversation agents -- capable of adapting behavior based on personalities -- remain underexplored due to the absence of personality annotations in speech datasets. We propose a pipeline that preprocesses raw audio recordings to create a dialogue dataset annotated with timestamps, response types, and emotion/sentiment labels. We employ an automatic speech recognition (ASR) system to extract transcripts and timestamps, then generate conversation-level annotations. Leveraging these annotations, we design a system that employs large language models to predict conversational personality. Human evaluators were engaged to identify conversational characteristics and assign personality labels. Our analysis demonstrates that the proposed system achieves stronger alignment with human judgments compared to existing approaches.
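The preprocessing stage of the abstract (ASR output with timestamps, merged into annotated dialogue turns with response-type labels) can be illustrated with a minimal, self-contained sketch. The `Segment` structure, the `max_gap` merging rule, and the response-type heuristic below are all illustrative assumptions, not the authors' actual implementation; a real pipeline would obtain the segments from an ASR and diarization system.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    speaker: str   # speaker identifier from diarization
    start: float   # segment start time in seconds
    end: float     # segment end time in seconds
    text: str      # ASR transcript of the segment

def merge_into_turns(segments, max_gap=1.0):
    """Merge consecutive same-speaker ASR segments into dialogue turns.

    Segments by the same speaker separated by at most `max_gap`
    seconds are joined into one turn with combined timestamps.
    """
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        if (turns
                and turns[-1].speaker == seg.speaker
                and seg.start - turns[-1].end <= max_gap):
            prev = turns[-1]
            turns[-1] = Segment(prev.speaker, prev.start, seg.end,
                                prev.text + " " + seg.text)
        else:
            turns.append(seg)
    return turns

def classify_response_type(turn, prev_turn):
    """Toy heuristic for a response-type label (illustrative only)."""
    if prev_turn is None:
        return "initiation"
    if turn.start - prev_turn.end < 0:
        return "overlap"   # fully-duplex dialog: both speakers talking at once
    return "backchannel" if len(turn.text.split()) <= 2 else "response"
```

Note that overlap detection only makes sense because the dialogs are fully duplex, i.e. both channels are recorded simultaneously; a negative gap between turns then directly signals simultaneous speech.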
Related papers
- Aligning Spoken Dialogue Models from User Interactions [55.192134724622235]
We propose a novel preference alignment framework to improve spoken dialogue models on real-time conversations from user interactions. We create a dataset of more than 150,000 preference pairs from raw multi-turn speech conversations annotated with AI feedback. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
arXiv Detail & Related papers (2025-06-26T16:45:20Z)
- Enhancing Impression Change Prediction in Speed Dating Simulations Based on Speakers' Personalities [2.1740370446058708]
This paper focuses on simulating text dialogues in which impressions between speakers improve during speed dating. We believe that whether an utterance improves a dialogue partner's impression of the speaker may depend on the personalities of both parties. We propose a method that predicts whether an utterance improves a partner's impression of the speaker, taking both personalities into account.
arXiv Detail & Related papers (2025-02-07T07:18:32Z)
- WavChat: A Survey of Spoken Dialogue Models [66.82775211793547]
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain.
These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech.
Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
arXiv Detail & Related papers (2024-11-15T04:16:45Z)
- Affective-NLI: Towards Accurate and Interpretable Personality Recognition in Conversation [30.820334868031537]
Personality Recognition in Conversation (PRC) aims to identify the personality traits of speakers through textual dialogue content.
We propose Affective Natural Language Inference (Affective-NLI) for accurate and interpretable PRC.
arXiv Detail & Related papers (2024-04-03T09:14:24Z)
- Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue [71.15186328127409]
We propose the Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT), a model that takes the conversational context of text, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking framework.
We utilize the Switchboard-1 corpus, including its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset.
arXiv Detail & Related papers (2023-12-23T18:14:56Z)
- Psychological Metrics for Dialog System Evaluation [16.16116910201279]
We present five interpretable metrics from established psychology that are fundamental to human communication and relationships.
The psychological metrics are compared against seven state-of-the-art traditional metrics.
arXiv Detail & Related papers (2023-05-24T06:02:32Z)
- Affective social anthropomorphic intelligent system [1.7849339006560665]
This research proposes an anthropomorphic intelligent system that can hold a proper human-like conversation with emotion and personality.
A voice style transfer method is also proposed to map the attributes of a specific emotion.
arXiv Detail & Related papers (2023-04-19T18:24:57Z)
- Deep learning of segment-level feature representation for speech emotion recognition in conversations [9.432208348863336]
We propose a conversational speech emotion recognition method to deal with capturing attentive contextual dependency and speaker-sensitive interactions.
First, we use a pretrained VGGish model to extract segment-based audio representation in individual utterances.
Second, an attentive bi-directional recurrent unit (GRU) models contextual-sensitive information and explores intra- and inter-speaker dependencies jointly.
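The two-stage architecture this entry describes (segment-level audio features fed to an attentive bi-directional GRU) can be sketched with a toy NumPy implementation. Everything here is illustrative: the parameter shapes, random initialization, and additive attention form are assumptions, and the actual method uses trained VGGish embeddings (128-dimensional in practice) rather than random vectors.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, W, U, b):
    """One GRU step; W, U, b stack update/reset/candidate parameters."""
    z = sigmoid(x @ W[0] + h @ U[0] + b[0])        # update gate
    r = sigmoid(x @ W[1] + h @ U[1] + b[1])        # reset gate
    n = np.tanh(x @ W[2] + (r * h) @ U[2] + b[2])  # candidate state
    return (1.0 - z) * h + z * n

def bi_gru_with_attention(X, params_f, params_b, w_att, hidden=8):
    """Bi-directional GRU over segment embeddings + attentive pooling."""
    T = X.shape[0]
    hf, hb = np.zeros(hidden), np.zeros(hidden)
    fwd, bwd = [], []
    for t in range(T):                        # forward pass over segments
        hf = gru_step(X[t], hf, *params_f)
        fwd.append(hf)
    for t in reversed(range(T)):              # backward pass over segments
        hb = gru_step(X[t], hb, *params_b)
        bwd.append(hb)
    H = np.concatenate([np.stack(fwd), np.stack(bwd[::-1])], axis=1)  # (T, 2*hidden)
    scores = np.tanh(H @ w_att)               # attention energies, shape (T,)
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax attention weights
    return alpha @ H                          # attention-pooled representation

rng = np.random.default_rng(0)
d_in, hidden = 16, 8   # toy sizes, not the real model's dimensions
make_params = lambda: (rng.standard_normal((3, d_in, hidden)) * 0.1,
                       rng.standard_normal((3, hidden, hidden)) * 0.1,
                       np.zeros((3, hidden)))
X = rng.standard_normal((5, d_in))            # 5 audio segments of one utterance
rep = bi_gru_with_attention(X, make_params(), make_params(),
                            rng.standard_normal(2 * hidden), hidden)
```

The attention weights let the model emphasize the segments most indicative of the emotion, while the two directions give each segment both past and future context, which is the "contextual-sensitive" modeling the summary refers to.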
arXiv Detail & Related papers (2023-02-05T16:15:46Z)
- "How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations [87.95711406978157]
This work presents a new benchmark on spoken task-oriented conversations.
We study multi-domain dialogue state tracking and knowledge-grounded dialogue modeling.
Our data set enables speech-based benchmarking of task-oriented dialogue systems.
arXiv Detail & Related papers (2021-09-28T04:51:04Z)
- Know Deeper: Knowledge-Conversation Cyclic Utilization Mechanism for Open-domain Dialogue Generation [11.72386584395626]
End-to-End intelligent neural dialogue systems suffer from the problems of generating inconsistent and repetitive responses.
Existing dialogue models focus on unilaterally incorporating personal knowledge into the dialog, ignoring the fact that feeding personality-related conversation information back into the personal knowledge, as a bilateral information flow, boosts the quality of the subsequent conversation.
We propose a conversation-adaption multi-view persona-aware response generation model that aims to enhance conversation consistency and alleviate repetition in two respects.
arXiv Detail & Related papers (2021-07-16T08:59:06Z)
- Dialogue History Matters! Personalized Response Selection in Multi-turn Retrieval-based Chatbots [62.295373408415365]
We propose a personalized hybrid matching network (PHMN) for context-response matching.
Our contributions are two-fold: 1) our model extracts personalized wording behaviors from user-specific dialogue history as extra matching information.
We evaluate our model on two large datasets with user identification, i.e., the personalized Ubuntu dialogue corpus (P-Ubuntu) and the personalized Weibo dataset (P-Weibo).
arXiv Detail & Related papers (2021-03-17T09:42:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.