Advancing an Interdisciplinary Science of Conversation: Insights from a Large Multimodal Corpus of Human Speech

URL: http://arxiv.org/abs/2203.00674v1
Date: Tue, 1 Mar 2022 18:50:33 GMT
Title: Advancing an Interdisciplinary Science of Conversation: Insights from a Large Multimodal Corpus of Human Speech
Authors: Andrew Reece, Gus Cooney, Peter Bull, Christine Chung, Bryn Dawson, Casey Fitzpatrick, Tamara Glazer, Dean Knox, Alex Liebscher and Sebastian Marin
Abstract summary: In this report we advance an interdisciplinary science of conversation, with findings from a large, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million-word, 850-hour corpus totals over 1TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression. We also present a comprehensive mixed-method report, based on quantitative analysis and qualitative review of each recording, that showcases how individuals from diverse backgrounds alter their communication patterns and find ways to connect.
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: People spend a substantial portion of their lives engaged in conversation, and yet our scientific understanding of conversation is still in its infancy. In this report we advance an interdisciplinary science of conversation, with findings from a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million-word, 850-hour corpus totals over 1TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of speakers' post-conversation reflections. We leverage the considerable scope of the corpus to (1) extend key findings from the literature, such as the cooperativeness of human turn-taking; (2) define novel algorithmic procedures for the segmentation of speech into conversational turns; (3) apply machine learning insights across various textual, auditory, and visual features to analyze what makes conversations succeed or fail; and (4) explore how conversations are related to well-being across the lifespan. We also present (5) a comprehensive mixed-method report, based on quantitative analysis and qualitative review of each recording, that showcases how individuals from diverse backgrounds alter their communication patterns and find ways to connect. We conclude with a discussion of how this large-scale public dataset may offer new directions for future research, especially across disciplinary boundaries, as scholars from a variety of fields appear increasingly interested in the study of conversation.
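Item (2) of the abstract concerns segmenting speech into conversational turns, but the procedures themselves are not described in this summary. For orientation only, the sketch below shows a common pause-threshold baseline, which is an assumption and not the authors' method: consecutive speech segments from the same speaker are merged into one turn when the silence between them is at most a threshold. The `Segment` type, the `merge_into_turns` function, and the 1.0-second `max_gap` default are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Segment:
    """One stretch of transcribed speech (hypothetical structure)."""
    speaker: str
    start: float  # onset in seconds
    end: float    # offset in seconds
    text: str


def merge_into_turns(segments: List[Segment], max_gap: float = 1.0) -> List[Segment]:
    """Pause-threshold baseline (an assumption, not the paper's procedure):
    merge consecutive same-speaker segments separated by <= max_gap seconds."""
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        prev = turns[-1] if turns else None
        if prev is not None and prev.speaker == seg.speaker and seg.start - prev.end <= max_gap:
            # Same speaker after a short pause: extend the current turn.
            turns[-1] = replace(prev, end=max(prev.end, seg.end), text=prev.text + " " + seg.text)
        else:
            # Speaker change or long pause: start a new turn.
            turns.append(seg)
    return turns


if __name__ == "__main__":
    # Toy, made-up segments for two speakers, A and B.
    example = [
        Segment("A", 0.0, 1.2, "So how"),
        Segment("A", 1.5, 2.0, "was your day?"),
        Segment("B", 2.3, 3.0, "Pretty good."),
        Segment("A", 3.1, 3.4, "Yeah?"),
        Segment("A", 5.0, 6.0, "Anyway."),
    ]
    for turn in merge_into_turns(example):
        print(f"{turn.speaker} [{turn.start:.1f}-{turn.end:.1f}s]: {turn.text}")
```

In this toy example, A's first two segments join into one turn because the pause between them is 0.3 s. The 1.6 s pause before "Anyway." exceeds `max_gap`, so that segment starts a new turn. A pause threshold alone does not handle overlapping speech or brief listener backchannels such as "mm-hm", which would split a speaker's turn in this sketch.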

Related papers

TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation
We introduce TAVID, a unified framework that generates both interactive faces and conversational speech in a synchronized manner. We evaluate our system across four dimensions: talking face realism, listening head responsiveness, dyadic interaction, and speech quality.
arXiv, 2025-12-23
A Cocktail-Party Benchmark: Multi-Modal dataset and Comparative Evaluation Results
We introduce the task of Multi-Modal Context-Aware Recognition (MCoRec) in the ninth CHiME Challenge. MCoRec captures natural multi-party conversations where the recordings focus on unscripted, casual group chats. The task requires systems to answer the question "Who speaks when, what, and with whom?" by jointly transcribing each speaker's speech and clustering them into their respective conversations from audio-visual recordings.
arXiv, 2025-10-27
Detecting In-Person Conversations in Noisy Real-World Environments with Smartwatch Audio and Motion Sensing
Social interactions play a crucial role in shaping human behavior, relationships, and societies. We develop a novel computational approach to detect a foundational aspect of human social interactions, in-person verbal conversations.
arXiv, 2025-07-16
Aligning Spoken Dialogue Models from User Interactions
We propose a novel preference alignment framework to improve spoken dialogue models on real-time conversations from user interactions. We create a dataset of more than 150,000 preference pairs from raw multi-turn speech conversations annotated with AI feedback. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
arXiv, 2025-06-26
Multimodal Conversation Structure Understanding
Large language models' ability to understand fine-grained conversational structure remains underexplored. We present a human-annotated dataset of 4,398 annotations for speakers and reply-to relationships, 5,755 addressees, and 3,142 side-participants. We evaluate popular audio-visual LLMs and vision-language models on our dataset, and our experimental results suggest that multimodal conversational structure understanding remains challenging.
arXiv, 2025-05-23
WavChat: A Survey of Spoken Dialogue Models
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o, have captured significant attention in the speech domain. These advanced spoken dialogue models not only comprehend audio, music, and other speech-related features, but also capture stylistic and timbral characteristics in speech. Despite the progress in spoken dialogue systems, there is a lack of comprehensive surveys that systematically organize and analyze these systems.
arXiv, 2024-11-15
Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations
We introduce Conversation Chronicles, a new dataset of 1M multi-session dialogues, for implementing a long-term conversation setup. We show that dialogue episodes in Conversation Chronicles reflect these temporal and relational properties while maintaining coherent and consistent interactions. We also propose a dialogue model, called ReBot, which consists of chronological summarization and dialogue generation modules.
arXiv, 2023-10-20
Multi-turn Dialogue Comprehension from a Topic-aware Perspective
This paper proposes to model multi-turn dialogues from a topic-aware perspective. We use a dialogue segmentation algorithm to split a dialogue passage into topic-concentrated fragments in an unsupervised way. We also present a novel model, Topic-Aware Dual-Attention Matching (TADAM) Network, which takes topic segments as processing elements.
arXiv, 2023-09-18
NewsDialogues: Towards Proactive News Grounded Conversation
We propose a novel task, Proactive News Grounded Conversation, in which a dialogue system can proactively lead the conversation based on some key topics of the news. To further develop this novel task, we collect a human-to-human Chinese dialogue dataset, NewsDialogues, which includes 1K conversations with a total of 14.6K utterances.
arXiv, 2023-08-12
End-to-end Spoken Conversational Question Answering: Task, Dataset and Model
In spoken question answering, the systems are designed to answer questions from contiguous text spans within the related speech transcripts. We propose a new Spoken Conversational Question Answering task (SCQA), aiming at enabling the systems to model complex dialogue flows. Our main objective is to build the system to deal with conversational questions based on the audio recordings, and to explore the plausibility of providing more cues from different modalities with systems in information gathering.
arXiv, 2022-04-29
Who says like a style of Vitamin: Towards Syntax-Aware Dialogue Summarization using Multi-task Learning
We focus on the association between utterances from individual speakers and unique syntactic structures. Speakers have unique textual styles that can contain linguistic information, such as voiceprint. We employ multi-task learning of both syntax-aware information and dialogue summarization.
arXiv, 2021-09-29
Advances in Multi-turn Dialogue Comprehension: A Survey
We review the previous methods from the perspective of dialogue modeling. We discuss three typical patterns of dialogue modeling that are widely used in dialogue comprehension tasks.
arXiv, 2021-03-04
MultiTalk: A Highly-Branching Dialog Testbed for Diverse Conversations
We present the MultiTalk dataset, a corpus of over 320,000 sentences of written conversational dialog. We make multiple contributions to study dialog generation in the highly branching setting. Our culminating task is a challenging theory of mind problem, a controllable generation task.
arXiv, 2021-02-02
KdConv: A Chinese Multi-domain Dialogue Dataset Towards Multi-turn Knowledge-driven Conversation
We propose a Chinese multi-domain knowledge-driven conversation dataset, KdConv, which grounds the topics in multi-turn conversations to knowledge graphs. Our corpus contains 4.5K conversations from three domains (film, music, and travel) and 86K utterances, with an average of 19.0 turns per conversation.
arXiv, 2020-04-08
Detecting depression in dyadic conversations with multimodal narratives and visualizations
In this paper, we develop a system that supports humans in analyzing conversations. We demonstrate the ability of our system to take in a wide range of multimodal information and automatically generate a prediction score for the depression state of the individual.
arXiv, 2020-01-13

This list is automatically generated from the titles and abstracts of the papers on this site.