Detecting In-Person Conversations in Noisy Real-World Environments with Smartwatch Audio and Motion Sensing
- URL: http://arxiv.org/abs/2507.12002v1
- Date: Wed, 16 Jul 2025 07:57:15 GMT
- Title: Detecting In-Person Conversations in Noisy Real-World Environments with Smartwatch Audio and Motion Sensing
- Authors: Alice Zhang, Callihan Bertley, Dawei Liang, Edison Thomaz
- Abstract summary: Social interactions play a crucial role in shaping human behavior, relationships, and societies. We develop a novel computational approach to detect a foundational aspect of human social interactions: in-person verbal conversations.
- Score: 1.5999407512883512
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Social interactions play a crucial role in shaping human behavior, relationships, and societies. They encompass various forms of communication, such as verbal conversation, non-verbal gestures, facial expressions, and body language. In this work, we develop a novel computational approach to detect a foundational aspect of human social interactions, in-person verbal conversations, by leveraging audio and inertial data captured with a commodity smartwatch in acoustically challenging scenarios. To evaluate our approach, we conducted a lab study with 11 participants and a semi-naturalistic study with 24 participants. We analyzed machine learning and deep learning models with 3 different fusion methods, showing the advantages of fusing audio and inertial data to consider not only verbal cues but also non-verbal gestures in conversations. Furthermore, we perform a comprehensive set of evaluations across activities and sampling rates to demonstrate the benefits of multimodal sensing in specific contexts. Overall, our framework achieved 82.0$\pm$3.0% macro F1-score when detecting conversations in the lab and 77.2$\pm$1.8% in the semi-naturalistic setting.
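The abstract reports macro F1-scores and three fusion methods for combining audio and inertial data, but does not spell out the fusion details. As an illustrative sketch only (the function names, the equal default weighting, and the toy labels are assumptions, not the authors' implementation), score-level "late" fusion and the macro F1 metric could look like:

```python
import numpy as np

def macro_f1(y_true, y_pred, classes):
    """Macro F1: unweighted mean of per-class F1 scores (the metric the paper reports)."""
    scores = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        scores.append(f1)
    return float(np.mean(scores))

def late_fusion(p_audio, p_motion, w_audio=0.5):
    """One simple fusion option: a weighted average of per-class probabilities
    produced by separate audio and inertial (IMU) classifiers."""
    return w_audio * np.asarray(p_audio) + (1.0 - w_audio) * np.asarray(p_motion)

# Toy example: 1 = conversation, 0 = no conversation.
y_true = np.array([1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0])
print(macro_f1(y_true, y_pred, [0, 1]))  # mean of per-class F1s (0.8 and 2/3)
```

Early (feature-level) and hybrid fusion are the other common variants; which of the three the paper uses cannot be determined from this abstract.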
Related papers
- Aligning Spoken Dialogue Models from User Interactions [55.192134724622235]
We propose a novel preference alignment framework to improve spoken dialogue models on real-time conversations from user interactions. We create a dataset of more than 150,000 preference pairs from raw multi-turn speech conversations annotated with AI feedback. Our findings shed light on the importance of a well-calibrated balance among various dynamics, crucial for natural real-time speech dialogue systems.
arXiv Detail & Related papers (2025-06-26T16:45:20Z)
- Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions [13.341099059080936]
This study aims to equip chatbots with "eyes and ears" capable of more immersive interactions with humans. We introduce a new multimodal conversation dataset, Multimodal Multi-Session Multi-Party Conversation. Our model, trained on the $M3C$, demonstrates the ability to seamlessly engage in long-term conversations with multiple speakers.
arXiv Detail & Related papers (2025-05-31T06:50:51Z)
- Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation [70.52558242336988]
We focus on predicting engagement in dyadic interactions by scrutinizing verbal and non-verbal cues, aiming to detect signs of disinterest or confusion.
In this work, we collect a dataset featuring 34 participants engaged in casual dyadic conversations, each providing self-reported engagement ratings at the end of each conversation.
We introduce a novel fusion strategy using Large Language Models (LLMs) to integrate multiple behavior modalities into a "multimodal transcript".
arXiv Detail & Related papers (2024-09-13T18:28:12Z)
- Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People [20.95122915164433]
We propose an iterative method for simultaneously eliciting conversational tones and sentences.
We show how our approach can be used to create an interpretable representation of relations between conversational tones in humans and GPT-4.
arXiv Detail & Related papers (2024-06-06T17:26:00Z)
- Interactive Natural Language Processing [67.87925315773924]
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP.
This paper offers a comprehensive survey of iNLP, starting by proposing a unified definition and framework of the concept.
arXiv Detail & Related papers (2023-05-22T17:18:29Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations: the most often used nonverbal cue is speaking activity, the most common computational method is support vector machines, and the most typical interaction setting is meetings of 3-4 persons sensed with microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion [89.01668641930206]
We present a framework for modeling interactional communication in dyadic conversations.
We autoregressively output multiple possibilities of corresponding listener motion.
Our method organically captures the multimodal and non-deterministic nature of nonverbal dyadic interactions.
arXiv Detail & Related papers (2022-04-18T17:58:04Z)
- Advancing an Interdisciplinary Science of Conversation: Insights from a Large Multimodal Corpus of Human Speech [0.12038936091716987]
In this report we advance an interdisciplinary science of conversation, with findings from a large, multimodal corpus of 1,656 recorded conversations in spoken English.
This 7+ million word, 850 hour corpus totals over 1TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression.
We report a comprehensive mixed-method analysis, based on quantitative analysis and qualitative review of each recording, that showcases how individuals from diverse backgrounds alter their communication patterns and find ways to connect.
arXiv Detail & Related papers (2022-03-01T18:50:33Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
The research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)
- Detecting depression in dyadic conversations with multimodal narratives and visualizations [1.4824891788575418]
In this paper, we develop a system that supports humans to analyze conversations.
We demonstrate the ability of our system to take in a wide range of multimodal information and automatically generate a prediction score for the depression state of the individual.
arXiv Detail & Related papers (2020-01-13T10:47:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.