Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling
- URL: http://arxiv.org/abs/2409.09340v1
- Date: Sat, 14 Sep 2024 07:03:08 GMT
- Title: Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling
- Authors: Tiantian Feng, Anfeng Xu, Xuan Shi, Somer Bishop, Shrikanth Narayanan
- Abstract summary: Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by challenges in social communication, repetitive behavior, and sensory processing.
One important research area in ASD is evaluating children's behavioral changes over time during treatment.
A fundamental aspect of understanding children's behavior in these interactions is automatic speech understanding.
- Score: 30.099739460287566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by challenges in social communication, repetitive behavior, and sensory processing. One important research area in ASD is evaluating children's behavioral changes over time during treatment. The standard protocol with this objective is BOSCC, which involves dyadic interactions between a child and clinicians performing a pre-defined set of activities. A fundamental aspect of understanding children's behavior in these interactions is automatic speech understanding, particularly identifying who speaks and when. Conventional approaches in this area heavily rely on speech samples recorded from a spectator perspective, and there is limited research on egocentric speech modeling. In this study, we design an experiment to perform speech sampling in BOSCC interviews from an egocentric perspective using wearable sensors and explore pre-training Ego4D speech samples to enhance child-adult speaker classification in dyadic interactions. Our findings highlight the potential of egocentric speech collection and pre-training to improve speaker classification accuracy.
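The abstract describes the common "pretrained speech encoder + lightweight classifier" recipe for child-adult speaker classification. The paper's actual model is not given here, so the following is only a minimal sketch of that general setup, with random Gaussian vectors standing in for utterance embeddings from an Ego4D-pretrained encoder (all names and shapes are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for utterance embeddings from a pretrained
# speech encoder: child and adult utterances drawn from shifted Gaussians.
DIM = 16
child = rng.normal(loc=-0.5, scale=1.0, size=(200, DIM))
adult = rng.normal(loc=+0.5, scale=1.0, size=(200, DIM))
X = np.vstack([child, adult])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 0 = child, 1 = adult

# Minimal logistic-regression head trained with gradient descent,
# mirroring the common "frozen encoder + linear probe" setup.
w = np.zeros(DIM)
b = 0.0
lr = 0.1
for _ in range(300):
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))      # sigmoid probabilities
    w -= lr * (X.T @ (p - y) / len(y))
    b -= lr * float(np.mean(p - y))

pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
accuracy = float(np.mean(pred == y))
print(f"train accuracy: {accuracy:.2f}")
```

In practice the encoder would be fine-tuned or probed on real egocentric audio; the point of the sketch is only the two-stage structure (representation, then a small child-vs-adult head).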
Related papers
- Exploring Speech Pattern Disorders in Autism using Machine Learning [12.469348589699766]
This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues.
We extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs) and balance.
The classification model aimed to differentiate between ASD and non-ASD cases, achieving an accuracy of 87.75%.
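Two of the feature families named above, zero-crossing rate and short-time energy, are simple to compute per analysis frame. A minimal numpy sketch (the frame length and toy signal are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def short_time_energy(frame: np.ndarray) -> float:
    """Mean squared amplitude of one analysis frame."""
    return float(np.mean(frame ** 2))

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.sign(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

# Toy signal: 100 Hz sine sampled at 8 kHz, one 25 ms frame (200 samples).
sr = 8000
t = np.arange(200) / sr
frame = np.sin(2 * np.pi * 100 * t + 0.1)

energy = short_time_energy(frame)
zcr = zero_crossing_rate(frame)
print(f"energy={energy:.3f}, zcr={zcr:.3f}")
```

Spectral features and MFCCs follow the same frame-wise pattern but require a windowed FFT and a mel filterbank on top.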
arXiv Detail & Related papers (2024-05-03T02:59:15Z)
- Parameter Selection for Analyzing Conversations with Autism Spectrum Disorder [1.11612113079373]
We present a modeling approach to autism spectrum disorder (ASD) diagnosis by analyzing acoustic/prosodic and linguistic features extracted from diagnostic conversations.
Our results can facilitate fine-grained analysis of conversation data for children with ASD to support diagnosis and intervention.
arXiv Detail & Related papers (2024-01-18T04:28:56Z)
- Path Signature Representation of Patient-Clinician Interactions as a Predictor for Neuropsychological Tests Outcomes in Children: A Proof of Concept [40.737684553736166]
The study utilised a dataset of 39 video recordings, capturing extensive sessions where clinicians administered cognitive assessment tests.
Despite the limited sample size and heterogeneous recording styles, the analysis successfully extracted path signatures as features from the recorded data.
Results suggest that these features exhibit promising potential for predicting all cognitive tests scores of the entire session length.
arXiv Detail & Related papers (2023-12-12T12:14:08Z)
- A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition [72.36055502078193]
We propose a hierarchical framework, based on chain regression models, for affective recognition from vocal bursts.
To address the challenge of data sparsity, we also use self-supervised learning (SSL) representations with layer-wise and temporal aggregation modules.
The proposed systems participated in the ACII Affective Vocal Burst (A-VB) Challenge 2022 and ranked first in the "TWO" and "CULTURE" tasks.
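The layer-wise aggregation of SSL representations mentioned above is commonly implemented as a learned softmax-weighted sum over encoder layers followed by temporal pooling. A minimal numpy sketch of that aggregation step (shapes and random weights are illustrative assumptions, not this paper's trained model):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical SSL encoder output: L layers x T frames x D dims.
L, T, D = 12, 50, 32
hidden_states = rng.normal(size=(L, T, D))

# Layer-wise aggregation: softmax weights over layers (learnable in
# practice), then a weighted sum collapsing the layer axis.
layer_logits = rng.normal(size=L)
weights = np.exp(layer_logits) / np.exp(layer_logits).sum()
mixed = np.tensordot(weights, hidden_states, axes=1)  # (T, D)

# Temporal aggregation: mean pooling over frames gives one vector
# per utterance for the downstream regression head.
utterance_vec = mixed.mean(axis=0)  # (D,)
print(utterance_vec.shape)
```

The downstream chain-regression head would then consume `utterance_vec`; only the aggregation modules are sketched here.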
arXiv Detail & Related papers (2023-03-14T16:08:45Z)
- Psychophysiological Arousal in Young Children Who Stutter: An Interpretable AI Approach [6.507353572917133]
The presented study effectively identifies and visualizes the second-by-second pattern differences in the physiological arousal of preschool-age children who do stutter (CWS) and who do not stutter (CWNS).
The former condition may affect children's speech due to high arousal; the latter introduces linguistic, cognitive, and communicative demands on speakers.
arXiv Detail & Related papers (2022-08-03T13:28:15Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations: the most often used nonverbal cue is speaking activity, the most common computational method is support vector machines, the typical interaction environment is meetings of 3-4 persons, and the prevalent sensing approach is microphones and cameras.
arXiv Detail & Related papers (2022-07-20T13:37:57Z)
- TalkTive: A Conversational Agent Using Backchannels to Engage Older Adults in Neurocognitive Disorders Screening [51.97352212369947]
We analyzed 246 conversations of cognitive assessments between older adults and human assessors.
We derived the categories of reactive backchannels and proactive backchannels.
This is used in the development of TalkTive, a CA which can predict both timing and form of backchanneling.
arXiv Detail & Related papers (2022-02-16T17:55:34Z)
- Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks [58.24134321728942]
We compare and identify cognitive aspects of deep neural network-based visual lip-reading models.
We observe a strong correlation between these theories in cognitive psychology and our unique modeling.
arXiv Detail & Related papers (2021-10-13T05:30:50Z)
- Disambiguating Affective Stimulus Associations for Robot Perception and Dialogue [67.89143112645556]
We provide a NICO robot with the ability to learn the associations between a perceived auditory stimulus and an emotional expression.
NICO is able to do this for both individual subjects and specific stimuli, with the aid of an emotion-driven dialogue system.
The robot is then able to use this information to determine a subject's enjoyment of perceived auditory stimuli in a real HRI scenario.
arXiv Detail & Related papers (2021-03-05T20:55:48Z)
- Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews [9.728371067160941]
We train end-to-end neural network architectures to adapt to each task and evaluate each approach under the same metric.
Results do not depend on the demographics of the interviewee, highlighting the clinical relevance of our methods.
arXiv Detail & Related papers (2020-10-30T09:07:37Z)
- You Impress Me: Dialogue Generation via Mutual Persona Perception [62.89449096369027]
The research in cognitive science suggests that understanding is an essential signal for a high-quality chit-chat conversation.
Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding.
arXiv Detail & Related papers (2020-04-11T12:51:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.