Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition
- URL: http://arxiv.org/abs/2406.02554v1
- Date: Fri, 22 Mar 2024 22:52:35 GMT
- Title: Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition
- Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian
- Abstract summary: We introduce a novel problem of audio-visual autism behavior recognition.
Social behavior recognition is an essential aspect previously omitted in AI-assisted autism screening research.
We will release our dataset, code, and pre-trained models.
- Score: 47.550391816383794
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this article, we introduce the novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define audio-visual autism behavior recognition as the task of using audio and visual cues, including any speech present in the audio, to recognize autism-related behaviors. To facilitate this new research direction, we collected an audio-visual autism spectrum dataset (AV-ASD), currently the largest video dataset for autism screening using a behavioral approach. It covers an extensive range of autism-associated behaviors, including those related to social communication and interaction. To pave the way for further research on this new problem, we intensively explored leveraging foundation models and multimodal large language models across different modalities. Our experiments on the AV-ASD dataset demonstrate that integrating audio, visual, and speech modalities significantly enhances performance in autism behavior recognition. Additionally, we explored a post-hoc to ad-hoc pipeline in a multimodal large language model to investigate its potential to augment the model's explanatory capability during autism behavior recognition. We will release our dataset, code, and pre-trained models.
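To make the fusion idea concrete, below is a minimal late-fusion sketch in PyTorch: per-modality embeddings (audio, visual, speech) are projected into a shared space, concatenated, and mapped to multi-label behavior logits. All names, dimensions, and layer sizes here are hypothetical illustrations, not the paper's actual architecture.

```python
# Hypothetical late-fusion sketch: combine per-modality embeddings
# (audio, visual, speech/transcript) for multi-label behavior recognition.
# Dimensions and layer sizes are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, audio_dim=768, visual_dim=1024, speech_dim=768,
                 hidden_dim=512, num_behaviors=10):
        super().__init__()
        # One projection head per modality, mapping into a shared space.
        self.audio_proj = nn.Linear(audio_dim, hidden_dim)
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.speech_proj = nn.Linear(speech_dim, hidden_dim)
        # Fusion over the concatenated projections, then multi-label logits.
        self.fusion = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_behaviors),
        )

    def forward(self, audio_emb, visual_emb, speech_emb):
        fused = torch.cat([
            self.audio_proj(audio_emb),
            self.visual_proj(visual_emb),
            self.speech_proj(speech_emb),
        ], dim=-1)
        return self.fusion(fused)  # raw logits; apply sigmoid for multi-label

# Example with random stand-in embeddings for a batch of 4 clips.
model = LateFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 1024), torch.randn(4, 768))
probs = torch.sigmoid(logits)  # per-behavior probabilities
```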
Related papers
- Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder [27.788204861041553]
We investigate the use of foundation models across three modalities: speech, video, and text, to analyse child-focused interaction sessions.
We evaluate their performance on two tasks with different information granularity: activity recognition and abnormal behavior detection.
arXiv Detail & Related papers (2024-09-20T16:06:46Z)
- Human Gesture and Gait Analysis for Autism Detection [23.77172199742202]
Atypical gait and gesture patterns are dominant behavioral characteristics of autism.
We present an analysis of gesture and gait activity in videos to identify children with autism.
arXiv Detail & Related papers (2023-04-17T15:31:22Z)
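As a rough illustration of the gait-and-gesture entry above, here is a hedged sketch: assuming per-frame 2D pose keypoints have already been extracted by some off-the-shelf pose estimator, simple motion statistics are classified with a Random Forest. The features, shapes, and toy labels are assumptions for illustration, not the paper's method.

```python
# Hypothetical gait/gesture pipeline sketch: per-frame pose keypoints are
# assumed given; per-sequence movement statistics are then classified.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sequence_features(keypoints):
    """keypoints: (num_frames, num_joints, 2) array of 2D joint positions."""
    velocity = np.diff(keypoints, axis=0)      # frame-to-frame displacement
    speed = np.linalg.norm(velocity, axis=-1)  # per-joint speed
    return np.concatenate([
        speed.mean(axis=0),  # average speed of each joint
        speed.std(axis=0),   # variability of each joint's motion
    ])

# Random stand-in data: 20 sequences, 100 frames, 17 joints.
rng = np.random.default_rng(0)
X = np.stack([sequence_features(rng.normal(size=(100, 17, 2)))
              for _ in range(20)])
y = rng.integers(0, 2, size=20)  # toy labels: 1 = atypical pattern

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```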
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prevalent among older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Language-Assisted Deep Learning for Autistic Behaviors Recognition [13.200025637384897]
We show that a vision-based problem-behavior recognition system can achieve high accuracy and outperform previous methods by a large margin.
We propose a two-branch multimodal deep learning framework that incorporates the "freely available" language description of each type of problem behavior; a rough sketch follows this entry.
Experimental results demonstrate that this additional language supervision brings a clear performance boost for the autism problem-behavior recognition task.
arXiv Detail & Related papers (2022-11-17T02:58:55Z)
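One possible reading of the two-branch, language-assisted design above is a CLIP-style similarity classifier; the sketch below scores projected video features against frozen text embeddings of each behavior description. The architecture and dimensions are assumptions for illustration, not the paper's exact model.

```python
# Hypothetical two-branch, language-assisted classifier: video features are
# scored against fixed text embeddings of each behavior description.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LanguageAssistedClassifier(nn.Module):
    def __init__(self, video_dim, text_embeddings):
        super().__init__()
        # text_embeddings: (num_behaviors, text_dim), precomputed from the
        # language description of each behavior; kept frozen.
        self.register_buffer("text_emb", F.normalize(text_embeddings, dim=-1))
        self.video_proj = nn.Linear(video_dim, text_embeddings.shape[1])

    def forward(self, video_feat):
        v = F.normalize(self.video_proj(video_feat), dim=-1)
        return v @ self.text_emb.T  # cosine-similarity logits per behavior

# Toy usage: 5 behavior descriptions embedded into a 512-d text space.
model = LanguageAssistedClassifier(video_dim=1024,
                                   text_embeddings=torch.randn(5, 512))
logits = model(torch.randn(2, 1024))  # (2 clips, 5 behaviors)
```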
- Vision-Based Activity Recognition in Children with Autism-Related Behaviors [15.915410623440874]
We demonstrate the effectiveness of a region-based computer vision system that helps clinicians and parents analyze a child's behavior.
The data is pre-processed by detecting the target child in the video to reduce the impact of background noise.
Motivated by the effectiveness of temporal convolutional models, we propose both light-weight and conventional models capable of extracting action features from video frames.
arXiv Detail & Related papers (2022-08-08T15:12:27Z)
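To illustrate the temporal convolutional idea in the entry above, here is a minimal sketch: 1D convolutions over a sequence of per-frame feature vectors, pooled over time into clip-level logits. The layer sizes and feature dimension are illustrative assumptions, not the paper's configuration.

```python
# Minimal lightweight temporal convolutional model: 1D convolutions over
# per-frame feature vectors, pooled over time for clip-level classification.
import torch
import torch.nn as nn

class TemporalConvNet(nn.Module):
    def __init__(self, feat_dim=512, num_classes=7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the time axis
        )
        self.head = nn.Linear(256, num_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim); Conv1d expects (batch, feat_dim, time)
        z = self.net(x.transpose(1, 2)).squeeze(-1)
        return self.head(z)

model = TemporalConvNet()
logits = model(torch.randn(2, 64, 512))  # 2 clips, 64 frames each
```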
- Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning Approach [0.9945783208680666]
We present a suite of machine learning approaches to detect autism in self-recorded speech audio captured from autistic and neurotypical (NT) children in home environments.
We consider three methods to detect autism in child speech: first, Random Forests trained on extracted audio features; second, convolutional neural networks (CNNs) trained on spectrograms; and third, fine-tuned wav2vec 2.0, a state-of-the-art Transformer-based ASR model. The spectrogram-CNN approach is sketched after this entry.
arXiv Detail & Related papers (2022-01-04T01:31:02Z)
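Of the three methods listed above, the spectrogram CNN is the easiest to sketch end to end; below is a minimal, self-contained version using a plain magnitude STFT in PyTorch. The window settings, layer sizes, and binary head are assumptions, not the paper's configuration.

```python
# Minimal spectrogram-CNN sketch: log-magnitude STFT as a 1-channel image,
# classified by a small 2D CNN. All hyperparameters are illustrative.
import torch
import torch.nn as nn

def spectrogram(waveform, n_fft=400, hop=160):
    # waveform: (batch, samples) -> (batch, 1, freq_bins, frames)
    spec = torch.stft(waveform, n_fft=n_fft, hop_length=hop,
                      window=torch.hann_window(n_fft), return_complex=True)
    return torch.log1p(spec.abs()).unsqueeze(1)

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),  # toy binary head: autistic vs. neurotypical
)

# One second of fake 16 kHz audio for a batch of 2 recordings.
logits = cnn(spectrogram(torch.randn(2, 16000)))
```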
- Audio-visual Representation Learning for Anomaly Events Detection in Crowds [119.72951028190586]
This paper exploits multi-modal learning to model audio and visual signals simultaneously.
We conduct experiments on the SHADE dataset, a synthetic audio-visual dataset of surveillance scenes.
We find that introducing audio signals effectively improves anomaly event detection and outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-10-28T02:42:48Z)
- CogAlign: Learning to Align Textual Neural Representations to Cognitive Language Processing Signals [60.921888445317705]
We propose CogAlign, an approach that integrates cognitive language processing signals into natural language processing models.
We show that CogAlign achieves significant improvements with multiple cognitive features over state-of-the-art models on public datasets.
arXiv Detail & Related papers (2021-06-10T07:10:25Z)
- Muti-view Mouse Social Behaviour Recognition with Deep Graphical Model [124.26611454540813]
Social behaviour analysis of mice is an invaluable tool for assessing the therapeutic efficacy of treatments for neurodegenerative diseases.
Because of their potential to create rich descriptions of mouse social behaviour, multi-view video recordings of rodents are receiving increasing attention.
We propose a novel multiview latent-attention and dynamic discriminative model that jointly learns view-specific and view-shared sub-structures.
arXiv Detail & Related papers (2020-11-04T18:09:58Z)
- 4D Spatio-Temporal Deep Learning with 4D fMRI Data for Autism Spectrum Disorder Classification [69.62333053044712]
We propose a 4D convolutional deep learning approach for ASD classification where we jointly learn from spatial and temporal data.
We employ 4D neural networks and convolutional-recurrent models, which outperform a previous approach with an F1-score of 0.71 versus 0.65; a minimal sketch of the convolutional-recurrent variant follows this entry.
arXiv Detail & Related papers (2020-04-21T17:19:06Z)
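The convolutional-recurrent variant mentioned above can be sketched as a 3D CNN that encodes each fMRI volume, followed by a GRU over the time dimension. The version below is a hedged illustration with toy channel counts and input sizes, not the paper's actual 4D architecture.

```python
# Hypothetical convolutional-recurrent sketch for 4D fMRI: a 3D CNN encodes
# each volume; a GRU models the sequence of volume embeddings over time.
import torch
import torch.nn as nn

class ConvGRU4D(nn.Module):
    def __init__(self, hidden=64, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),  # (batch, 16) per volume
        )
        self.gru = nn.GRU(16, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # x: (batch, time, depth, height, width) 4D fMRI series
        b, t = x.shape[:2]
        feats = self.encoder(x.reshape(b * t, 1, *x.shape[2:]))
        _, h = self.gru(feats.reshape(b, t, -1))
        return self.head(h[-1])  # logits for ASD vs. control

model = ConvGRU4D()
logits = model(torch.randn(2, 10, 16, 16, 16))  # 2 subjects, 10 timepoints
```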
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.