Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder
- URL: http://arxiv.org/abs/2409.13606v1
- Date: Fri, 20 Sep 2024 16:06:46 GMT
- Title: Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder
- Authors: Aditya Kommineni, Digbalay Bose, Tiantian Feng, So Hyun Kim, Helen Tager-Flusberg, Somer Bishop, Catherine Lord, Sudarsana Kadiri, Shrikanth Narayanan
- Abstract summary: We investigate the use of foundation models across three modalities: speech, video, and text, to analyse child-focused interaction sessions.
We evaluate their performance on two tasks with different information granularity: activity recognition and abnormal behavior detection.
- Score: 27.788204861041553
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Clinical videos in the context of Autism Spectrum Disorder are often long-form interactions between children and caregivers/clinical professionals, encompassing complex verbal and non-verbal behaviors. Objective analyses of these videos could provide clinicians and researchers with nuanced insights into the behavior of children with Autism Spectrum Disorder. Manually coding these videos is a time-consuming task and requires a high level of domain expertise. Hence, the ability to capture these interactions computationally can augment manual coding and support the diagnostic procedure. In this work, we investigate the use of foundation models across three modalities: speech, video, and text, to analyse child-focused interaction sessions. We propose a unified methodology to combine multiple modalities by using large language models as reasoning agents. We evaluate their performance on two tasks with different information granularity: activity recognition and abnormal behavior detection. We find that the proposed multimodal pipeline provides robustness to modality-specific limitations and improves performance on clinical video analysis compared to unimodal settings.
Related papers
- Weakly-supervised Autism Severity Assessment in Long Videos [11.976885834298566]
Autism Spectrum Disorder (ASD) is a diverse collection of neurobiological conditions marked by challenges in social communication and interactions.
Atypical behavior patterns in a long, untrimmed video can serve as biomarkers for children with ASD.
(2024-07-12T10:45:25Z)
- Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [53.629132242389716]
Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions.
VLMs often hallucinate, generating textual outputs not grounded in contextual multimodal information.
We propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge.
(2024-05-29T23:19:28Z)
- Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition [47.550391816383794]
We introduce a novel problem of audio-visual autism behavior recognition.
Social behavior recognition is an essential aspect previously omitted in AI-assisted autism screening research.
We will release our dataset, code, and pre-trained models.
(2024-03-22T22:52:35Z)
- Video-Based Autism Detection with Deep Learning [0.0]
We develop a deep learning model that analyzes video clips of children reacting to sensory stimuli.
Results show that our model effectively generalizes and understands key differences in the distinct movements of the children.
(2024-02-26T17:45:00Z)
- Show from Tell: Audio-Visual Modelling in Clinical Settings [58.88175583465277]
We consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations without human expert annotation.
A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose.
The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference.
(2023-10-25T08:55:48Z)
- Language-Assisted Deep Learning for Autistic Behaviors Recognition [13.200025637384897]
We show that a vision-based problem behaviors recognition system can achieve high accuracy and outperform the previous methods by a large margin.
We propose a two-branch multimodal deep learning framework by incorporating the "freely available" language description for each type of problem behavior.
Experimental results demonstrate that incorporating additional language supervision yields a clear performance boost on the autism problem behavior recognition task.
(2022-11-17T02:58:55Z)
- Vision-Based Activity Recognition in Children with Autism-Related Behaviors [15.915410623440874]
We demonstrate the effect of a region-based computer vision system to help clinicians and parents analyze a child's behavior.
The data is pre-processed by detecting the target child in the video to reduce the impact of background noise.
Motivated by the effectiveness of temporal convolutional models, we propose both light-weight and conventional models capable of extracting action features from video frames.
(2022-08-08T15:12:27Z)
- Co-Located Human-Human Interaction Analysis using Nonverbal Cues: A Survey [71.43956423427397]
We aim to identify the nonverbal cues and computational methodologies resulting in effective performance.
This survey differs from its counterparts by involving the widest spectrum of social phenomena and interaction settings.
Some major observations are: the most often used nonverbal cue is speaking activity, the most often used computational method is support vector machines, and the most common interaction environment and sensing approach are meetings of 3-4 persons equipped with microphones and cameras.
(2022-07-20T13:37:57Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network approach (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
(2020-11-03T11:00:10Z)
- Early Autism Spectrum Disorders Diagnosis Using Eye-Tracking Technology [62.997667081978825]
Lack of money, absence of qualified specialists, and low trust in correction methods are the main issues that affect timely diagnosis of ASD.
Our team developed an algorithm that predicts the likelihood of ASD from the child's gaze activity.
(2020-08-21T20:22:55Z)