A Novel Dataset for Video-Based Neurodivergent Classification Leveraging Extra-Stimulatory Behavior
- URL: http://arxiv.org/abs/2409.04598v2
- Date: Fri, 22 Aug 2025 03:58:59 GMT
- Title: A Novel Dataset for Video-Based Neurodivergent Classification Leveraging Extra-Stimulatory Behavior
- Authors: Manuel Serna-Aguilera, Xuan Bac Nguyen, Han-Seok Seo, Khoa Luu
- Abstract summary: We introduce the Video ASD dataset, a dataset that contains video frame convolutional and attention map feature data. Results show that our model effectively generalizes and understands key differences in the distinct movements of the children.
- Score: 15.88235629194724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facial expressions and actions differ among individuals at varying degrees of intensity in response to external stimuli, particularly among those who are neurodivergent. Such behaviors affect people in terms of overall health, communication, and sensory processing. Deep learning can be responsibly leveraged to improve productivity in addressing this task and to help medical professionals accurately understand such behaviors. In this work, we introduce the Video ASD dataset, a dataset that contains video frame convolutional and attention map feature data, to foster further progress in the task of ASD classification. Unlike many recent studies in ASD classification that rely on MRI data and thus require expensive specialized equipment, our method needs only a powerful but relatively affordable GPU, a standard computer setup, and a video camera for inference. Results show that our model effectively generalizes and understands key differences in the distinct movements of the children. Additionally, we test foundation models on this data to show how movement noise affects performance and why more data and more complex labels are needed.
Related papers
- Adapting HFMCA to Graph Data: Self-Supervised Learning for Generalizable fMRI Representations [57.054499278843856]
Functional magnetic resonance imaging (fMRI) analysis faces significant challenges due to limited dataset sizes and domain variability between studies. Traditional self-supervised learning methods inspired by computer vision often rely on positive and negative sample pairs. We propose adapting the recently developed Hierarchical Functional Maximal Correlation Algorithm (HFMCA) to graph-structured fMRI data.
arXiv Detail & Related papers (2025-10-05T12:35:01Z) - MMeViT: Multi-Modal ensemble ViT for Post-Stroke Rehabilitation Action Recognition [1.0781866671930853]
A key component of remote monitoring systems is Human Action Recognition (HAR) technology, which classifies actions. HAR research for stroke has largely concentrated on classifying relatively simple actions using machine learning rather than deep learning. In this study, we designed a system to monitor the actions of stroke patients, focusing on domiciliary upper limb Activities of Daily Living (ADL). We analyzed the collected dataset and found that the action data of stroke patients is less clustered than that of non-disabled individuals.
arXiv Detail & Related papers (2025-09-27T01:46:26Z) - Whole-brain Transferable Representations from Large-Scale fMRI Data Improve Task-Evoked Brain Activity Decoding [3.416130444086009]
We propose STDA-SwiFT, a transformer-based model that learns transferable representations from large-scale fMRI datasets. We show that our model substantially improves downstream decoding performance of task-evoked activity. Our work showcases transfer learning as a viable approach to overcome challenges in decoding brain activity from fMRI data.
arXiv Detail & Related papers (2025-07-30T04:36:58Z) - DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding [82.91021399231184]
Existing fMRI-to-video methods often focus on semantic content while overlooking spatial and motion information.
We propose DecoFuse, a novel brain-inspired framework for decoding videos from fMRI signals.
It first decomposes the video into three components - semantic, spatial, and motion - then decodes each component separately before fusing them to reconstruct the video.
arXiv Detail & Related papers (2025-04-01T05:28:37Z) - Cracking the Code of Hallucination in LVLMs with Vision-aware Head Divergence [69.86946427928511]
We investigate the internal mechanisms driving hallucination in large vision-language models (LVLMs).
We introduce Vision-aware Head Divergence (VHD), a metric that quantifies the sensitivity of attention head outputs to visual context.
We propose Vision-aware Head Reinforcement (VHR), a training-free approach to mitigate hallucination by enhancing the role of vision-aware attention heads.
arXiv Detail & Related papers (2024-12-18T15:29:30Z) - Advanced Gesture Recognition in Autism: Integrating YOLOv7, Video Augmentation and VideoMAE for Video Analysis [9.162792034193373]
This research work aims to identify repetitive behaviors indicative of autism by analyzing videos captured in natural settings as children engage in daily activities.
The focus is on accurately categorizing real-time repetitive gestures such as spinning, head banging, and arm flapping.
A key component of the proposed methodology is the use of VideoMAE, a model designed to improve both spatial and temporal analysis of video data.
arXiv Detail & Related papers (2024-10-12T02:55:37Z) - ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z) - MMASD+: A Novel Dataset for Privacy-Preserving Behavior Analysis of Children with Autism Spectrum Disorder [1.6210252731619712]
This work introduces MMASD+, an enhanced version of the novel open-source dataset called Multimodal ASD (MMASD).
MMASD+ consists of diverse data modalities, including 3D-Skeleton, 3D Body Mesh, and Optical Flow data.
A Multimodal Transformer framework is proposed to predict 11 action types and the presence of ASD.
arXiv Detail & Related papers (2024-08-27T14:05:48Z) - Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation [53.70131202548981]
We present a two-step segmentation framework employing Knowledge-Guided Prompt Learning (KGPL) for brain MRI.
Specifically, we first pre-train segmentation models on large-scale datasets with sub-optimal labels.
The introduction of knowledge-wise prompts captures semantic relationships between anatomical variability and biological processes.
arXiv Detail & Related papers (2024-07-31T04:32:43Z) - Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition [47.550391816383794]
We introduce a novel problem of audio-visual autism behavior recognition.
Social behavior recognition is an essential aspect previously omitted in AI-assisted autism screening research.
We will release our dataset, code, and pre-trained models.
arXiv Detail & Related papers (2024-03-22T22:52:35Z) - Video-Based Autism Detection with Deep Learning [0.0]
We develop a deep learning model that analyzes video clips of children reacting to sensory stimuli.
Results show that our model effectively generalizes and understands key differences in the distinct movements of the children.
arXiv Detail & Related papers (2024-02-26T17:45:00Z) - Exploiting the Brain's Network Structure for Automatic Identification of ADHD Subjects [70.37277191524755]
We show that the brain can be modeled as a functional network, and certain properties of the networks differ in ADHD subjects from control subjects.
We train our classifier with 776 subjects and test on 171 subjects provided by The Neuro Bureau for the ADHD-200 challenge.
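The functional-network approach above can be illustrated with a minimal sketch: build a connectivity matrix from regional time series via pairwise correlation, threshold it into a binary graph, and extract simple network properties as classifier features. This is an assumption-laden toy example (simulated signals, an arbitrary 0.15 threshold, node degree as the property), not the paper's actual pipeline.

```python
import numpy as np

# Illustrative sketch of functional-network feature extraction.
# All names, sizes, and the threshold are assumptions for demonstration.
rng = np.random.default_rng(42)
n_regions, n_timepoints = 10, 200
ts = rng.normal(size=(n_regions, n_timepoints))  # simulated regional signals

corr = np.corrcoef(ts)                   # functional connectivity matrix
adj = (np.abs(corr) > 0.15).astype(int)  # threshold into a binary network
np.fill_diagonal(adj, 0)                 # drop self-connections

degree = adj.sum(axis=1)  # a simple per-region network property (node degree)
print(corr.shape, degree.shape)
```

In a real setting, such per-region properties (degree, clustering coefficient, path length) would be stacked into feature vectors for the 776 training subjects and fed to a classifier.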
arXiv Detail & Related papers (2023-06-15T16:22:57Z) - Language-Assisted Deep Learning for Autistic Behaviors Recognition [13.200025637384897]
We show that a vision-based problem behaviors recognition system can achieve high accuracy and outperform the previous methods by a large margin.
We propose a two-branch multimodal deep learning framework by incorporating the "freely available" language description for each type of problem behavior.
Experimental results demonstrate that incorporating additional language supervision can bring an obvious performance boost for the autism problem behaviors recognition task.
arXiv Detail & Related papers (2022-11-17T02:58:55Z) - How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z) - Vision-Based Activity Recognition in Children with Autism-Related Behaviors [15.915410623440874]
We demonstrate the effect of a region-based computer vision system to help clinicians and parents analyze a child's behavior.
The data is pre-processed by detecting the target child in the video to reduce the impact of background noise.
Motivated by the effectiveness of temporal convolutional models, we propose both light-weight and conventional models capable of extracting action features from video frames.
arXiv Detail & Related papers (2022-08-08T15:12:27Z) - Detection of ADHD based on Eye Movements during Natural Viewing [3.1890959219836574]
ADHD is a neurodevelopmental disorder that is highly prevalent and requires clinical specialists to diagnose.
We develop an end-to-end deep learning-based sequence model which we pre-train on a related task.
We find that the method is in fact able to detect ADHD and outperforms relevant baselines.
arXiv Detail & Related papers (2022-07-04T12:56:04Z) - Overcoming the Domain Gap in Neural Action Representations [60.47807856873544]
3D pose data can now be reliably extracted from multi-view video sequences without manual intervention.
We propose to use it to guide the encoding of neural action representations together with a set of neural and behavioral augmentations.
To reduce the domain gap, during training, we swap neural and behavioral data across animals that seem to be performing similar actions.
arXiv Detail & Related papers (2021-12-02T12:45:46Z) - Muti-view Mouse Social Behaviour Recognition with Deep Graphical Model [124.26611454540813]
Social behaviour analysis of mice is an invaluable tool to assess therapeutic efficacy of neurodegenerative diseases.
Because of the potential to create rich descriptions of mouse social behaviors, the use of multi-view video recordings for rodent observations is receiving increasing attention.
We propose a novel multiview latent-attention and dynamic discriminative model that jointly learns view-specific and view-shared sub-structures.
arXiv Detail & Related papers (2020-11-04T18:09:58Z) - Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - Modeling Shared Responses in Neuroimaging Studies through MultiView ICA [94.31804763196116]
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
We propose a novel MultiView Independent Component Analysis model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise.
We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects.
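The generative model described above has a simple form: each subject's data is a linear mixture of shared independent sources plus subject-specific noise. Below is a hypothetical numpy sketch of that forward model only (not the fitting algorithm); the source distribution, dimensions, and noise level are all assumptions for illustration.

```python
import numpy as np

# Sketch of the MultiView ICA generative model: x_i = A_i @ s + n_i,
# where s holds shared independent sources and A_i is a subject-specific
# mixing matrix. All sizes and distributions are illustrative assumptions.
rng = np.random.default_rng(0)
n_subjects, n_sources, n_samples = 3, 4, 1000

# Shared non-Gaussian (Laplace) independent sources, common to all subjects
s = rng.laplace(size=(n_sources, n_samples))

subject_data = []
for i in range(n_subjects):
    A_i = rng.normal(size=(n_sources, n_sources))     # subject-specific mixing
    noise = 0.1 * rng.normal(size=(n_sources, n_samples))  # subject noise n_i
    subject_data.append(A_i @ s + noise)              # observed view x_i

print(len(subject_data), subject_data[0].shape)
```

Fitting the model then amounts to jointly recovering the shared sources `s` and the per-subject unmixing matrices from the observed views, which is where the improved cross-subject sensitivity comes from.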
arXiv Detail & Related papers (2020-06-11T17:29:53Z) - Detecting Parkinsonian Tremor from IMU Data Collected In-The-Wild using Deep Multiple-Instance Learning [59.74684475991192]
Parkinson's Disease (PD) is a slowly evolving neurological disease that affects about 1% of the population above 60 years old. PD symptoms include tremor, rigidity, and bradykinesia.
We present a method for automatically identifying tremorous episodes related to PD, based on IMU signals captured via a smartphone device.
arXiv Detail & Related papers (2020-05-06T09:02:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.