Related papers: Hybrid Vision Transformer-Mamba Framework for Autism Diagnosis via Eye-Tracking Analysis

URL: http://arxiv.org/abs/2506.06886v1
Date: Sat, 07 Jun 2025 18:27:24 GMT
Title: Hybrid Vision Transformer-Mamba Framework for Autism Diagnosis via Eye-Tracking Analysis
Authors: Wafaa Kasri, Yassine Himeur, Abigail Copiaco, Wathiq Mansoor, Ammar Albanna, Valsamma Eapen
Abstract summary: This study presents a hybrid deep learning framework combining Vision Transformers (ViT) and Vision Mamba to detect ASD. The model uses attention-based fusion to integrate visual, speech, and facial cues, capturing both spatial and temporal dynamics. Tested on the Saliency4ASD dataset, the proposed ViT-Mamba model outperformed existing methods, achieving 0.96 accuracy, 0.95 F1-score, 0.97 sensitivity, and 0.94 specificity.
Score: 2.481802259298367
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate Autism Spectrum Disorder (ASD) diagnosis is vital for early intervention. This study presents a hybrid deep learning framework combining Vision Transformers (ViT) and Vision Mamba to detect ASD using eye-tracking data. The model uses attention-based fusion to integrate visual, speech, and facial cues, capturing both spatial and temporal dynamics. Unlike traditional handcrafted methods, it applies state-of-the-art deep learning and explainable AI techniques to enhance diagnostic accuracy and transparency. Tested on the Saliency4ASD dataset, the proposed ViT-Mamba model outperformed existing methods, achieving 0.96 accuracy, 0.95 F1-score, 0.97 sensitivity, and 0.94 specificity. These findings show the model's promise for scalable, interpretable ASD screening, especially in resource-constrained or remote clinical settings where access to expert diagnosis is limited.
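The four metrics quoted in the abstract (accuracy, F1-score, sensitivity, specificity) all derive from the counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the paper or from Saliency4ASD.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    f1: float
    sensitivity: float
    specificity: float


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute standard binary-classification metrics from confusion counts.

    tp/fn count positive (e.g. ASD) samples predicted correctly/incorrectly;
    tn/fp count negative (e.g. typically developing) samples likewise.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Fraction of all samples classified correctly.
    accuracy = (tp + tn) / total
    # Sensitivity (recall): fraction of true positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of true negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall, written in count form.
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0

    return BinaryMetrics(accuracy, f1, sensitivity, specificity)


# Hypothetical counts for illustration only (not results from the paper).
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(f"accuracy={m.accuracy:.3f} f1={m.f1:.3f} "
      f"sensitivity={m.sensitivity:.3f} specificity={m.specificity:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters for screening tasks, because a single accuracy figure can hide an imbalance between missed cases and false alarms.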

Related papers

Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging [41.446379453352534]
Latent Diffusion Autoencoder (LDAE) is a novel encoder-decoder diffusion-based framework for efficient and meaningful unsupervised learning in medical imaging. This study focuses on Alzheimer disease (AD) using brain MR from the ADNI database as a case study.
arXiv Detail & Related papers (2025-04-11T15:37:46Z)
GS-TransUNet: Integrated 2D Gaussian Splatting and Transformer UNet for Accurate Skin Lesion Analysis [44.99833362998488]
We present a novel approach that combines 2D Gaussian splatting with the Transformer UNet architecture for automated skin cancer diagnosis. Our findings illustrate significant advancements in the precision of segmentation and classification. This integration sets new benchmarks in the field and highlights the potential for further research into multi-task medical image analysis methodologies.
arXiv Detail & Related papers (2025-02-23T23:28:47Z)
Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis [37.11302829771659]
Large vision-language models (LVLMs) are limited by input resolution constraints, hindering their efficiency and accuracy in pathology image analysis. We propose two innovative strategies: the mixed task-guided feature enhancement, and the prompt-guided detail feature completion. We trained the pathology-specialized LVLM, OmniPath, which significantly outperforms existing methods in diagnostic accuracy and efficiency.
arXiv Detail & Related papers (2024-12-12T18:07:23Z)
Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models. In this paper, we investigate how detection performance varies across model backbones, types, and datasets. We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
Analyzing the Effect of $k$-Space Features in MRI Classification Models [0.0]
We have developed an explainable AI methodology tailored for medical imaging. We employ a Convolutional Neural Network (CNN) that analyzes MRI scans across both image and frequency domains. This approach not only enhances early training efficiency but also deepens our understanding of how additional features impact the model predictions.
arXiv Detail & Related papers (2024-09-20T15:43:26Z)
Explainable AI for Autism Diagnosis: Identifying Critical Brain Regions Using fMRI Data [0.29687381456163997]
Early diagnosis and intervention for Autism Spectrum Disorder (ASD) has been shown to significantly improve the quality of life of autistic individuals. There is a need for objective biomarkers of ASD which can help improve diagnostic accuracy. Deep learning (DL) has achieved outstanding performance in diagnosing diseases and conditions from medical imaging data. This research aims to improve the accuracy and interpretability of ASD diagnosis by creating a DL model that can not only accurately classify ASD but also provide explainable insights into its working.
arXiv Detail & Related papers (2024-09-19T23:08:09Z)
Ensemble Modeling of Multiple Physical Indicators to Dynamically Phenotype Autism Spectrum Disorder [3.6630139570443996]
We provide a dataset for training computer vision models to detect Autism Spectrum Disorder (ASD)-related phenotypic markers. We trained individual LSTM-based models using eye gaze, head positions, and facial landmarks as input features, achieving test AUCs of 86%, 67%, and 78%.
arXiv Detail & Related papers (2024-08-23T17:55:58Z)
Shifting Focus: From Global Semantics to Local Prominent Features in Swin-Transformer for Knee Osteoarthritis Severity Assessment [42.09313885494969]
We harness the Swin Transformer's capacity to discern extended spatial dependencies within images through the hierarchical framework. Our novel contribution lies in refining local feature representations, orienting them specifically toward the final distribution of the classifier. Our model demonstrates significant robustness and precision, as evidenced by extensive validation of two established benchmarks for Knee OsteoArthritis (KOA) grade classification.
arXiv Detail & Related papers (2024-03-15T01:09:58Z)
Involution Fused ConvNet for Classifying Eye-Tracking Patterns of Children with Autism Spectrum Disorder [1.225920962851304]
Autism Spectrum Disorder (ASD) is a complicated neurological condition which is challenging to diagnose. Numerous studies demonstrate that children diagnosed with ASD struggle with maintaining attention spans and have less focused vision. Eye-tracking technology has drawn special attention in the context of ASD since anomalies in gaze have long been acknowledged as a defining feature of autism in general.
arXiv Detail & Related papers (2024-01-07T20:08:17Z)
DDxT: Deep Generative Transformer Models for Differential Diagnosis [51.25660111437394]
We show that a generative approach trained with simpler supervised and self-supervised learning signals can achieve superior results on the current benchmark. The proposed Transformer-based generative network, named DDxT, autoregressively produces a set of possible pathologies, i.e., DDx, and predicts the actual pathology using a neural network.
arXiv Detail & Related papers (2023-12-02T22:57:25Z)
Self-supervised Feature Learning via Exploiting Multi-modal Data for Retinal Disease Diagnosis [28.428216831922228]
This paper presents a novel self-supervised feature learning method by effectively exploiting multi-modal data for retinal disease diagnosis. Our objective learns both modality-invariant features and patient-similarity features. We evaluate our method on two public benchmark datasets for retinal disease diagnosis.
arXiv Detail & Related papers (2020-07-21T19:49:45Z)