Cross-Enhanced Multimodal Fusion of Eye-Tracking and Facial Features for Alzheimer's Disease Diagnosis
- URL: http://arxiv.org/abs/2510.24777v1
- Date: Sat, 25 Oct 2025 13:30:24 GMT
- Title: Cross-Enhanced Multimodal Fusion of Eye-Tracking and Facial Features for Alzheimer's Disease Diagnosis
- Authors: Yujie Nie, Jianzhang Ni, Yonglong Ye, Yuan-Ting Zhang, Yun Kwok Wing, Xiangqing Xu, Xin Ma, Lizhou Fan,
- Abstract summary: Eye-tracking and facial features are important indicators of cognitive function, reflecting attentional distribution and neurocognitive state.<n>We propose a multimodal cross-enhanced fusion framework that leverages eye-tracking and facial features for Alzheimer's disease detection.<n>Our framework outperforms traditional late fusion and feature concatenation methods.
- Score: 9.111075363945892
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate diagnosis of Alzheimer's disease (AD) is essential for enabling timely intervention and slowing disease progression. Multimodal diagnostic approaches offer considerable promise by integrating complementary information across behavioral and perceptual domains. Eye-tracking and facial features, in particular, are important indicators of cognitive function, reflecting attentional distribution and neurocognitive state. However, few studies have explored their joint integration for auxiliary AD diagnosis. In this study, we propose a multimodal cross-enhanced fusion framework that synergistically leverages eye-tracking and facial features for AD detection. The framework incorporates two key modules: (a) a Cross-Enhanced Fusion Attention Module (CEFAM), which models inter-modal interactions through cross-attention and global enhancement, and (b) a Direction-Aware Convolution Module (DACM), which captures fine-grained directional facial features via horizontal-vertical receptive fields. Together, these modules enable adaptive and discriminative multimodal representation learning. To support this work, we constructed a synchronized multimodal dataset, including 25 patients with AD and 25 healthy controls (HC), by recording aligned facial video and eye-tracking sequences during a visual memory-search paradigm, providing an ecologically valid resource for evaluating integration strategies. Extensive experiments on this dataset demonstrate that our framework outperforms traditional late fusion and feature concatenation methods, achieving a classification accuracy of 95.11% in distinguishing AD from HC, highlighting superior robustness and diagnostic performance by explicitly modeling inter-modal dependencies and modality-specific contributions.
Related papers
- MRC-GAT: A Meta-Relational Copula-Based Graph Attention Network for Interpretable Multimodal Alzheimer's Disease Diagnosis [2.2399170518036913]
Alzheimer's disease (AD) is a progressive neurodegenerative condition necessitating early and precise diagnosis to provide prompt clinical management.<n>Recent studies have increasingly focused on computer-aided diagnostic models to enhance precision and reliability.<n>To overcome these limitations, the Meta-Relational Copula-Based Graph Attention Network (MRC-GAT) is proposed as an efficient multimodal model for AD classification tasks.
arXiv Detail & Related papers (2026-02-17T17:15:32Z) - R-GenIMA: Integrating Neuroimaging and Genetics with Interpretable Multimodal AI for Alzheimer's Disease Progression [63.97617759805451]
Early detection of Alzheimer's disease requires models capable of integrating macro-scale neuroanatomical alterations with micro-scale genetic susceptibility.<n>We introduce R-GenIMA, an interpretable multimodal large language model that couples a novel ROI-wise vision transformer with genetic prompting.<n>R-GenIMA achieves state-of-the-art performance in four-way classification across normal cognition, subjective memory concerns, mild cognitive impairment, and AD.
arXiv Detail & Related papers (2025-12-22T02:54:10Z) - Collaborative Attention and Consistent-Guided Fusion of MRI and PET for Alzheimer's Disease Diagnosis [12.33741976057116]
Alzheimer's disease (AD) is the most prevalent form of dementia, and its early diagnosis is essential for slowing disease progression.<n>Recent studies on multimodal neuroimaging fusion using MRI and PET have achieved promising results.<n>We propose a Collaborative Attention and Consistent-Guided Fusion framework for MRI and PET based AD diagnosis.
arXiv Detail & Related papers (2025-11-04T03:42:07Z) - RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks.<n>RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z) - Multimodal Medical Endoscopic Image Analysis via Progressive Disentangle-aware Contrastive Learning [11.158864816564538]
We present an innovative multi-modality representation learning framework based on the Align-Disentangle-Fusion' mechanism.<n>Our method consistently outperforms state-of-the-art approaches, achieving superior accuracy across diverse real clinical scenarios.
arXiv Detail & Related papers (2025-08-23T03:02:51Z) - NEARL-CLIP: Interacted Query Adaptation with Orthogonal Regularization for Medical Vision-Language Understanding [51.63264715941068]
textbfNEARL-CLIP (iunderlineNteracted quunderlineEry underlineAdaptation with ounderlineRthogonaunderlineL Regularization) is a novel cross-modality interaction VLM-based framework.
arXiv Detail & Related papers (2025-08-06T05:44:01Z) - MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention [57.044719143401664]
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease.<n>We present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention.<n>Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance.
arXiv Detail & Related papers (2025-03-01T07:02:30Z) - Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.<n>Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.<n>Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals.<n>Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z) - Multimodal Outer Arithmetic Block Dual Fusion of Whole Slide Images and Omics Data for Precision Oncology [6.418265127069878]
We propose the use of omic embeddings during early and late fusion to capture complementary information from local (patch-level) to global (slide-level) interactions.<n>This dual fusion strategy enhances interpretability and classification performance, highlighting its potential for clinical diagnostics.
arXiv Detail & Related papers (2024-11-26T13:25:53Z) - An interpretable generative multimodal neuroimaging-genomics framework for decoding Alzheimer's disease [13.213387075528017]
Alzheimer's disease (AD) is the most prevalent form of dementia worldwide, encompassing a prodromal stage known as Mild Cognitive Impairment (MCI)<n>The objective of the work was to capture structural and functional modulations of brain structure and function relying on multimodal MRI data and Single Nucleotide Polymorphisms.
arXiv Detail & Related papers (2024-06-19T07:31:47Z) - Multi-task Paired Masking with Alignment Modeling for Medical
Vision-Language Pre-training [55.56609500764344]
We propose a unified framework based on Multi-task Paired Masking with Alignment (MPMA) to integrate the cross-modal alignment task into the joint image-text reconstruction framework.
We also introduce a Memory-Augmented Cross-Modal Fusion (MA-CMF) module to fully integrate visual information to assist report reconstruction.
arXiv Detail & Related papers (2023-05-13T13:53:48Z) - Multimodal Attention-based Deep Learning for Alzheimer's Disease
Diagnosis [9.135911493822261]
Alzheimer's Disease (AD) is the most common neurodegenerative disorder with one of the most complex pathogeneses.
We present a Multimodal Alzheimer's Disease Diagnosis framework (MADDi) to accurately detect the presence of AD.
arXiv Detail & Related papers (2022-06-17T15:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.