Fugu-MT Paper Translation (Summary): Cross-Enhanced Multimodal Fusion of Eye-Tracking and Facial Features for Alzheimer's Disease Diagnosis

Paper summary: Cross-Enhanced Multimodal Fusion of Eye-Tracking and Facial Features for Alzheimer's Disease Diagnosis

arxiv url: http://arxiv.org/abs/2510.24777v1
Date: Sat, 25 Oct 2025 13:30:24 GMT
Status: Translation complete
System-internal update date: 2025-10-30 15:50:44.485575
Title: Cross-Enhanced Multimodal Fusion of Eye-Tracking and Facial Features for Alzheimer's Disease Diagnosis
Title (reference translation): Cross-Enhanced Fusion of Eye-Tracking and Facial Features in Alzheimer's Disease Diagnosis
Authors: Yujie Nie, Jianzhang Ni, Yonglong Ye, Yuan-Ting Zhang, Yun Kwok Wing, Xiangqing Xu, Xin Ma, Lizhou Fan
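Abstract summary: Eye-tracking and facial features are important indicators of cognitive function that reflect attentional distribution and neurocognitive state. The authors propose a multimodal cross-enhanced fusion framework that uses eye-tracking and facial features for Alzheimer's disease diagnosis. The framework outperforms traditional late fusion and feature concatenation methods.
Reference score (independently computed attention score): 9.111075363945892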
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate diagnosis of Alzheimer's disease (AD) is essential for enabling timely intervention and slowing disease progression. Multimodal diagnostic approaches offer considerable promise by integrating complementary information across behavioral and perceptual domains. Eye-tracking and facial features, in particular, are important indicators of cognitive function, reflecting attentional distribution and neurocognitive state. However, few studies have explored their joint integration for auxiliary AD diagnosis. In this study, we propose a multimodal cross-enhanced fusion framework that synergistically leverages eye-tracking and facial features for AD detection. The framework incorporates two key modules: (a) a Cross-Enhanced Fusion Attention Module (CEFAM), which models inter-modal interactions through cross-attention and global enhancement, and (b) a Direction-Aware Convolution Module (DACM), which captures fine-grained directional facial features via horizontal-vertical receptive fields. Together, these modules enable adaptive and discriminative multimodal representation learning. To support this work, we constructed a synchronized multimodal dataset, including 25 patients with AD and 25 healthy controls (HC), by recording aligned facial video and eye-tracking sequences during a visual memory-search paradigm, providing an ecologically valid resource for evaluating integration strategies. Extensive experiments on this dataset demonstrate that our framework outperforms traditional late fusion and feature concatenation methods, achieving a classification accuracy of 95.11% in distinguishing AD from HC, highlighting superior robustness and diagnostic performance by explicitly modeling inter-modal dependencies and modality-specific contributions.
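Abstract (reference translation): Accurate diagnosis of Alzheimer's disease (AD) is essential for timely intervention and for slowing disease progression. Multimodal diagnostic approaches are highly promising because they integrate complementary information across behavioral and perceptual domains. Eye-tracking and facial features in particular are important indicators of cognitive function, reflecting attentional distribution and neurocognitive state. However, few studies have examined their joint integration for auxiliary AD diagnosis. This study proposes a multimodal cross-enhanced fusion framework that synergistically uses eye-tracking and facial features for AD detection. The framework contains two key modules: (a) a Cross-Enhanced Fusion Attention Module (CEFAM), which models inter-modal interactions through cross-attention and global enhancement, and (b) a Direction-Aware Convolution Module (DACM), which captures fine-grained directional facial features through horizontal-vertical receptive fields. Together, these modules enable adaptive and discriminative multimodal representation learning. To support the work, the authors built a synchronized multimodal dataset of 25 patients with AD and 25 healthy controls (HC) by recording aligned facial video and eye-tracking sequences during a visual memory-search paradigm, which provides an ecologically valid resource for evaluating integration strategies. Extensive experiments on this dataset show that the framework outperforms traditional late fusion and feature concatenation methods, reaching a classification accuracy of 95.11% in distinguishing AD from HC. By explicitly modeling inter-modal dependencies and modality-specific contributions, it shows superior robustness and diagnostic performance.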
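The abstract describes CEFAM as modeling inter-modal interactions through cross-attention. The sketch below illustrates only that general idea, in plain Python, and is not the authors' implementation: each modality's tokens query the other modality with scaled dot-product attention, the result is added back as a residual, and the two streams are mean-pooled and concatenated. The function names (`cross_attend`, `cross_fuse`), the absence of learned projections, and the mean-pooling step are all assumptions made for illustration; the paper's global enhancement step and the DACM module are not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Scaled dot-product attention: each query token attends over context tokens.

    Assumption: raw features act as queries, keys, and values
    (a real model would apply learned linear projections first).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)]
        )
    return attended


def mean_pool(tokens):
    """Average a sequence of token vectors into a single vector."""
    count = len(tokens)
    return [sum(t[j] for t in tokens) / count for j in range(len(tokens[0]))]


def cross_fuse(eye_tokens, face_tokens):
    """Hypothetical fusion: bidirectional cross-attention with residuals,
    then mean-pool each stream and concatenate the two pooled vectors."""
    eye_from_face = cross_attend(eye_tokens, face_tokens)
    face_from_eye = cross_attend(face_tokens, eye_tokens)
    eye_enhanced = [
        [a + b for a, b in zip(x, y)] for x, y in zip(eye_tokens, eye_from_face)
    ]
    face_enhanced = [
        [a + b for a, b in zip(x, y)] for x, y in zip(face_tokens, face_from_eye)
    ]
    return mean_pool(eye_enhanced) + mean_pool(face_enhanced)


if __name__ == "__main__":
    # Toy inputs: two eye-tracking tokens and one facial token, feature size 2.
    eye = [[1.0, 0.0], [0.0, 1.0]]
    face = [[1.0, 1.0]]
    print(cross_fuse(eye, face))
```

The fused vector has twice the per-modality feature size and could feed a small classifier head. In contrast to late fusion or plain concatenation, each pooled stream here already contains information borrowed from the other modality.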


Related papers