Fugu-MT Paper Translation (Abstract): Explainable Parkinsons Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language Models

Paper overview: Explainable Parkinsons Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language Models

arxiv url: http://arxiv.org/abs/2512.04425v1
Date: Thu, 04 Dec 2025 03:43:43 GMT
Status: Translation complete
Last updated in system: 2025-12-05 21:11:45.974277
Title: Explainable Parkinsons Disease Gait Recognition Using Multimodal RGB-D Fusion and Large Language Models
Authors: Manar Alnaasan, Md Selim Sarowar, Sungho Kim
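Abstract summary: This paper proposes an explainable multimodal framework that integrates RGB and Depth (RGB-D) data to recognize Parkinsonian gait patterns. By combining multimodal feature learning with language-based interpretability, the study bridges the gap between visual recognition and clinical understanding.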
Reference score (site-computed attention metric): 6.2676602262188625
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate and interpretable gait analysis plays a crucial role in the early detection of Parkinson's disease (PD), yet most existing approaches remain limited by single-modality inputs, low robustness, and a lack of clinical transparency. This paper presents an explainable multimodal framework that integrates RGB and Depth (RGB-D) data to recognize Parkinsonian gait patterns under realistic conditions. The proposed system employs dual YOLOv11-based encoders for modality-specific feature extraction, followed by a Multi-Scale Local-Global Extraction (MLGE) module and a Cross-Spatial Neck Fusion mechanism to enhance spatial-temporal representation. This design captures both fine-grained limb motion (e.g., reduced arm swing) and overall gait dynamics (e.g., short stride or turning difficulty), even in challenging scenarios such as low lighting or occlusion caused by clothing. To ensure interpretability, a frozen Large Language Model (LLM) is incorporated to translate fused visual embeddings and structured metadata into clinically meaningful textual explanations. Experimental evaluations on multimodal gait datasets demonstrate that the proposed RGB-D fusion framework achieves higher recognition accuracy, improved robustness to environmental variations, and clear visual-linguistic reasoning compared with single-input baselines. By combining multimodal feature learning with language-based interpretability, this study bridges the gap between visual recognition and clinical understanding, offering a novel vision-language paradigm for reliable and explainable Parkinson's disease gait analysis. Code: https://github.com/manaralnaasan/RGB-D_parkinson-LLM
