Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives
- URL: http://arxiv.org/abs/2501.03727v2
- Date: Wed, 18 Jun 2025 14:38:47 GMT
- Title: Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives
- Authors: Jinchao Li, Yuejiao Wang, Junan Li, Jiawen Kang, Bo Zheng, Simon Wong, Brian Mak, Helene Fung, Jean Woo, Man-Wai Mak, Timothy Kwok, Vincent Mok, Xianmin Gong, Xixin Wu, Xunying Liu, Patrick Wong, Helen Meng
- Abstract summary: Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. We propose two novel dynamic macrostructural approaches: one tracks topic evolution over time and the other measures cross-modal consistency between speech and visual stimuli. Experimental results validated the effectiveness of the proposed approaches in NCD detection, with TITAN achieving superior performance on both the CU-MARVEL-RABBIT corpus and the ADReSS corpus.
- Score: 84.03001845263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. Given that language impairments manifest early in NCD progression, visual-stimulated narrative (VSN)-based analysis offers a promising avenue for NCD detection. Current VSN-based NCD detection methods primarily focus on linguistic microstructures (e.g., pauses, lexical diversity), which are potentially linked to bottom-up (stimulus-driven) cognitive processing. While these features illuminate basic language abilities, the higher-order linguistic macrostructures (e.g., thematic or logical development), which may reflect top-down (concept-driven) cognitive abilities, remain underexplored. These patterns are crucial for NCD detection yet challenging to quantify due to their abstract and complex nature. To bridge this gap, we propose two novel dynamic macrostructural approaches: (1) a Dynamic Topic Model (DTM) to track topic evolution over time, and (2) a Text-Image Temporal Alignment Network (TITAN) to measure cross-modal consistency between speech and visual stimuli. Experimental results validated the effectiveness of the proposed approaches in NCD detection, with TITAN achieving superior performance on both the CU-MARVEL-RABBIT corpus (F1 = 0.7238) and the ADReSS corpus (F1 = 0.8889). The feature contribution analysis revealed that macrostructural features (e.g., topic variability, topic change rate, and topic consistency) constituted the most significant contributors in the model's decision pathways, outperforming the investigated microstructural features. These findings underscore the critical role of macrostructural patterns in understanding cognitive impairment mechanisms in NCDs.
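As a rough illustration of the macrostructural quantities named above, the sketch below derives topic variability, topic change rate, and topic consistency from a sequence of per-window topic distributions (such as a dynamic topic model might produce), together with a simple frame-level cross-modal consistency score. The abstract does not specify the exact feature definitions, so the entropy, L1-distance, and cosine-similarity formulas and all function names here are illustrative assumptions rather than the paper's DTM or TITAN implementations.

```python
# Illustrative sketch only: the feature formulas below are assumptions, not the
# paper's actual DTM/TITAN definitions.
import numpy as np

def topic_evolution_features(topic_dists: np.ndarray) -> dict:
    """Macrostructural features from a (T, K) array of per-window topic mixtures."""
    # Topic variability: mean entropy of each window's topic mixture (assumed definition).
    entropy = -np.sum(topic_dists * np.log(topic_dists + 1e-12), axis=1)
    # Topic change rate: mean L1 distance between consecutive windows (assumed definition).
    change_rate = np.abs(np.diff(topic_dists, axis=0)).sum(axis=1).mean()
    # Topic consistency: mean cosine similarity of each window to the
    # narrative-level average topic mixture (assumed definition).
    mean_dist = topic_dists.mean(axis=0)
    cos = topic_dists @ mean_dist / (
        np.linalg.norm(topic_dists, axis=1) * np.linalg.norm(mean_dist) + 1e-12)
    return {
        "topic_variability": float(entropy.mean()),
        "topic_change_rate": float(change_rate),
        "topic_consistency": float(cos.mean()),
    }

def cross_modal_consistency(text_emb: np.ndarray, image_emb: np.ndarray) -> float:
    """Mean cosine alignment between time-aligned text and visual-stimulus embeddings,
    a crude stand-in for the consistency signal TITAN is designed to model."""
    num = np.sum(text_emb * image_emb, axis=1)
    den = np.linalg.norm(text_emb, axis=1) * np.linalg.norm(image_emb, axis=1) + 1e-12
    return float((num / den).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dists = rng.dirichlet(np.ones(10), size=30)   # 30 narrative windows, 10 topics
    print(topic_evolution_features(dists))
    print(cross_modal_consistency(rng.normal(size=(30, 64)), rng.normal(size=(30, 64))))
```

In practice such features would feed a downstream classifier; the scores printed here come from random data and carry no diagnostic meaning.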
Related papers
- Naturalistic Language-related Movie-Watching fMRI Task for Detecting Neurocognitive Decline and Disorder [60.84344168388442]
Language-related functional magnetic resonance imaging (fMRI) may be a promising approach for detecting cognitive decline and early NCD.
We examined the effectiveness of this task among 97 non-demented Chinese older adults from Hong Kong.
The study demonstrated the potential of the naturalistic language-related fMRI task for early detection of aging-related cognitive decline and NCD.
arXiv Detail & Related papers (2025-06-10T16:58:47Z) - Unlocking Multimodal Integration in EHRs: A Prompt Learning Framework for Language and Time Series Fusion [27.70300880284899]
Large language models (LLMs) have shown remarkable performance in vision-language tasks, but their application in the medical field remains underexplored.
We introduce ProMedTS, a novel self-supervised multimodal framework that employs prompt-guided learning to unify data types.
We evaluate ProMedTS on disease diagnosis tasks using real-world datasets, and the results demonstrate that our method consistently outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-19T07:56:48Z) - NeuroXVocal: Detection and Explanation of Alzheimer's Disease through Non-invasive Analysis of Picture-prompted Speech [4.815952991777717]
NeuroXVocal is a novel dual-component system that classifies and explains potential Alzheimer's Disease (AD) cases through speech analysis.
The classification component (Neuro) processes three distinct data streams: acoustic features capturing speech patterns and voice characteristics, textual features extracted from speech transcriptions, and precomputed embeddings representing linguistic patterns.
The explainability component (XVocal) implements a Retrieval-Augmented Generation (RAG) approach, leveraging Large Language Models combined with a domain-specific knowledge base of AD research literature.
arXiv Detail & Related papers (2025-02-14T12:09:49Z) - Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition [64.56321246196859]
We propose a novel dyNamically Evolving dUal skeleton-semantic syneRgistic framework.
We first construct the spatial-temporal evolving micro-prototypes and integrate dynamic context-aware side information.
We introduce the spatial compression and temporal memory mechanisms to guide the growth of spatial-temporal micro-prototypes.
arXiv Detail & Related papers (2024-11-18T05:16:11Z) - Cognitive Networks and Performance Drive fMRI-Based State Classification Using DNN Models [0.0]
We employ two structurally different and complementary DNN-based models to classify individual cognitive states.
We show that despite the architectural differences, both models consistently produce a robust relationship between prediction accuracy and individual cognitive performance.
arXiv Detail & Related papers (2024-08-14T15:25:51Z) - An interpretable generative multimodal neuroimaging-genomics framework for decoding Alzheimer's disease [13.213387075528017]
Alzheimer's disease (AD) is the most prevalent form of dementia worldwide, encompassing a prodromal stage known as Mild Cognitive Impairment (MCI).
The objective of the work was to capture structural and functional modulations of the brain, relying on multimodal MRI data and Single Nucleotide Polymorphisms.
arXiv Detail & Related papers (2024-06-19T07:31:47Z) - Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation [56.34634121544929]
In this study, we first construct the brain-effective network via the dynamic causal model.
We then introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE).
This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks.
arXiv Detail & Related papers (2024-05-21T20:37:07Z) - DSAM: A Deep Learning Framework for Analyzing Temporal and Spatial Dynamics in Brain Networks [4.041732967881764]
Most rs-fMRI studies compute a single static functional connectivity matrix across brain regions of interest.
These approaches are at risk of oversimplifying brain dynamics and lack proper consideration of the goal at hand.
We propose a novel interpretable deep learning framework that learns a goal-specific functional connectivity matrix directly from time series.
arXiv Detail & Related papers (2024-05-19T23:35:06Z) - Multi-task Collaborative Pre-training and Individual-adaptive-tokens Fine-tuning: A Unified Framework for Brain Representation Learning [3.1453938549636185]
We propose a unified framework that combines Collaborative pre-training and Individual-adaptive-tokens fine-tuning.
The proposed MCIAT achieves state-of-the-art diagnosis performance on the ADHD-200 dataset.
arXiv Detail & Related papers (2023-06-20T08:38:17Z) - Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z) - M-SENSE: Modeling Narrative Structure in Short Personal Narratives Using Protagonist's Mental Representations [14.64546899992196]
We propose the task of automatically detecting prominent elements of the narrative structure by analyzing the role of characters' inferred mental state.
We introduce a STORIES dataset of short personal narratives containing manual annotations of key elements of narrative structure, specifically climax and resolution.
Our model achieves significant improvements on the task of identifying climax and resolution.
arXiv Detail & Related papers (2023-02-18T20:48:02Z) - Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering [134.91774666260338]
Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes.
We propose a framework for cross-modal causal relational reasoning to address the task of event-level visual question answering.
arXiv Detail & Related papers (2022-07-26T04:25:54Z) - An Empirical Study: Extensive Deep Temporal Point Process [61.14164208094238]
We first review recent research emphases and difficulties in modeling asynchronous event sequences with deep temporal point processes.
We propose a Granger causality discovery framework for exploiting the relations among multiple types of events.
arXiv Detail & Related papers (2021-10-19T10:15:00Z) - Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z) - On-the-Fly Attention Modularization for Neural Generation [54.912042110885366]
We show that generated text is repetitive, generic, self-inconsistent, and lacking commonsense.
Our findings motivate on-the-fly attention modularization, a simple but effective method for injecting inductive biases into attention during inference.
arXiv Detail & Related papers (2021-01-02T05:16:46Z)