Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives
- URL: http://arxiv.org/abs/2501.03727v3
- Date: Mon, 27 Oct 2025 19:54:43 GMT
- Title: Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives
- Authors: Jinchao Li, Yuejiao Wang, Junan Li, Jiawen Kang, Bo Zheng, Ka Ho Wong, Brian Mak, Helene H. Fung, Jean Woo, Man-Wai Mak, Timothy Kwok, Vincent Mok, Xianmin Gong, Xixin Wu, Xunying Liu, Patrick C. M. Wong, Helen Meng,
- Abstract summary: Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management.<n>Current VSN-based NCD detection methods primarily focus on linguistic microstructures closely tied to bottom-up, stimulus-driven cognitive processes.<n>We propose two novel macrostructural approaches: a Dynamic Topic Model (DTM) to track topic evolution over time, and a Text-Image Temporal Alignment Network (TITAN) to measure cross-modal consistency between narrative and visual stimuli.
- Score: 83.15653194899126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. Given that language impairments manifest early in NCD progression, visual-stimulated narrative (VSN)-based analysis offers a promising avenue for NCD detection. Current VSN-based NCD detection methods primarily focus on linguistic microstructures (e.g., lexical diversity) that are closely tied to bottom-up, stimulus-driven cognitive processes. While these features illuminate basic language abilities, the higher-order linguistic macrostructures (e.g., topic development) that may reflect top-down, concept-driven cognitive abilities remain underexplored. These macrostructural patterns are crucial for NCD detection, yet challenging to quantify due to their abstract and complex nature. To bridge this gap, we propose two novel macrostructural approaches: (1) a Dynamic Topic Model (DTM) to track topic evolution over time, and (2) a Text-Image Temporal Alignment Network (TITAN) to measure cross-modal consistency between narrative and visual stimuli. Experimental results show the effectiveness of the proposed approaches in NCD detection, with TITAN achieving superior performance across three corpora: ADReSS (F1=0.8889), ADReSSo (F1=0.8504), and CU-MARVEL-RABBIT (F1=0.7238). Feature contribution analysis reveals that macrostructural features (e.g., topic variability, topic change rate, and topic consistency) constitute the most significant contributors to the model's decision pathways, outperforming the investigated microstructural features. These findings underscore the value of macrostructural analysis for understanding linguistic-cognitive interactions associated with NCDs.
Related papers
- Uncovering Latent Communication Patterns in Brain Networks via Adaptive Flow Routing [6.266036335881278]
We formulate multi-modal fusion through the lens of neural communication dynamics.<n>AFR-Net is a physics-informed framework that models how structural constraints (SC) give rise to functional communication patterns (FC)<n>Experiments demonstrate that AFR-Net significantly outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2026-01-31T06:56:50Z) - Cognitive Foundations for Reasoning and Their Manifestation in LLMs [63.12951576410617]
Large language models (LLMs) solve complex problems yet fail on simpler variants, suggesting they achieve correct outputs through mechanisms fundamentally different from human reasoning.<n>We synthesize cognitive science research into a taxonomy of 28 cognitive elements spanning reasoning invariants, meta-cognitive controls, representations for organizing reasoning & knowledge, and transformation operations.<n>We develop test-time reasoning guidance that automatically scaffold successful structures, improving performance by up to 66.7% on complex problems.
arXiv Detail & Related papers (2025-11-20T18:59:00Z) - Interpretable Neuropsychiatric Diagnosis via Concept-Guided Graph Neural Networks [56.75602443936853]
One in five adolescents currently live with a diagnosed mental or behavioral health condition, such as anxiety, depression, or conduct disorder.<n>While prior works use graph neural network (GNN) approaches for disorder prediction, they remain black-boxes, limiting their reliability and clinical translation.<n>In this work, we propose a concept-based diagnosis framework that that encodes interpretable functional connectivity concepts.<n>Our design ensures predictions through clinically meaningful connectivity patterns, enabling both interpretability and strong predictive performance.
arXiv Detail & Related papers (2025-10-02T19:38:46Z) - From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models [66.36007274540113]
Multimodal Large Language Models (MLLMs) strive to achieve a profound, human-like understanding of and interaction with the physical world.<n>They often exhibit a shallow and incoherent integration when acquiring information (Perception) and conducting reasoning (Cognition)<n>This survey introduces a novel and unified analytical framework: From Perception to Cognition"
arXiv Detail & Related papers (2025-09-29T18:25:40Z) - Estimating the strength and timing of syntactic structure building in naturalistic reading [4.261343728593896]
We show that phrase structure can precede category detection and dominate lexical influences.<n>These findings support a predictive "tree-scaffolding" account of comprehension.
arXiv Detail & Related papers (2025-09-27T08:56:12Z) - Naturalistic Language-related Movie-Watching fMRI Task for Detecting Neurocognitive Decline and Disorder [60.84344168388442]
Language-related functional magnetic resonance imaging (fMRI) may be a promising approach for detecting cognitive decline and early NCD.<n>We examined the effectiveness of this task among 97 non-demented Chinese older adults from Hong Kong.<n>The study demonstrated the potential of the naturalistic language-related fMRI task for early detection of aging-related cognitive decline and NCD.
arXiv Detail & Related papers (2025-06-10T16:58:47Z) - Unlocking Multimodal Integration in EHRs: A Prompt Learning Framework for Language and Time Series Fusion [27.70300880284899]
Large language models (LLMs) have shown remarkable performance in vision-language tasks, but their application in the medical field remains underexplored.
We introduce ProMedTS, a novel self-supervised multimodal framework that employs prompt-guided learning to unify data types.
We evaluate ProMedTS on disease diagnosis tasks using real-world datasets, and the results demonstrate that our method consistently outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-19T07:56:48Z) - NeuroXVocal: Detection and Explanation of Alzheimer's Disease through Non-invasive Analysis of Picture-prompted Speech [4.815952991777717]
NeuroXVocal is a novel dual-component system that classifies and explains potential Alzheimer's Disease (AD) cases through speech analysis.<n>The classification component (Neuro) processes three distinct data streams: acoustic features capturing speech patterns and voice characteristics, textual features extracted from speech transcriptions, and precomputed embeddings representing linguistic patterns.<n>The explainability component (XVocal) implements a Retrieval-Augmented Generation (RAG) approach, leveraging Large Language Models combined with a domain-specific knowledge base of AD research literature.
arXiv Detail & Related papers (2025-02-14T12:09:49Z) - Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition [64.56321246196859]
We propose a novel dyNamically Evolving dUal skeleton-semantic syneRgistic framework.<n>We first construct the spatial-temporal evolving micro-prototypes and integrate dynamic context-aware side information.<n>We introduce the spatial compression and temporal memory mechanisms to guide the growth of spatial-temporal micro-prototypes.
arXiv Detail & Related papers (2024-11-18T05:16:11Z) - Cognitive Networks and Performance Drive fMRI-Based State Classification Using DNN Models [0.0]
We employ two structurally different and complementary DNN-based models to classify individual cognitive states.
We show that despite the architectural differences, both models consistently produce a robust relationship between prediction accuracy and individual cognitive performance.
arXiv Detail & Related papers (2024-08-14T15:25:51Z) - An interpretable generative multimodal neuroimaging-genomics framework for decoding Alzheimer's disease [13.213387075528017]
Alzheimer's disease (AD) is the most prevalent form of dementia worldwide, encompassing a prodromal stage known as Mild Cognitive Impairment (MCI)<n>The objective of the work was to capture structural and functional modulations of brain structure and function relying on multimodal MRI data and Single Nucleotide Polymorphisms.
arXiv Detail & Related papers (2024-06-19T07:31:47Z) - Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation [56.34634121544929]
In this study, we first construct the brain-effective network via the dynamic causal model.
We then introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE)
This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks.
arXiv Detail & Related papers (2024-05-21T20:37:07Z) - DSAM: A Deep Learning Framework for Analyzing Temporal and Spatial Dynamics in Brain Networks [4.041732967881764]
Most rs-fMRI studies compute a single static functional connectivity matrix across brain regions of interest.
These approaches are at risk of oversimplifying brain dynamics and lack proper consideration of the goal at hand.
We propose a novel interpretable deep learning framework that learns goal-specific functional connectivity matrix directly from time series.
arXiv Detail & Related papers (2024-05-19T23:35:06Z) - SEINE: Structure Encoding and Interaction Network for Nuclei Instance
Segmentation [15.769396833096149]
Similar visual presentation of intranuclear and extranuclear regions of chromophobe nuclei often causes under-segmentation.
Current methods lack the exploration of nuclei structure, resulting in fragmented instance predictions.
This paper proposes a structure encoding and interaction network, SEINE, which develops the structure modeling scheme of nuclei.
arXiv Detail & Related papers (2024-01-18T07:44:04Z) - Multi-task Collaborative Pre-training and Individual-adaptive-tokens
Fine-tuning: A Unified Framework for Brain Representation Learning [3.1453938549636185]
We propose a unified framework that combines Collaborative pre-training and Individual--Tokens fine-tuning.
The proposed MCIAT achieves state-of-the-art diagnosis performance on the ADHD-200 dataset.
arXiv Detail & Related papers (2023-06-20T08:38:17Z) - Leveraging Pretrained Representations with Task-related Keywords for
Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z) - M-SENSE: Modeling Narrative Structure in Short Personal Narratives Using
Protagonist's Mental Representations [14.64546899992196]
We propose the task of automatically detecting prominent elements of the narrative structure by analyzing the role of characters' inferred mental state.
We introduce a STORIES dataset of short personal narratives containing manual annotations of key elements of narrative structure, specifically climax and resolution.
Our model is able to achieve significant improvements in the task of identifying climax and resolution.
arXiv Detail & Related papers (2023-02-18T20:48:02Z) - Cross-Modal Causal Relational Reasoning for Event-Level Visual Question
Answering [134.91774666260338]
Existing visual question answering methods often suffer from cross-modal spurious correlations and oversimplified event-level reasoning processes.
We propose a framework for cross-modal causal relational reasoning to address the task of event-level visual question answering.
arXiv Detail & Related papers (2022-07-26T04:25:54Z) - An Empirical Study: Extensive Deep Temporal Point Process [61.14164208094238]
We first review recent research emphasis and difficulties in modeling asynchronous event sequences with deep temporal point process.
We propose a Granger causality discovery framework for exploiting the relations among multi-types of events.
arXiv Detail & Related papers (2021-10-19T10:15:00Z) - Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z) - On-the-Fly Attention Modularization for Neural Generation [54.912042110885366]
We show that generated text is repetitive, generic, self-inconsistent, and lacking commonsense.
Our findings motivate on-the-fly attention modularization, a simple but effective method for injecting inductive biases into attention during inference.
arXiv Detail & Related papers (2021-01-02T05:16:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.