Beyond Plain Demos: A Demo-centric Anchoring Paradigm for In-Context Learning in Alzheimer's Disease Detection
- URL: http://arxiv.org/abs/2511.06826v1
- Date: Mon, 10 Nov 2025 08:13:42 GMT
- Title: Beyond Plain Demos: A Demo-centric Anchoring Paradigm for In-Context Learning in Alzheimer's Disease Detection
- Authors: Puzhen Su, Haoran Yin, Yongzhu Miao, Jintao Tang, Shasha Li, Ting Wang,
- Abstract summary: Alzheimer's disease (AD) from narrative transcripts challenges large language models (LLMs)<n>Improving in-context learning for AD detection hinges on enriching perception through better demonstration (demo) sets.<n>We introduce textbfDA4ICL, a demo-centric anchoring framework that jointly expands context width via emphtextbfDiverse and Contrastive Retrieval (DCR) and deepens each demo's signal via emphtextbfProjected Vector Anchoring (PVA) at every Transformer layer.
- Score: 12.125931365583218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting Alzheimer's disease (AD) from narrative transcripts challenges large language models (LLMs): pre-training rarely covers this out-of-distribution task, and all transcript demos describe the same scene, producing highly homogeneous contexts. These factors cripple both the model's built-in task knowledge (\textbf{task cognition}) and its ability to surface subtle, class-discriminative cues (\textbf{contextual perception}). Because cognition is fixed after pre-training, improving in-context learning (ICL) for AD detection hinges on enriching perception through better demonstration (demo) sets. We demonstrate that standard ICL quickly saturates, its demos lack diversity (context width) and fail to convey fine-grained signals (context depth), and that recent task vector (TV) approaches improve broad task adaptation by injecting TV into the LLMs' hidden states (HSs), they are ill-suited for AD detection due to the mismatch of injection granularity, strength and position. To address these bottlenecks, we introduce \textbf{DA4ICL}, a demo-centric anchoring framework that jointly expands context width via \emph{\textbf{Diverse and Contrastive Retrieval}} (DCR) and deepens each demo's signal via \emph{\textbf{Projected Vector Anchoring}} (PVA) at every Transformer layer. Across three AD benchmarks, DA4ICL achieves large, stable gains over both ICL and TV baselines, charting a new paradigm for fine-grained, OOD and low-resource LLM adaptation.
Related papers
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs [100.02824137397464]
We investigate how Large Language Models adapt their internal representations when encountering inputs of increasing difficulty.<n>We reveal a consistent and quantifiable phenomenon: as task difficulty increases, the last hidden states of LLMs become substantially sparser.<n>This sparsity--difficulty relation is observable across diverse models and domains.
arXiv Detail & Related papers (2026-03-03T18:48:15Z) - Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation [51.743225614196774]
Multimodal large language models (MLLMs) have achieved remarkable progress in vision-language reasoning.<n>They remain vulnerable to hallucination, where generated content deviates from visual evidence.<n>Recent vision enhancement methods attempt to address this issue by reinforcing visual tokens during decoding.<n>We propose Adaptive Visual Reinforcement (AIR), a training-free framework for MLLMs.
arXiv Detail & Related papers (2026-02-27T14:18:51Z) - Steering and Rectifying Latent Representation Manifolds in Frozen Multi-modal LLMs for Video Anomaly Detection [52.5174167737992]
Video anomaly detection (VAD) aims to identify abnormal events in videos.<n>We propose SteerVAD, which advances MLLM-based VAD by shifting from passively reading to actively steering and rectifying internal representations.<n>Our method achieves state-of-the-art performance among tuning-free approaches requiring only 1% of training data.
arXiv Detail & Related papers (2026-02-27T13:48:50Z) - Context-Aware Decoding for Faithful Vision-Language Generation [5.258492912374723]
Hallucinations, generating responses inconsistent with the visual input, remain a critical limitation of large vision-language models (LVLMs)<n>We probe the layer-wise generation dynamics that drive hallucinations and propose a training-free mitigation strategy.
arXiv Detail & Related papers (2026-01-09T16:50:57Z) - WeMMU: Enhanced Bridging of Vision-Language Models and Diffusion Models via Noisy Query Tokens [69.97021957331326]
We propose Noisy Query Tokens, which learn a distributed representation space between the VLM and Diffusion Model via end-to-end optimization.<n>We also introduce a VAE branch with linear projection to recover fine-grained image details.
arXiv Detail & Related papers (2025-12-02T09:02:20Z) - Beyond Single Models: Mitigating Multimodal Hallucinations via Adaptive Token Ensemble Decoding [41.828387997311474]
Large Vision-Language Models (LVLMs) have recently achieved impressive results in multimodal tasks such as image captioning and visual question answering.<n>They remain prone to object hallucination -- generating descriptions of nonexistent or misidentified objects.<n>We propose Adaptive Token Ensemble Decoding (ATED), a training-free, token-level ensemble framework that mitigates hallucination by aggregating predictions from multiple LVLMs during inference.
arXiv Detail & Related papers (2025-10-21T06:11:24Z) - Unlocking Vision-Language Models for Video Anomaly Detection via Fine-Grained Prompting [17.850029260662648]
We propose ASK-Hint, a structured prompting framework for video anomaly detection.<n>Our approach organizes prompts into semantically coherent groups and formulates fine-grained guiding questions.<n>Experiments on UCF-Crime and XD-Violence show that ASK-Hint consistently improves AUC over prior baselines.
arXiv Detail & Related papers (2025-10-02T16:06:31Z) - Anchored Diffusion Language Model [39.17770765212062]
We introduce the Anchored Diffusion Language Model (ADLM), a novel framework that predicts distributions over important tokens via an anchor network.<n>ADLM significantly improves test perplexity on LM1B and OpenWebText, achieving up to 25.4% gains over prior DLMs.<n>It also surpasses AR models in MAUVE score, which marks the first time a DLM generates better human-like text than an AR model.
arXiv Detail & Related papers (2025-05-24T01:34:14Z) - Making Acoustic Side-Channel Attacks on Noisy Keyboards Viable with LLM-Assisted Spectrograms' "Typo" Correction [5.0998111447316194]
Large integration of microphones into devices increases the opportunities for Acoustic Side-Channel Attacks (ASCAs)<n>Current State-Of-The-Art (SOTA) models for ASCAs exhibit limited robustness under realistic noisy conditions.<n>We present the first-of-its-kind approach that integrates Visual Transformers (VTs) and Large Language Models (LLMs) for ASCAs.
arXiv Detail & Related papers (2025-04-15T21:23:25Z) - Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding [71.01099784480597]
Large language models (LLMs) excel at a range of tasks through in-context learning (ICL)<n>We introduce In-Context Contrastive Decoding (ICCD), a novel method that emphasizes input-label mapping.
arXiv Detail & Related papers (2025-02-19T14:04:46Z) - Looking Beyond Text: Reducing Language bias in Large Vision-Language Models via Multimodal Dual-Attention and Soft-Image Guidance [67.26434607115392]
Large vision-language models (LVLMs) have achieved impressive results in various vision-language tasks.
LVLMs suffer from hallucinations caused by language bias, leading to diminished focus on images and ineffective visual comprehension.
We propose LACING to address the language bias of LVLMs with muLtimodal duAl-attention meChanIsm (MDA) aNd soft-image Guidance (IFG)
arXiv Detail & Related papers (2024-11-21T16:33:30Z) - MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection [107.15164718585666]
We investigate the root cause of VLMs' biased prediction under the open vocabulary detection context.
Our observations lead to a simple yet effective paradigm, coded MarvelOVD, that generates significantly better training targets.
Our method outperforms the other state-of-the-arts by significant margins.
arXiv Detail & Related papers (2024-07-31T09:23:57Z) - BECLR: Batch Enhanced Contrastive Few-Shot Learning [1.450405446885067]
Unsupervised few-shot learning aspires to bridge this gap by discarding the reliance on annotations at training time.
We propose a novel Dynamic Clustered mEmory (DyCE) module to promote a highly separable latent representation space.
We then tackle the, somehow overlooked yet critical, issue of sample bias at the few-shot inference stage.
arXiv Detail & Related papers (2024-02-04T10:52:43Z) - Unsupervised Continual Anomaly Detection with Contrastively-learned
Prompt [80.43623986759691]
We introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD.
The framework equips the UAD with continual learning capability through contrastively-learned prompts.
We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation.
arXiv Detail & Related papers (2024-01-02T03:37:11Z) - ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for
Improving ASR Robustness in Spoken Language Understanding [55.39105863825107]
We propose Mutual Learning and Large-Margin Contrastive Learning (ML-LMCL) to improve automatic speech recognition (ASR) robustness.
In fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively.
Experiments on three datasets show that ML-LMCL outperforms existing models and achieves new state-of-the-art performance.
arXiv Detail & Related papers (2023-11-19T16:53:35Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.