Related papers: Advancing Cache-Based Few-Shot Classification via Patch-Driven Relational Gated Graph Attention

URL: http://arxiv.org/abs/2512.12498v1
Date: Sat, 13 Dec 2025 23:53:00 GMT
Title: Advancing Cache-Based Few-Shot Classification via Patch-Driven Relational Gated Graph Attention
Authors: Tasweer Ahmad, Arindam Sikdar, Sandip Pradhan, Ardhendu Behera
Abstract summary: Few-shot image classification remains difficult under limited supervision. Recent cache-based adaptation approaches (e.g., Tip-Adapter) address this challenge to some extent. We introduce a novel patch-driven relational refinement that learns cache adapter weights from intra-image patch dependencies.
Score: 3.4693817403659515
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Few-shot image classification remains difficult under limited supervision and visual domain shift. Recent cache-based adaptation approaches (e.g., Tip-Adapter) address this challenge to some extent by learning lightweight residual adapters over frozen features, yet they still inherit CLIP's tendency to encode global, general-purpose representations that are not optimally discriminative to adapt the generalist to the specialist's domain in low-data regimes. We address this limitation with a novel patch-driven relational refinement that learns cache adapter weights from intra-image patch dependencies rather than treating an image embedding as a monolithic vector. Specifically, we introduce a relational gated graph attention network that constructs a patch graph and performs edge-aware attention to emphasize informative inter-patch interactions, producing context-enriched patch embeddings. A learnable multi-aggregation pooling then composes these into compact, task-discriminative representations that better align cache keys with the target few-shot classes. Crucially, the proposed graph refinement is used only during training to distil relational structure into the cache, incurring no additional inference cost beyond standard cache lookup. Final predictions are obtained by a residual fusion of cache similarity scores with CLIP zero-shot logits. Extensive evaluations on 11 benchmarks show consistent gains over state-of-the-art CLIP adapter and cache-based baselines while preserving zero-shot efficiency. We further validate battlefield relevance by introducing an Injured vs. Uninjured Soldier dataset for casualty recognition. It is motivated by the operational need to support triage decisions within the "platinum minutes" and the broader "golden hour" window in time-critical UAV-driven search-and-rescue and combat casualty care.
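As a rough illustration of the cache lookup and residual fusion step mentioned in the abstract, the sketch below follows the general Tip-Adapter-style formulation in NumPy. It is not the authors' implementation: the function name `cache_fusion_logits`, the hyperparameters `alpha`, `beta`, and `scale` (with illustrative values), and the toy data are all assumptions, and the patch-graph refinement used during training is omitted entirely.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def cache_fusion_logits(query, cache_keys, cache_values, text_weights,
                        alpha=1.0, beta=5.5, scale=100.0):
    """Fuse cache similarity scores with zero-shot logits (hypothetical helper).

    query:        (B, D) image features of the test images
    cache_keys:   (M, D) image features of the few-shot training images
    cache_values: (M, C) one-hot labels of the few-shot training images
    text_weights: (C, D) text embeddings of the class prompts
    alpha, beta, scale: assumed hyperparameters, values are illustrative only
    """
    query = l2_normalize(query)
    cache_keys = l2_normalize(cache_keys)
    text_weights = l2_normalize(text_weights)

    # Zero-shot branch: scaled cosine similarity to each class prompt.
    zero_shot = scale * (query @ text_weights.T)                      # (B, C)

    # Cache branch: cosine affinity to every cached key, sharpened by beta,
    # then mapped to class scores through the one-hot cache values.
    affinity = query @ cache_keys.T                                   # (B, M)
    cache_scores = np.exp(-beta * (1.0 - affinity)) @ cache_values    # (B, C)

    # Residual fusion: cache scores are added on top of the zero-shot logits.
    return zero_shot + alpha * cache_scores


# Toy example: 2 classes, 1 shot each, 4-dimensional features.
cache_keys = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0]])
cache_values = np.eye(2)
# Text weights are orthogonal to the query, so the zero-shot branch is neutral.
text_weights = np.array([[0.0, 0.0, 1.0, 0.0],
                         [0.0, 0.0, 0.0, 1.0]])
query = np.array([[1.0, 0.0, 0.0, 0.0]])

logits = cache_fusion_logits(query, cache_keys, cache_values, text_weights)
predicted_class = int(np.argmax(logits, axis=1)[0])
```

In this toy setup the query matches the cached key of class 0, so the cache branch alone decides the prediction. In a real pipeline the features would come from a frozen CLIP encoder, and the cache keys would be the quantities that training refines.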

Related papers

GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness [75.00019285120878]
Key-value (KV) caching can mitigate this, but storing the full cache is prohibitive for image-heavy contexts. Existing cache-compression methods are sub-optimal as they do not account for the spatial and temporal redundancy of GUIs. We introduce GUI-KV, a plug-and-play KV cache compression method for GUI agents that requires no retraining.
arXiv Detail & Related papers (2025-10-01T05:37:54Z)
KVCompose: Efficient Structured KV Cache Compression with Composite Tokens [7.922206020386125]
Large language models (LLMs) rely on key-value (KV) caches for efficient autoregressive decoding. We propose a simple, yet effective, KV cache compression framework based on attention-guided, layer-adaptive composite tokens. Our method achieves significant memory reduction while preserving accuracy, consistently outperforming prior structured and semi-structured methods.
arXiv Detail & Related papers (2025-09-05T14:58:24Z)
SubGCache: Accelerating Graph-based RAG with Subgraph-level KV Cache [20.26177496265456]
SubGCache aims to reduce inference latency by reusing computation across queries with similar structural prompts. Experiments on two new datasets demonstrate that SubGCache consistently reduces inference latency with comparable and even improved generation quality.
arXiv Detail & Related papers (2025-05-16T07:39:41Z)
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference [11.73134417321505]
We propose AirCache, a novel KV cache compression method aimed at accelerating LVLM inference. We show that our method achieves comparable performance to the full cache while retaining only 10% of the visual KV cache.
arXiv Detail & Related papers (2025-03-31T11:13:18Z)
Compositional Caching for Training-free Open-vocabulary Attribute Detection [65.46250297408974]
We present Compositional Caching (ComCa), a training-free method for open-vocabulary attribute detection. ComCa requires only the list of target attributes and objects as input, using them to populate an auxiliary cache of images. Experiments on public datasets demonstrate that ComCa significantly outperforms zero-shot and cache-based baselines.
arXiv Detail & Related papers (2025-03-24T21:00:37Z)
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection [54.21851618853518]
We present a concise yet effective approach called Patch Generation-to-Selection to enhance CLIP's training efficiency. Our approach, CLIP-PGS, sets new state-of-the-art results in zero-shot classification and retrieval tasks.
arXiv Detail & Related papers (2025-03-21T12:10:38Z)
AttentionPredictor: Temporal Patterns Matter for KV Cache Compression [64.75459635661562]
We propose AttentionPredictor, which is the first learning-based method to directly predict attention patterns for KV cache compression and critical token identification. AttentionPredictor accurately predicts the attention score and shares the unified prediction model, which consumes negligible memory. By retaining most of the attention information, AttentionPredictor achieves 13× KV cache compression and 5.6× speedup in a cache offloading scenario.
arXiv Detail & Related papers (2025-02-06T13:41:46Z)
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference [19.062950348441426]
KV cache pruning has emerged as a promising technique for reducing memory and computation costs in long-context auto-regressive generation. We propose decomposing attention scores into intra-modality attention (within the same modality) and inter-modality attention (across modalities). Our final training-free method, Cross-Self Pruning (CSP), achieves competitive performance compared to models with full KV caches.
arXiv Detail & Related papers (2024-12-05T22:47:17Z)
Efficient Inference of Vision Instruction-Following Models with Elastic Cache [76.44955111634545]
We introduce Elastic Cache, a novel strategy for efficient deployment of instruction-following large vision-language models. We propose an importance-driven cache merging strategy to prune redundancy caches. For instruction encoding, we utilize the frequency to evaluate the importance of caches. Results on a range of LVLMs demonstrate that Elastic Cache not only boosts efficiency but also notably outperforms existing pruning methods in language generation.
arXiv Detail & Related papers (2024-07-25T15:29:05Z)
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [55.12082817901671]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT). MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies. Our results demonstrate that MaPeT achieves competitive performance on ImageNet, compared to baselines and competitors under the same model setting.
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
Accelerating Deep Learning Classification with Error-controlled Approximate-key Caching [72.50506500576746]
We propose a novel caching paradigm that we name approximate-key caching. While approximate cache hits alleviate the DL inference workload and increase system throughput, they also introduce an approximation error. We analytically model our caching system's performance for classic LRU and ideal caches, perform a trace-driven evaluation of the expected performance, and compare the benefits of our proposed approach with state-of-the-art similarity caching.
arXiv Detail & Related papers (2021-12-13T13:49:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.