Related papers: Batch-CAM: Introduction to better reasoning in convolutional deep learning models

URL: http://arxiv.org/abs/2510.00664v1
Date: Wed, 01 Oct 2025 08:47:00 GMT
Title: Batch-CAM: Introduction to better reasoning in convolutional deep learning models
Authors: Giacomo Ignesti, Davide Moroni, Massimo Martinelli
Abstract summary: Batch-CAM is a novel training paradigm that fuses a batch implementation of the Grad-CAM algorithm with a prototypical reconstruction loss. Our results demonstrate that Batch-CAM achieves a simultaneous improvement in accuracy and image reconstruction quality while reducing training and inference times.
Score: 2.0391237204597363
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Understanding the inner workings of deep learning models is crucial for advancing artificial intelligence, particularly in high-stakes fields such as healthcare, where accurate explanations are as vital as precision. This paper introduces Batch-CAM, a novel training paradigm that fuses a batch implementation of the Grad-CAM algorithm with a prototypical reconstruction loss. This combination guides the model to focus on salient image features, thereby enhancing its performance across classification tasks. Our results demonstrate that Batch-CAM achieves a simultaneous improvement in accuracy and image reconstruction quality while reducing training and inference times. By ensuring models learn from evidence-relevant information, this approach makes a relevant contribution to building more transparent, explainable, and trustworthy AI systems.

Related papers

GloTok: Global Perspective Tokenizer for Image Reconstruction and Generation [51.95701097588426]
We introduce a Global Perspective Tokenizer (GloTok) to model a more uniform semantic distribution of tokenized features. A residual learning module is proposed to recover the fine-grained details to minimize the reconstruction error caused by quantization. Experiments on the standard ImageNet-1k benchmark clearly show that our proposed method achieves state-of-the-art reconstruction performance and generation quality.
arXiv Detail & Related papers (2025-11-18T06:40:26Z)
Foundations and Models in Modern Computer Vision: Key Building Blocks in Landmark Architectures [34.542592986038265]
This report analyzes the evolution of key design patterns in computer vision by examining six influential papers. We review ResNet, which introduced residual connections to overcome the vanishing gradient problem. We examine the Vision Transformer (ViT), which established a new paradigm by applying the Transformer architecture to sequences of image patches.
arXiv Detail & Related papers (2025-07-31T09:08:11Z)
Retrospective Memory for Camouflaged Object Detection [18.604039107883317]
We propose a recall-augmented COD architecture, namely RetroMem, which dynamically modulates camouflage pattern perception and inference. In the recall stage, we propose a dynamic memory mechanism and an inference pattern reconstruction. Our RetroMem significantly outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-06-18T08:22:19Z)
Integrated Image Reconstruction and Target Recognition based on Deep Learning Technique [3.3410072288157155]
We present Att-ClassiGAN, which significantly reduces the reconstruction time compared to traditional CMI approaches. It delivers improved Normalized Mean Squared Error (NMSE), higher Structural Similarity Index (SSIM) and better classification outcomes for the reconstructed targets.
arXiv Detail & Related papers (2025-05-07T22:34:32Z)
Evolved Hierarchical Masking for Self-Supervised Learning [49.77271430882176]
Existing Masked Image Modeling methods apply fixed mask patterns to guide the self-supervised training. This paper introduces an evolved hierarchical masking method to pursue general visual cues modeling in self-supervised learning.
arXiv Detail & Related papers (2025-04-12T09:40:14Z)
Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
Generative Model-based Feature Knowledge Distillation for Action Recognition [11.31068233536815]
Our paper introduces an innovative knowledge distillation framework, with the generative model for training a lightweight student model. The efficacy of our approach is demonstrated through comprehensive experiments on diverse popular datasets.
arXiv Detail & Related papers (2023-12-14T03:55:29Z)
Dual-Activated Lightweight Attention ResNet50 for Automatic Histopathology Breast Cancer Image Classification [0.0]
This study introduces a novel method for breast cancer classification, the Dual-Activated Lightweight Attention ResNet50 model. It integrates a pre-trained ResNet50 model with a lightweight attention mechanism, embedding an attention module in the fourth layer of ResNet50. The DALAResNet50 method was tested on breast cancer histopathology images from the BreakHis Database across magnification factors of 40X, 100X, 200X, and 400X, achieving accuracies of 98.5%, 98.7%, 97.9%, and 94.3%, respectively.
arXiv Detail & Related papers (2023-08-25T03:08:41Z)
Retrieval-Enhanced Contrastive Vision-Text Models [61.783728119255365]
We propose to equip vision-text models with the ability to refine their embedding with cross-modal retrieved information from a memory at inference time. Remarkably, we show that this can be done with a light-weight, single-layer, fusion transformer on top of a frozen CLIP. Our experiments validate that our retrieval-enhanced contrastive (RECO) training improves CLIP performance substantially on several challenging fine-grained tasks.
arXiv Detail & Related papers (2023-06-12T15:52:02Z)
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task [47.1857510710807]
We present a new learning framework, dubbed GPT4Image, where the knowledge of the large pre-trained models is extracted to help CNNs and ViTs learn better representations. We conduct extensive experiments to verify the effectiveness of the proposed algorithm on various visual perception tasks.
arXiv Detail & Related papers (2023-06-01T14:02:45Z)
Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles [60.97922557957857]
We provide a framework to perform self-supervised model ensembling via a novel method of learning representations directly through gradient descent at inference time. This technique improves representation quality, as measured by k-nearest neighbors, both on the in-domain dataset and in the transfer setting.
arXiv Detail & Related papers (2021-10-19T22:24:57Z)
Who Explains the Explanation? Quantitatively Assessing Feature Attribution Methods [0.0]
We propose a novel evaluation metric -- the Focus -- designed to quantify the faithfulness of explanations. We show the robustness of the metric through randomization experiments, and then use Focus to evaluate and compare three popular explainability techniques. Our results find LRP and GradCAM to be consistent and reliable, while the latter remains most competitive even when applied to poorly performing models.
arXiv Detail & Related papers (2021-09-28T07:10:24Z)
Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query. A Rank-aware Calibration (RC) network is designed to construct the multi-level contrastive optimization objectives. We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words.
arXiv Detail & Related papers (2020-09-19T02:41:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.