Related papers: ProtoQuant: Quantization of Prototypical Parts For General and Fine-Grained Image Classification

URL: http://arxiv.org/abs/2602.06592v1
Date: Fri, 06 Feb 2026 10:41:31 GMT
Title: ProtoQuant: Quantization of Prototypical Parts For General and Fine-Grained Image Classification
Authors: Mikołaj Janusz, Adam Wróbel, Bartosz Zieliński, Dawid Rymarczyk
Abstract summary: ProtoQuant is a novel architecture that achieves prototype stability and grounded interpretability. By constraining prototypes to a discrete learned codebook within the latent space, we ensure they remain faithful representations of the training data without the need to update the backbone. This design allows ProtoQuant to function as an efficient, interpretable head that scales to large-scale datasets.
Score: 3.4335395164627722
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Prototypical parts-based models offer a "this looks like that" paradigm for intrinsic interpretability, yet they typically struggle with ImageNet-scale generalization and often require computationally expensive backbone finetuning. Furthermore, existing methods frequently suffer from "prototype drift," where learned prototypes lack tangible grounding in the training distribution and change their activation under small perturbations. We present ProtoQuant, a novel architecture that achieves prototype stability and grounded interpretability through latent vector quantization. By constraining prototypes to a discrete learned codebook within the latent space, we ensure they remain faithful representations of the training data without the need to update the backbone. This design allows ProtoQuant to function as an efficient, interpretable head that scales to large-scale datasets. We evaluate ProtoQuant on ImageNet and several fine-grained benchmarks (CUB-200, Cars-196). Our results demonstrate that ProtoQuant achieves competitive classification accuracy, generalizes to ImageNet, and attains interpretability metrics comparable to those of other prototypical-parts-based methods.
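To illustrate the general idea of latent vector quantization mentioned in the abstract, the following is a minimal sketch of a nearest-codebook lookup in plain Python. It is not the ProtoQuant implementation: the function names `squared_distance` and `quantize`, the toy features, and the two-entry codebook are all hypothetical, and the sketch assumes features come from a frozen backbone and that the codebook has already been learned.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    features: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each feature vector to its nearest codebook entry.

    Returns the selected codebook indices and the quantized vectors.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for f in features:
        # Pick the codebook entry with the smallest squared distance to f.
        idx = min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        indices.append(idx)
        quantized.append(list(codebook[idx]))
    return indices, quantized


# Hypothetical toy data: three 2-D patch features and a two-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, -0.2], [0.9, 1.2], [0.4, 0.4]]
indices, quantized = quantize(features, codebook)
```

In this sketch each feature is replaced by one of a fixed, discrete set of vectors, so a prototype can only ever be a codebook entry. How the codebook is trained and how quantized features feed the classification head are not specified here and would depend on the paper's actual method.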

Related papers

Proto-Former: Unified Facial Landmark Detection by Prototype Transformer [77.47431726595111]
Proto-Former is a unified, adaptive, end-to-end facial landmark detection framework. It enables joint training across multiple datasets within a unified architecture. Proto-Former achieves superior performance compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2025-10-17T06:00:25Z)
Interpretable Image Classification with Adaptive Prototype-based Vision Transformers [37.62530032165594]
We present ProtoViT, a method for interpretable image classification combining deep learning and case-based reasoning. Our model integrates Vision Transformer (ViT) backbones into prototype-based models, while offering spatially deformed prototypes. Our experiments show that our model can generally achieve higher performance than the existing prototype-based models.
arXiv Detail & Related papers (2024-10-28T04:33:28Z)
Multi-Scale Grouped Prototypes for Interpretable Semantic Segmentation [7.372346036256517]
Prototypical part learning is emerging as a promising approach for making semantic segmentation interpretable. We propose a method for interpretable semantic segmentation that leverages multi-scale image representation for prototypical part learning. Experiments conducted on Pascal VOC, Cityscapes, and ADE20K demonstrate that the proposed method increases model sparsity, improves interpretability over existing prototype-based methods, and narrows the performance gap with the non-interpretable counterpart models.
arXiv Detail & Related papers (2024-09-14T17:52:59Z)
Correlation Weighted Prototype-based Self-Supervised One-Shot Segmentation of Medical Images [12.365801596593936]
Medical image segmentation is one of the domains where sufficient annotated data is not available. We propose a prototype-based self-supervised one-way one-shot learning framework using pseudo-labels generated from superpixels. We show that the proposed simple but potent framework performs on par with the state-of-the-art methods.
arXiv Detail & Related papers (2024-08-12T15:38:51Z)
This actually looks like that: Proto-BagNets for local and global interpretability-by-design [5.037593461859481]
Interpretability is a key requirement for the use of machine learning models in high-stakes applications. We introduce Proto-BagNets, an interpretable-by-design prototype-based model. Proto-BagNet provides faithful, accurate, and clinically meaningful local and global explanations.
arXiv Detail & Related papers (2024-06-21T14:12:15Z)
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking. Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning [47.96387857237473]
We devise a network which can perform attention over activations obtained while processing other training samples. Our memory models the distribution of past keys and values through the definition of prototype vectors. We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training in cross-entropy only and when fine-tuning with self-critical sequence training.
arXiv Detail & Related papers (2023-08-23T18:53:00Z)
Rethinking Semantic Segmentation: A Prototype View [126.59244185849838]
We present a nonparametric semantic segmentation model based on non-learnable prototypes. Our framework yields compelling results over several datasets. We expect this work will provoke a rethink of the current de facto semantic segmentation model design.
arXiv Detail & Related papers (2022-03-28T21:15:32Z)
Interpretable Image Classification with Differentiable Prototypes Assignment [7.660883761395447]
We introduce ProtoPool, an interpretable image classification model with a pool of prototypes shared by the classes. It is obtained by introducing a fully differentiable assignment of prototypes to particular classes. We show that ProtoPool obtains state-of-the-art accuracy on the CUB-200-2011 and the Stanford Cars datasets, substantially reducing the number of prototypes.
arXiv Detail & Related papers (2021-12-06T10:03:32Z)
Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes [7.8515366468594765]
We present a deformable part network (Deformable ProtoPNet) that integrates the power of deep learning and the interpretability of case-based reasoning. This model classifies input images by comparing them with prototypes learned during training, yielding explanations in the form of "this looks like that".
arXiv Detail & Related papers (2021-11-29T22:38:13Z)
Attentional Prototype Inference for Few-Shot Segmentation [128.45753577331422]
We propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation. We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution. We conduct extensive experiments on four benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art prototype-based methods.
arXiv Detail & Related papers (2021-05-14T06:58:44Z)
Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation. Our key idea is to decompose the holistic class representation into a set of part-aware prototypes. We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
arXiv Detail & Related papers (2020-07-13T11:03:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.