ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition
- URL: http://arxiv.org/abs/2208.10431v1
- Date: Mon, 22 Aug 2022 16:36:32 GMT
- Title: ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition
- Authors: Mengqi Xue, Qihan Huang, Haofei Zhang, Lechao Cheng, Jie Song, Minghui Wu, Mingli Song
- Abstract summary: The prototypical part network (ProtoPNet) has drawn wide attention and spurred many follow-up studies due to its self-explanatory property for explainable artificial intelligence (XAI).
However, when ProtoPNet is applied directly to vision transformer (ViT) backbones, the learned prototypes have a relatively high probability of being activated by the background and pay less attention to the foreground.
This paper proposes the prototypical part transformer (ProtoPFormer), which appropriately and effectively applies the prototype-based method to ViTs for interpretable image recognition.
- Score: 32.34322644235324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prototypical part network (ProtoPNet) has drawn wide attention and spurred many follow-up studies due to its self-explanatory property for explainable artificial intelligence (XAI). However, when ProtoPNet is applied directly to vision transformer (ViT) backbones, the learned prototypes suffer from a "distraction" problem: they have a relatively high probability of being activated by the background and pay less attention to the foreground. The transformer's powerful capability of modeling long-range dependencies makes a transformer-based ProtoPNet hard to focus on prototypical parts, severely impairing its inherent
interpretability. This paper proposes the prototypical part transformer (ProtoPFormer), which appropriately and effectively applies the prototype-based method to ViTs for interpretable image recognition. The proposed method
introduces global and local prototypes for capturing and highlighting the
representative holistic and partial features of targets according to the
architectural characteristics of ViTs. The global prototypes provide a global view of objects, guiding the local prototypes to concentrate on the foreground while eliminating the influence of the background. Afterwards,
local prototypes are explicitly supervised to concentrate on their respective
prototypical visual parts, increasing the overall interpretability. Extensive
experiments demonstrate that the proposed global and local prototypes mutually correct each other and jointly make the final decision, faithfully and transparently revealing the reasoning process from holistic and local perspectives, respectively. Moreover, ProtoPFormer consistently
achieves superior performance and visualization results over the
state-of-the-art (SOTA) prototype-based baselines. Our code has been released
at https://github.com/zju-vipa/ProtoPFormer.
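To make the mechanism concrete, below is a minimal PyTorch sketch of a head that combines global and local prototypes over ViT tokens, in the spirit of the abstract: global prototypes are matched against the class token, local prototypes against patch tokens, and an assumed foreground weighting derived from the class token suppresses background activations. All names, shapes, and the weighting scheme are illustrative assumptions rather than ProtoPFormer's actual implementation; see the released code above for the real design.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of a global/local prototype head over ViT tokens.
# Names, shapes, and the foreground weighting are illustrative
# assumptions, not ProtoPFormer's actual implementation.
class GlobalLocalPrototypeHead(nn.Module):
    def __init__(self, dim=768, num_classes=200, protos_per_class=5):
        super().__init__()
        num_protos = num_classes * protos_per_class
        # Global prototypes: matched against the [CLS] token (holistic view).
        self.global_protos = nn.Parameter(torch.randn(num_protos, dim))
        # Local prototypes: matched against patch tokens (part-level view).
        self.local_protos = nn.Parameter(torch.randn(num_protos, dim))
        self.global_fc = nn.Linear(num_protos, num_classes, bias=False)
        self.local_fc = nn.Linear(num_protos, num_classes, bias=False)

    def forward(self, cls_token, patch_tokens):
        # cls_token: (B, dim); patch_tokens: (B, N, dim) from a ViT backbone.
        # Global branch: similarity of the [CLS] token to each global prototype.
        g_sim = F.cosine_similarity(
            cls_token.unsqueeze(1), self.global_protos.unsqueeze(0), dim=-1)  # (B, P)

        # Assumed foreground weighting: patches aligned with the holistic
        # [CLS] representation are emphasized, suppressing background patches.
        fg = torch.softmax(
            torch.einsum("bnd,bd->bn", patch_tokens, cls_token), dim=1)  # (B, N)

        # Local branch: each local prototype activates on its best-matching
        # patch, modulated by the foreground weights.
        l_sim = F.cosine_similarity(
            patch_tokens.unsqueeze(2),                    # (B, N, 1, dim)
            self.local_protos.unsqueeze(0).unsqueeze(0),  # (1, 1, P, dim)
            dim=-1)                                       # (B, N, P)
        l_score = (l_sim * fg.unsqueeze(-1)).amax(dim=1)  # (B, P)

        # Global and local branches jointly make the final decision.
        return self.global_fc(g_sim) + self.local_fc(l_score)

# Usage with dummy ViT outputs (batch of 2, 196 patches, 768-dim tokens):
head = GlobalLocalPrototypeHead()
logits = head(torch.randn(2, 768), torch.randn(2, 196, 768))  # (2, 200)

In the paper itself, the foreground guidance and the part-level concentration are learned objectives rather than the fixed softmax weighting assumed here.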
Related papers
- Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning [15.183106475115583]
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions of attributes and objects by leveraging knowledge learned from seen compositions.
We propose a novel Dual-Modal Prototype Joint Learning framework for the CZSL task.
arXiv Detail & Related papers (2025-01-23T17:30:27Z)
- Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection [48.263655122968906]
Face forgery detection (FFD) is devoted to detecting the authenticity of face images.
We propose a distilled transformer network (DTN) to capture both rich local and global forgery traces.
arXiv Detail & Related papers (2024-12-28T14:00:27Z)
- Mind the Gap Between Prototypes and Images in Cross-domain Finetuning [64.97317635355124]
We propose a contrastive prototype-image adaptation (CoPA) to adapt different transformations respectively for prototypes and images.
Experiments on Meta-Dataset demonstrate that CoPA achieves the state-of-the-art performance more efficiently.
arXiv Detail & Related papers (2024-10-16T11:42:11Z)
- Query-guided Prototype Evolution Network for Few-Shot Segmentation [85.75516116674771]
We present a new method that integrates query features into the generation process of foreground and background prototypes.
Experimental results on the PASCAL-$5^i$ and COCO-$20^i$ datasets attest to the substantial enhancements achieved by QPENet.
arXiv Detail & Related papers (2024-03-11T07:50:40Z)
- ProtoP-OD: Explainable Object Detection with Prototypical Parts [0.0]
This paper introduces an extension to detection transformers that constructs prototypical local features and uses them in object detection.
The proposed extension consists of a bottleneck module, the prototype neck, that computes a discretized representation of prototype activations.
arXiv Detail & Related papers (2024-02-29T13:25:15Z)
- Mixture of Gaussian-distributed Prototypes with Generative Modelling for Interpretable and Trustworthy Image Recognition [15.685927265270085]
We present a new generative paradigm for learning prototype distributions, termed Mixture of Gaussian-distributed Prototypes (MGProto).
MGProto achieves state-of-the-art image recognition and OoD detection performances, while providing encouraging interpretability results.
arXiv Detail & Related papers (2023-11-30T11:01:37Z)
- ProtoArgNet: Interpretable Image Classification with Super-Prototypes and Argumentation [Technical Report] [17.223442899324482]
ProtoArgNet is a novel interpretable deep neural architecture for image classification in the spirit of prototypical-part-learning.
ProtoArgNet uses super-prototypes that combine prototypical-parts into a unified class representation.
We demonstrate on several datasets that ProtoArgNet outperforms state-of-the-art prototypical-part-learning approaches.
arXiv Detail & Related papers (2023-11-26T21:52:47Z)
- Pixel-Grounded Prototypical Part Networks [33.408034817820834]
Prototypical part neural networks (ProtoPartNNs) are an intrinsically interpretable approach to machine learning.
We argue that detraction from these underlying issues is due to the alluring nature of visualizations and an over-reliance on intuition.
We propose new receptive field-based architectural constraints for meaningful localization and a principled pixel space mapping for ProtoPartNNs.
arXiv Detail & Related papers (2023-09-25T21:09:49Z)
- Holistic Prototype Attention Network for Few-Shot VOS [74.25124421163542]
Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of unseen classes by resorting to a small set of support images.
We propose a holistic prototype attention network (HPAN) for advancing FSVOS.
arXiv Detail & Related papers (2023-07-16T03:48:57Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
Applied to the recent transformer-based image recognition model ViT, the approach shows a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Attentional Prototype Inference for Few-Shot Segmentation [128.45753577331422]
We propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation.
We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution.
We conduct extensive experiments on four benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art prototype-based methods.
arXiv Detail & Related papers (2021-05-14T06:58:44Z)