ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition
- URL: http://arxiv.org/abs/2208.10431v1
- Date: Mon, 22 Aug 2022 16:36:32 GMT
- Title: ProtoPFormer: Concentrating on Prototypical Parts in Vision Transformers for Interpretable Image Recognition
- Authors: Mengqi Xue, Qihan Huang, Haofei Zhang, Lechao Cheng, Jie Song, Minghui Wu, Mingli Song
- Abstract summary: The prototypical part network (ProtoPNet) has drawn wide attention and spurred many follow-up studies due to its self-explanatory property for explainable artificial intelligence (XAI).
However, when ProtoPNet is applied directly to vision transformer (ViT) backbones, the learned prototypes have a relatively high probability of being activated by the background and pay less attention to the foreground.
This paper proposes the prototypical part transformer (ProtoPFormer), which appropriately and effectively applies the prototype-based method to ViTs for interpretable image recognition.
- Score: 32.34322644235324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prototypical part network (ProtoPNet) has drawn wide attention and spurred many follow-up studies due to its self-explanatory property for explainable artificial intelligence (XAI). However, when ProtoPNet is applied directly to vision transformer (ViT) backbones, the learned prototypes suffer from a "distraction" problem: they have a relatively high probability of being activated by the background and pay less attention to the foreground. The transformer's powerful capability of modeling long-range dependencies makes a transformer-based ProtoPNet hard to focus on prototypical parts, severely impairing its inherent
interpretability. This paper proposes the prototypical part transformer (ProtoPFormer), which appropriately and effectively applies the prototype-based method to ViTs for interpretable image recognition. The proposed method
introduces global and local prototypes for capturing and highlighting the
representative holistic and partial features of targets according to the
architectural characteristics of ViTs. The global prototypes provide a global view of objects, guiding the local prototypes to concentrate on the foreground while eliminating the influence of the background. Afterwards,
local prototypes are explicitly supervised to concentrate on their respective
prototypical visual parts, increasing the overall interpretability. Extensive
experiments demonstrate that the proposed global and local prototypes mutually correct each other and jointly make the final decision, faithfully and transparently revealing the reasoning process from holistic and local perspectives, respectively. Moreover, ProtoPFormer consistently
achieves superior performance and visualization results over the
state-of-the-art (SOTA) prototype-based baselines. Our code has been released
at https://github.com/zju-vipa/ProtoPFormer.
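To make the mechanism concrete, below is a minimal PyTorch sketch of a head that combines global and local prototypes over ViT tokens, in the spirit of the abstract: global prototypes are matched against the class token, local prototypes against patch tokens, and an assumed foreground weighting derived from the class token suppresses background activations. All names, shapes, and the weighting scheme are illustrative assumptions rather than ProtoPFormer's actual implementation; see the released code above for the real design.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of a global/local prototype head over ViT tokens.
# Names, shapes, and the foreground weighting are illustrative
# assumptions, not ProtoPFormer's actual implementation.
class GlobalLocalPrototypeHead(nn.Module):
    def __init__(self, dim=768, num_classes=200, protos_per_class=5):
        super().__init__()
        num_protos = num_classes * protos_per_class
        # Global prototypes: matched against the [CLS] token (holistic view).
        self.global_protos = nn.Parameter(torch.randn(num_protos, dim))
        # Local prototypes: matched against patch tokens (part-level view).
        self.local_protos = nn.Parameter(torch.randn(num_protos, dim))
        self.global_fc = nn.Linear(num_protos, num_classes, bias=False)
        self.local_fc = nn.Linear(num_protos, num_classes, bias=False)

    def forward(self, cls_token, patch_tokens):
        # cls_token: (B, dim); patch_tokens: (B, N, dim) from a ViT backbone.
        # Global branch: similarity of the [CLS] token to each global prototype.
        g_sim = F.cosine_similarity(
            cls_token.unsqueeze(1), self.global_protos.unsqueeze(0), dim=-1)  # (B, P)

        # Assumed foreground weighting: patches aligned with the holistic
        # [CLS] representation are emphasized, suppressing background patches.
        fg = torch.softmax(
            torch.einsum("bnd,bd->bn", patch_tokens, cls_token), dim=1)  # (B, N)

        # Local branch: each local prototype activates on its best-matching
        # patch, modulated by the foreground weights.
        l_sim = F.cosine_similarity(
            patch_tokens.unsqueeze(2),                    # (B, N, 1, dim)
            self.local_protos.unsqueeze(0).unsqueeze(0),  # (1, 1, P, dim)
            dim=-1)                                       # (B, N, P)
        l_score = (l_sim * fg.unsqueeze(-1)).amax(dim=1)  # (B, P)

        # Global and local branches jointly make the final decision.
        return self.global_fc(g_sim) + self.local_fc(l_score)

# Usage with dummy ViT outputs (batch of 2, 196 patches, 768-dim tokens):
head = GlobalLocalPrototypeHead()
logits = head(torch.randn(2, 768), torch.randn(2, 196, 768))  # (2, 200)

In the paper itself, the foreground guidance and the part-level concentration are learned objectives rather than the fixed softmax weighting assumed here.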
Related papers
- Dual-Modal Prototype Joint Learning for Compositional Zero-Shot Learning [15.183106475115583]
Compositional Zero-Shot Learning (CZSL) aims to recognize novel compositions of attributes and objects by leveraging knowledge learned from seen compositions.
We propose a novel Dual-Modal Prototype Joint Learning framework for the CZSL task.
arXiv Detail & Related papers (2025-01-23T17:30:27Z)
- Distilled Transformers with Locally Enhanced Global Representations for Face Forgery Detection [48.263655122968906]
Face forgery detection (FFD) is devoted to detecting the authenticity of face images.
We propose a distilled transformer network (DTN) to capture both rich local and global forgery traces.
arXiv Detail & Related papers (2024-12-28T14:00:27Z)
- Mind the Gap Between Prototypes and Images in Cross-domain Finetuning [64.97317635355124]
We propose a contrastive prototype-image adaptation (CoPA) to adapt different transformations respectively for prototypes and images.
Experiments on Meta-Dataset demonstrate that CoPA achieves the state-of-the-art performance more efficiently.
arXiv Detail & Related papers (2024-10-16T11:42:11Z)
- Query-guided Prototype Evolution Network for Few-Shot Segmentation [85.75516116674771]
We present a new method that integrates query features into the generation process of foreground and background prototypes.
Experimental results on the PASCAL-$5^i$ and COCO-$20^i$ datasets attest to the substantial enhancements achieved by QPENet.
arXiv Detail & Related papers (2024-03-11T07:50:40Z)
- ProtoP-OD: Explainable Object Detection with Prototypical Parts [0.0]
This paper introduces an extension to detection transformers that constructs prototypical local features and uses them in object detection.
The proposed extension consists of a bottleneck module, the prototype neck, that computes a discretized representation of prototype activations.
arXiv Detail & Related papers (2024-02-29T13:25:15Z)
- Mixture of Gaussian-distributed Prototypes with Generative Modelling for Interpretable and Trustworthy Image Recognition [15.685927265270085]
We present a new generative paradigm for learning prototype distributions, termed Mixture of Gaussian-distributed Prototypes (MGProto).
MGProto achieves state-of-the-art image recognition and OoD detection performances, while providing encouraging interpretability results.
arXiv Detail & Related papers (2023-11-30T11:01:37Z)
- ProtoArgNet: Interpretable Image Classification with Super-Prototypes and Argumentation [Technical Report] [17.223442899324482]
ProtoArgNet is a novel interpretable deep neural architecture for image classification in the spirit of prototypical-part-learning.
ProtoArgNet uses super-prototypes that combine prototypical-parts into a unified class representation.
We demonstrate on several datasets that ProtoArgNet outperforms state-of-the-art prototypical-part-learning approaches.
arXiv Detail & Related papers (2023-11-26T21:52:47Z)
- Pixel-Grounded Prototypical Part Networks [33.408034817820834]
Prototypical part neural networks (ProtoPartNNs) are an intrinsically interpretable approach to machine learning.
We argue that detraction from these underlying issues is due to the alluring nature of visualizations and an over-reliance on intuition.
We propose new receptive field-based architectural constraints for meaningful localization and a principled pixel space mapping for ProtoPartNNs.
arXiv Detail & Related papers (2023-09-25T21:09:49Z)
- Holistic Prototype Attention Network for Few-Shot VOS [74.25124421163542]
Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of unseen classes by resorting to a small set of support images.
We propose a holistic prototype attention network (HPAN) for advancing FSVOS.
arXiv Detail & Related papers (2023-07-16T03:48:57Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
Applied to the recent transformer-based image recognition model ViT, the approach shows a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Attentional Prototype Inference for Few-Shot Segmentation [128.45753577331422]
We propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation.
We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution.
We conduct extensive experiments on four benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art prototype-based methods.
arXiv Detail & Related papers (2021-05-14T06:58:44Z)