ProtoP-OD: Explainable Object Detection with Prototypical Parts
- URL: http://arxiv.org/abs/2402.19142v1
- Date: Thu, 29 Feb 2024 13:25:15 GMT
- Title: ProtoP-OD: Explainable Object Detection with Prototypical Parts
- Authors: Pavlos Rath-Manakidis, Frederik Strothmann, Tobias Glasmachers,
Laurenz Wiskott
- Abstract summary: This paper introduces an extension to detection transformers that constructs prototypical local features and uses them in object detection.
The proposed extension consists of a bottleneck module, the prototype neck, that computes a discretized representation of prototype activations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpretation and visualization of the behavior of detection transformers
tend to highlight the locations in the image that the model attends to, but they
provide limited insight into the \emph{semantics} that the model is focusing
on. This paper introduces an extension to detection transformers that
constructs prototypical local features and uses them in object detection. These
custom features, which we call prototypical parts, are designed to be mutually
exclusive and align with the classifications of the model. The proposed
extension consists of a bottleneck module, the prototype neck, that computes a
discretized representation of prototype activations and a new loss term that
matches prototypes to object classes. This setup leads to interpretable
representations in the prototype neck, allowing visual inspection of the image
content perceived by the model and a better understanding of the model's
reliability. We show experimentally that our method incurs only a limited
performance penalty, and we provide examples that demonstrate the quality of
the explanations provided by our method, which we argue outweighs the
performance penalty.
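The prototype neck described above can be illustrated with a minimal sketch: per-location features are compared against a bank of prototype vectors, softly assigned to prototypes, and then discretized so that each location activates a single dominant prototype. This is a hypothetical NumPy illustration of the general idea; the function name, dot-product similarity, and argmax discretization are assumptions, not the paper's actual implementation.

```python
import numpy as np

def prototype_neck(features, prototypes, temperature=1.0):
    """Sketch of a prototype-neck-style bottleneck.

    features:   (N, D) array of per-location feature vectors.
    prototypes: (P, D) array of learned prototype vectors.
    Returns a soft assignment over prototypes and its one-hot
    discretization (assumed forms, for illustration only).
    """
    # Similarity of each feature vector to each prototype.
    logits = features @ prototypes.T / temperature          # (N, P)
    # Soft assignment: a distribution over prototypes per location.
    shifted = logits - logits.max(axis=1, keepdims=True)
    soft = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # Discretized representation: one-hot of the winning prototype,
    # reflecting the goal of mutually exclusive prototype activations.
    hard = np.zeros_like(soft)
    hard[np.arange(soft.shape[0]), soft.argmax(axis=1)] = 1.0
    return soft, hard
```

In a trained detector, the discretized assignments would be what gets visualized: each image location maps to one nameable prototype, which is what makes the representation inspectable.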
Related papers
- Interpretable Image Classification with Adaptive Prototype-based Vision Transformers [37.62530032165594]
We present ProtoViT, a method for interpretable image classification combining deep learning and case-based reasoning.
Our model integrates Vision Transformer (ViT) backbones into prototype-based models while offering spatially deformed prototypes.
Our experiments show that our model generally achieves higher performance than existing prototype-based models.
arXiv Detail & Related papers (2024-10-28T04:33:28Z)
- Multi-Scale Grouped Prototypes for Interpretable Semantic Segmentation [7.372346036256517]
Prototypical part learning is emerging as a promising approach for making semantic segmentation interpretable.
We propose a method for interpretable semantic segmentation that leverages multi-scale image representation for prototypical part learning.
Experiments conducted on Pascal VOC, Cityscapes, and ADE20K demonstrate that the proposed method increases model sparsity, improves interpretability over existing prototype-based methods, and narrows the performance gap with the non-interpretable counterpart models.
arXiv Detail & Related papers (2024-09-14T17:52:59Z)
- Mixture of Gaussian-distributed Prototypes with Generative Modelling for Interpretable and Trustworthy Image Recognition [15.685927265270085]
We present a new generative paradigm for learning prototype distributions, termed Mixture of Gaussian-distributed Prototypes (MGProto).
MGProto achieves state-of-the-art image recognition and OoD detection performance while providing encouraging interpretability results.
arXiv Detail & Related papers (2023-11-30T11:01:37Z)
- With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning [47.96387857237473]
We devise a network which can perform attention over activations obtained while processing other training samples.
Our memory models the distribution of past keys and values through the definition of prototype vectors.
We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points, both when training with cross-entropy only and when fine-tuning with self-critical sequence training.
arXiv Detail & Related papers (2023-08-23T18:53:00Z)
- ProtoSeg: Interpretable Semantic Segmentation with Prototypical Parts [12.959270094693254]
We introduce ProtoSeg, a novel model for interpretable semantic image segmentation.
To achieve accuracy comparable to baseline methods, we adapt the mechanism of prototypical parts.
We show that ProtoSeg discovers semantic concepts, in contrast to standard segmentation models.
arXiv Detail & Related papers (2023-01-28T19:14:32Z)
- ContraFeat: Contrasting Deep Features for Semantic Discovery [102.4163768995288]
StyleGAN has shown strong potential for disentangled semantic control.
Existing semantic discovery methods on StyleGAN rely on manual selection of modified latent layers to obtain satisfactory manipulation results.
We propose a model that automates this process and achieves state-of-the-art semantic discovery performance.
arXiv Detail & Related papers (2022-12-14T15:22:13Z)
- Object-centric and memory-guided normality reconstruction for video anomaly detection [56.64792194894702]
This paper addresses the anomaly detection problem in video surveillance.
Due to the inherent rarity and heterogeneity of abnormal events, the problem is approached as a normality modeling task.
Our model learns object-centric normal patterns without seeing anomalous samples during training.
arXiv Detail & Related papers (2022-03-07T19:28:39Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered solving vision tasks with transformers: it directly translates the image feature map into the object detection result.
The approach also generalizes to the recent transformer-based image recognition model ViT, showing consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Detection and Captioning with Unseen Object Classes [12.894104422808242]
Test images may contain visual objects with no corresponding visual or textual training examples.
We propose a detection-driven approach based on a generalized zero-shot detection model and a template-based sentence generation model.
Our experiments show that the proposed zero-shot detection model obtains state-of-the-art performance on the MS-COCO dataset.
arXiv Detail & Related papers (2021-08-13T10:43:20Z)
- Attentional Prototype Inference for Few-Shot Segmentation [128.45753577331422]
We propose attentional prototype inference (API), a probabilistic latent variable framework for few-shot segmentation.
We define a global latent variable to represent the prototype of each object category, which we model as a probabilistic distribution.
We conduct extensive experiments on four benchmarks, where our proposal obtains at least competitive and often better performance than state-of-the-art prototype-based methods.
arXiv Detail & Related papers (2021-05-14T06:58:44Z)
- Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation.
Our key idea is to decompose the holistic class representation into a set of part-aware prototypes.
We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
arXiv Detail & Related papers (2020-07-13T11:03:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.