HOLMES: HOLonym-MEronym based Semantic inspection for Convolutional
Image Classifiers
- URL: http://arxiv.org/abs/2403.08536v1
- Date: Wed, 13 Mar 2024 13:51:02 GMT
- Title: HOLMES: HOLonym-MEronym based Semantic inspection for Convolutional
Image Classifiers
- Authors: Francesco Dibitonto, Fabio Garcea, André Panisson, Alan Perotti, and
Lia Morra
- Abstract summary: We propose a new technique that decomposes a label into a set of related concepts.
HOLMES provides component-level explanations for an image classification model.
- Score: 1.6252896527001481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Networks (CNNs) are nowadays the model of choice in
Computer Vision, thanks to their ability to automate the feature extraction
process in visual tasks. However, the knowledge acquired during training is
fully subsymbolic, and hence difficult to understand and explain to end users.
In this paper, we propose a new technique called HOLMES (HOLonym-MEronym based
Semantic inspection) that decomposes a label into a set of related concepts,
and provides component-level explanations for an image classification model.
Specifically, HOLMES leverages ontologies, web scraping and transfer learning
to automatically construct meronym (parts)-based detectors for a given holonym
(class). Then, it produces heatmaps at the meronym level and finally, by
probing the holonym CNN with occluded images, it highlights the importance of
each part to the classification output. Compared to state-of-the-art saliency
methods, HOLMES takes a step further and provides information about both where
and what the holonym CNN is looking at, without relying on densely annotated
datasets and without forcing concepts to be associated with single computational
units. Extensive experimental evaluation on different categories of objects
(animals, tools and vehicles) shows the feasibility of our approach. On
average, HOLMES explanations include at least two meronyms, and the ablation of
a single meronym roughly halves the holonym model confidence. The resulting
heatmaps were quantitatively evaluated using the
deletion/insertion/preservation curves. All metrics were comparable to those
achieved by GradCAM, while offering the advantage of further decomposing the
heatmap into human-understandable concepts, thus highlighting both the relevance
of meronyms to object classification and HOLMES' ability to capture it.
The code is available at https://github.com/FrancesC0de/HOLMES.
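
To make the decomposition step concrete, the sketch below queries WordNet for the part meronyms of a holonym. WordNet is one ontology such a pipeline can draw on; the function name, the choice of the first noun sense, and the lemma filtering are illustrative assumptions, and the paper's full pipeline (which also involves web scraping and transfer learning to build the detectors) is not reproduced here.

```python
# Sketch: holonym -> meronym decomposition via WordNet (NLTK).
# WordNet is one concrete ontology choice; the web-scraping and
# transfer-learning stages of the pipeline are omitted.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def part_meronyms_of(holonym: str) -> list[str]:
    """Return part-meronym lemma names for the first noun sense of `holonym`."""
    synsets = wn.synsets(holonym, pos=wn.NOUN)
    if not synsets:
        return []
    parts = synsets[0].part_meronyms()
    return sorted({lemma.name() for s in parts for lemma in s.lemmas()})

print(part_meronyms_of("car"))  # e.g. parts such as 'air_bag', 'car_door', ...
```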
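The occlusion probing described in the abstract can be sketched as follows, assuming a PyTorch classifier. Here `model`, `meronym_masks` (boolean masks derived from the meronym-level heatmaps), and the zero-fill occlusion value are hypothetical placeholders; the paper's exact occlusion strategy may differ.

```python
# Sketch: occlusion-based meronym importance probing (hypothetical interface).
# `model` is any PyTorch classifier returning logits; `meronym_masks` maps
# meronym names to boolean (H, W) masks derived from meronym-level heatmaps.
import torch

@torch.no_grad()
def meronym_importance(model, image, class_idx, meronym_masks, fill=0.0):
    """Measure the holonym-confidence drop caused by occluding each meronym."""
    base = model(image.unsqueeze(0)).softmax(dim=1)[0, class_idx].item()
    drops = {}
    for name, mask in meronym_masks.items():
        occluded = image.clone()           # image: (C, H, W) float tensor
        occluded[:, mask] = fill           # blank out the part's pixels
        prob = model(occluded.unsqueeze(0)).softmax(dim=1)[0, class_idx].item()
        drops[name] = base - prob          # larger drop => more important part
    return base, drops
```

In these terms, the abstract's finding that ablating a single meronym roughly halves the holonym confidence corresponds to `drops[name]` being close to `base / 2`.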
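For the quantitative evaluation, here is a minimal sketch of the deletion variant of the curves mentioned above, assuming the common RISE-style formulation: pixels are removed in decreasing saliency order while the class probability is tracked. The step count, fill value, and AUC estimator are illustrative choices, not the paper's exact protocol.

```python
# Sketch: deletion curve for a saliency map (RISE-style metric).
# Pixels are removed in descending saliency order; a faithful map makes the
# class probability fall quickly, i.e. yields a low area under the curve.
import torch

@torch.no_grad()
def deletion_curve(model, image, class_idx, saliency, steps=20, fill=0.0):
    """Track class probability while deleting pixels by decreasing saliency."""
    c, _, _ = image.shape
    order = saliency.flatten().argsort(descending=True)  # most salient first
    per_step = order.numel() // steps
    work = image.clone()
    flat = work.view(c, -1)                # shares storage with `work`
    probs = [model(work.unsqueeze(0)).softmax(dim=1)[0, class_idx].item()]
    for i in range(steps):
        flat[:, order[i * per_step:(i + 1) * per_step]] = fill
        probs.append(model(work.unsqueeze(0)).softmax(dim=1)[0, class_idx].item())
    return probs, sum(probs) / len(probs)  # curve and a simple AUC estimate
```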
Related papers
- Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery [52.498055901649025]
Concept Bottleneck Models (CBMs) have been proposed to address the 'black-box' problem of deep neural networks.
We propose a novel CBM approach -- called Discover-then-Name-CBM (DN-CBM) -- that inverts the typical paradigm.
Our concept extraction strategy is efficient, since it is agnostic to the downstream task, and uses concepts already known to the model.
arXiv Detail & Related papers (2024-07-19T17:50:11Z)
- Mining Open Semantics from CLIP: A Relation Transition Perspective for Few-Shot Learning [46.25534556546322]
We propose to mine open semantics as anchors, transitioning from the image-anchor relationship to the image-target relationship to make predictions.
Our method performs favorably against previous state-of-the-art methods in few-shot classification settings.
arXiv Detail & Related papers (2024-06-17T06:28:58Z)
- Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification [5.087579454836169]
State-of-the-art explainability methods generate saliency maps to show where a specific class is identified.
We introduce a post-hoc method that explains the entire feature extraction process of a Convolutional Neural Network.
We also show an approach to generate global explanations by aggregating labels across multiple images.
arXiv Detail & Related papers (2024-05-06T09:21:35Z)
- KMF: Knowledge-Aware Multi-Faceted Representation Learning for Zero-Shot Node Classification [75.95647590619929]
Zero-Shot Node Classification (ZNC) has been an emerging and crucial task in graph data analysis.
We propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics.
A novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation.
arXiv Detail & Related papers (2023-08-15T02:38:08Z)
- Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning works by using a pretext task which will be trained on the model before being applied to a specific task.
Previous studies have only used one type of transformation as a pretext task.
This raises the question of how performance is affected when more than one pretext task is used, with a gating network combining all pretext tasks.
arXiv Detail & Related papers (2023-07-27T14:38:32Z)
- DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection [118.36746273425354]
This paper presents a paralleled visual-concept pre-training method for open-world detection by resorting to knowledge enrichment from a designed concept dictionary.
By enriching the concepts with their descriptions, we explicitly build relationships among various concepts to facilitate open-domain learning.
The proposed framework demonstrates strong zero-shot detection performances, e.g., on the LVIS dataset, our DetCLIP-T outperforms GLIP-T by 9.9% mAP and obtains a 13.5% improvement on rare categories.
arXiv Detail & Related papers (2022-09-20T02:01:01Z)
- Visual Recognition with Deep Nearest Centroids [57.35144702563746]
We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition.
Compared with parametric counterparts, DNC performs better on image classification (CIFAR-10, ImageNet) and greatly boosts pixel-level recognition (ADE20K, Cityscapes).
arXiv Detail & Related papers (2022-09-15T15:47:31Z)
- VGSE: Visually-Grounded Semantic Embeddings for Zero-Shot Learning [113.50220968583353]
We propose to discover semantic embeddings containing discriminative visual properties for zero-shot learning.
Our model visually divides a set of images from seen classes into clusters of local image regions according to their visual similarity.
We demonstrate that our visually-grounded semantic embeddings further improve performance over word embeddings across various ZSL models by a large margin.
arXiv Detail & Related papers (2022-03-20T03:49:02Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)