Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning
- URL: http://arxiv.org/abs/2010.14551v1
- Date: Tue, 27 Oct 2020 18:41:49 GMT
- Title: Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning
- Authors: Iro Laina, Ruth C. Fong, Andrea Vedaldi
- Abstract summary: We consider how to characterise visual groupings discovered automatically by deep neural networks.
We introduce two concepts, visual learnability and describability, that can be used to quantify the interpretability of arbitrary image groupings.
- Score: 91.58529629419135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing impact of black box models, and particularly of unsupervised
ones, comes with an increasing interest in tools to understand and interpret
them. In this paper, we consider in particular how to characterise visual
groupings discovered automatically by deep neural networks, starting with
state-of-the-art clustering methods. In some cases, clusters readily correspond
to an existing labelled dataset. However, often they do not, yet they still
maintain an "intuitive interpretability". We introduce two concepts, visual
learnability and describability, that can be used to quantify the
interpretability of arbitrary image groupings, including unsupervised ones. The
idea is to measure (1) how well humans can learn to reproduce a grouping by
measuring their ability to generalise from a small set of visual examples
(learnability) and (2) whether the set of visual examples can be replaced by a
succinct, textual description (describability). By assessing human annotators
as classifiers, we remove the subjective quality of existing evaluation
metrics. For better scalability, we finally propose a class-level captioning
system to generate descriptions for visual groupings automatically and compare
it to human annotators using the describability metric.
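The learnability measure treats each human annotator as a classifier: the annotator studies a small set of example images per group, then assigns held-out images to those groups, and the grouping is scored by how well these assignments reproduce the machine-generated clusters. Below is a minimal sketch of such a score; the function name, the array-based interface, and the chance-corrected normalisation are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def learnability_score(human_assignments: np.ndarray,
                       cluster_ids: np.ndarray,
                       num_groups: int) -> float:
    """Score a grouping by treating a human annotator as a classifier.

    human_assignments: group index chosen by the annotator for each held-out
                       image, after studying a few example images per group.
    cluster_ids:       group index assigned to each held-out image by the
                       clustering method being evaluated.
    num_groups:        number of groups shown to the annotator.

    Returns a chance-corrected accuracy in [0, 1]: 0 means the annotator did
    no better than random guessing, 1 means the grouping was reproduced
    perfectly. (Illustrative normalisation, not the paper's exact metric.)
    """
    accuracy = float(np.mean(human_assignments == cluster_ids))
    chance = 1.0 / num_groups
    return max(0.0, (accuracy - chance) / (1.0 - chance))

# Hypothetical usage: 3 groups, 6 held-out images.
preds = np.array([0, 0, 1, 2, 2, 1])   # annotator's choices
truth = np.array([0, 0, 1, 2, 1, 1])   # model's cluster assignments
print(learnability_score(preds, truth, num_groups=3))  # ~0.75
```

In the describability variant, the visual examples shown to the annotator are replaced by a succinct textual description (human-written or produced by the class-level captioning system), and the same kind of accuracy then measures how well the description alone conveys the grouping.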
Related papers
- Perceptual Group Tokenizer: Building Perception with Iterative Grouping [14.760204235027627]
We propose the Perceptual Group Tokenizer, a model that relies on grouping operations to extract visual features and perform self-supervised representation learning.
We show that the proposed model can achieve competitive computation performance compared to state-of-the-art vision architectures.
arXiv Detail & Related papers (2023-11-30T07:00:14Z) - Representing visual classification as a linear combination of words [0.0]
We present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task.
By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a linear combination of words.
We find that the resulting descriptors largely align with clinical knowledge despite a lack of domain-specific language training.
arXiv Detail & Related papers (2023-11-18T02:00:20Z) - Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence in which external knowledge is usually incorporated into recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z) - Measuring the Interpretability of Unsupervised Representations via Quantized Reverse Probing [97.70862116338554]
We investigate the problem of measuring interpretability of self-supervised representations.
We formulate this measurement as estimating the mutual information between the representation and a space of manually labelled concepts; a rough sketch of this idea appears after this list.
We use our method to evaluate a large number of self-supervised representations, ranking them by interpretability.
arXiv Detail & Related papers (2022-09-07T16:18:50Z) - Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z) - Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
Concept learning constructs visual representations that are connected to linguistic semantics.
We ascribe the bottleneck to a failure to explore the intrinsic semantic hierarchy of visual concepts.
We propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces.
arXiv Detail & Related papers (2022-05-28T14:27:38Z) - Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content [6.434361163743876]
We introduce a conceptual model for the semantic content conveyed by natural language descriptions of visualizations.
We conduct a mixed-methods evaluation with 30 blind and 90 sighted readers, and find that these reader groups differ significantly on which semantic content they rank as most useful.
arXiv Detail & Related papers (2021-10-08T23:37:25Z) - Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
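The reverse-probing entry above frames interpretability as mutual information between a quantized representation and a space of manually labelled concepts. The sketch below is only a rough illustration of that idea: it quantizes frozen features with k-means and applies a plug-in mutual-information estimate between the resulting discrete codes and concept labels; the k-means quantizer, the scikit-learn estimator, and all names are assumptions for illustration, not the method of that paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import mutual_info_score

def quantized_mi(features: np.ndarray, concept_labels: np.ndarray,
                 num_codes: int = 64, seed: int = 0) -> float:
    """Estimate I(quantized representation; concept labels) in nats.

    features:       frozen representation vectors, shape [num_images, dim].
    concept_labels: integer concept label per image, shape [num_images].
    num_codes:      size of the discrete codebook used for quantization.
    """
    # Quantize the continuous representation into discrete codes.
    codes = KMeans(n_clusters=num_codes, random_state=seed,
                   n_init=10).fit_predict(features)
    # Plug-in mutual information between discrete codes and concept labels.
    return mutual_info_score(concept_labels, codes)

# Hypothetical usage with random data (in practice: frozen self-supervised
# features and a labelled concept dataset).
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 128))
labels = rng.integers(0, 10, size=500)
print(quantized_mi(feats, labels, num_codes=32))
```

A higher estimate suggests the labelled concepts are more recoverable from the representation; comparing such estimates across models gives one way to rank them by interpretability, as that entry describes.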
This list is automatically generated from the titles and abstracts of the papers on this site.