Visual Probing: Cognitive Framework for Explaining Self-Supervised Image
Representations
- URL: http://arxiv.org/abs/2106.11054v1
- Date: Mon, 21 Jun 2021 12:40:31 GMT
- Title: Visual Probing: Cognitive Framework for Explaining Self-Supervised Image
Representations
- Authors: Witold Oleszkiewicz, Dominika Basaj, Igor Sieradzki, Micha{\l}
G\'orszczak, Barbara Rychalska, Koryna Lewandowska, Tomasz Trzci\'nski,
Bartosz Zieli\'nski
- Abstract summary: Recently introduced self-supervised methods for image representation learning provide results on par with or superior to their fully supervised competitors.
Motivated by this observation, we introduce a novel visual probing framework for explaining self-supervised models.
We show the effectiveness and applicability of visual analogs of natural language, such as visual words, context, and taxonomy, in explaining self-supervised representations.
- Score: 12.485001250777248
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recently introduced self-supervised methods for image representation learning
provide results on par with or superior to their fully supervised competitors, yet
the corresponding efforts to explain the self-supervised approaches lag behind.
Motivated by this observation, we introduce a novel visual probing framework
for explaining the self-supervised models by leveraging probing tasks employed
previously in natural language processing. The probing tasks require knowledge
about semantic relationships between image parts. Hence, we propose a
systematic approach to obtain analogs of natural language in vision, such as
visual words, context, and taxonomy. Our proposal is grounded in Marr's
computational theory of vision and concerns features like textures, shapes, and
lines. We show the effectiveness and applicability of those analogs in the
context of explaining self-supervised representations. Our key findings
emphasize that relations between language and vision can serve as an effective
yet intuitive tool for discovering how machine learning models work,
independently of data modality. Our work opens a plethora of research pathways
towards more explainable and transparent AI.
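To make the probing setup concrete, below is a minimal sketch of a linear probe over a frozen image encoder, in the spirit of the framework described above. The backbone, feature dimensionality, probe labels, and training step are illustrative assumptions rather than the paper's exact protocol; in practice the probe target would encode a linguistic analog such as the presence of a visual word or a taxonomy class.

```python
# Minimal linear-probing sketch over a frozen encoder (assumptions: torchvision's
# ResNet-50 stands in for a self-supervised backbone; labels are hypothetical).
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Frozen encoder: any self-supervised backbone exposing one feature vector per image.
encoder = resnet50(weights=None)          # load self-supervised weights here in practice
encoder.fc = nn.Identity()                # expose the 2048-d pooled features
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False

# Linear probe: predicts a property of the image (e.g. a taxonomy class or the
# presence of a "visual word") from the frozen representation.
num_probe_classes = 10                    # hypothetical number of probe labels
probe = nn.Linear(2048, num_probe_classes)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def probe_step(images, labels):
    """One optimisation step: only the probe's weights are updated."""
    with torch.no_grad():
        feats = encoder(images)           # (B, 2048) frozen features
    loss = criterion(probe(feats), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for probe-task data.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_probe_classes, (8,))
print(probe_step(images, labels))
```

Only the probe's parameters are trained; how well it recovers the target property from the frozen features is then read as evidence about what the self-supervised representation encodes.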
Related papers
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridges the gap between explainable vision-language captioning and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z)
- Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective [68.20531518525273]
We take a closer look at existing self-supervised speech representation methods from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and the learned representations (a minimal sketch of this probing recipe appears after this list).
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
arXiv Detail & Related papers (2024-01-16T21:13:22Z)
- Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models [57.08925810659545]
We conduct a comparative analysis of the visual representations in existing vision-and-language models and vision-only models.
Our empirical observations suggest that vision-and-language models are better at label prediction tasks.
We hope our study sheds light on the role of language in visual learning, and serves as an empirical guide for various pretrained models.
arXiv Detail & Related papers (2022-12-01T05:00:18Z)
- Perceptual Grouping in Contrastive Vision-Language Models [59.1542019031645]
We show how vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information.
arXiv Detail & Related papers (2022-10-18T17:01:35Z)
- Cross-Modal Alignment Learning of Vision-Language Conceptual Systems [24.423011687551433]
We propose methods for learning aligned vision-language conceptual systems inspired by infants' word learning mechanisms.
The proposed model learns the associations of visual objects and words online and gradually constructs cross-modal relational graph networks.
arXiv Detail & Related papers (2022-07-31T08:39:53Z)
- K-LITE: Learning Transferable Visual Models with External Knowledge [242.3887854728843]
K-LITE (Knowledge-augmented Language-Image Training and Evaluation) is a strategy to leverage external knowledge to build transferable visual systems.
In training, it enriches entities in natural language with WordNet and Wiktionary knowledge.
In evaluation, the natural language is also augmented with external knowledge and then used to reference learned visual concepts.
arXiv Detail & Related papers (2022-04-20T04:47:01Z)
- Building a visual semantics aware object hierarchy [0.0]
We propose a novel unsupervised method to build a visual-semantics-aware object hierarchy.
Our intuition in this paper comes from real-world knowledge representation where concepts are hierarchically organized.
The evaluation consists of two parts: first, we apply the constructed hierarchy to the object recognition task, and then we compare our visual hierarchy with existing lexical hierarchies to show the validity of our method.
arXiv Detail & Related papers (2022-02-26T00:10:21Z)
- Contrastive Representation Learning: A Framework and Review [2.7393821783237184]
The origins of Contrastive Learning date back to the 1990s, and its development has spanned many fields.
We propose a general Contrastive Representation Learning framework that simplifies and unifies many different contrastive learning methods.
Examples of how contrastive learning has been applied in computer vision, natural language processing, audio processing, and others, as well as in Reinforcement Learning are also presented.
arXiv Detail & Related papers (2020-10-10T22:46:25Z)
- Self-supervised Learning from a Multi-view Perspective [121.63655399591681]
We show that self-supervised representations can extract task-relevant information and discard task-irrelevant information.
Our theoretical framework paves the way to a larger space of self-supervised learning objective design.
arXiv Detail & Related papers (2020-06-10T00:21:35Z)
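As a companion to the mutual-information probing entry above, here is a minimal sketch of the common recipe of reading a linear probe's cross-entropy as a lower bound on I(Z; Y), using I(Z; Y) = H(Y) - H(Y|Z) >= H(Y) - E[-log q(y|z)]. The data is synthetic and the estimator is an assumption about the general approach, not necessarily the cited paper's exact method.

```python
# Probe-based mutual-information lower bound: I(Z; Y) >= H(Y) - CE(probe),
# where CE is the probe's cross-entropy E[-log q(y|z)] in nats.
# Z and Y below are synthetic placeholders for frozen representations and labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 64))                               # frozen representations
Y = (Z[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)   # target information

probe = LogisticRegression(max_iter=1000).fit(Z, Y)
ce_nats = log_loss(Y, probe.predict_proba(Z))                 # E[-log q(y|z)], natural log

# Empirical label entropy H(Y) in nats.
p = np.bincount(Y) / len(Y)
h_y = -(p * np.log(p)).sum()

mi_lower_bound = max(h_y - ce_nats, 0.0)
print(f"probe-based MI lower bound ~ {mi_lower_bound:.3f} nats")
```

A tighter probe (lower cross-entropy) yields a larger lower bound, which is why probe quality is read as a proxy for how much task-relevant information the representation retains.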