Explaining Representation Learning with Perceptual Components
- URL: http://arxiv.org/abs/2406.06930v1
- Date: Tue, 11 Jun 2024 04:08:37 GMT
- Title: Explaining Representation Learning with Perceptual Components
- Authors: Yavuz Yarici, Kiran Kokilepersaud, Mohit Prabhushankar, Ghassan AlRegib
- Abstract summary: Self-supervised models create representation spaces that lack clear semantic meaning.
We introduce a novel method to analyze representation spaces using three key perceptual components: color, shape, and texture.
Our approach enhances the interpretability of the representation space, offering explanations that resonate with human visual perception.
- Score: 14.10876324116018
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised models create representation spaces that lack clear semantic meaning. This interpretability problem makes traditional explainability methods ineffective in this context. In this paper, we introduce a novel method to analyze representation spaces using three key perceptual components: color, shape, and texture. We employ selective masking of these components to observe changes in representations, resulting in distinct importance maps for each. In scenarios where labels are absent, these importance maps provide more intuitive explanations, as these components are integral to the human visual system. Our approach enhances the interpretability of the representation space, offering explanations that resonate with human visual perception. We analyze how different training objectives create distinct representation spaces using perceptual components. Additionally, we examine the representation of images across diverse image domains, providing insights into the role of these components in different contexts.
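The core procedure lends itself to a short sketch: perturb one perceptual component at a time and measure how far the representation moves. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation; `model` is any frozen feature extractor, and the grayscale and blur operations are stand-in maskings for the color and texture components (the paper's actual masking of color, shape, and texture may differ).

```python
import torch
import torchvision.transforms.functional as TF

def representation_shift(model, image, perturb):
    """L2 distance between embeddings of the original and perturbed image."""
    with torch.no_grad():
        z_orig = model(image.unsqueeze(0))
        z_pert = model(perturb(image).unsqueeze(0))
    return torch.norm(z_orig - z_pert, dim=-1).item()

# Stand-in perturbations for two perceptual components (illustrative only;
# a shape masking would require e.g. segmentation and is omitted here).
perturbations = {
    "color":   lambda img: TF.rgb_to_grayscale(img, num_output_channels=3),
    "texture": lambda img: TF.gaussian_blur(img, kernel_size=9),
}

def component_importance(model, image):
    """Score each component by how much removing it moves the representation."""
    return {name: representation_shift(model, image, p)
            for name, p in perturbations.items()}
```

A larger shift for, say, the color perturbation than the texture one would indicate that the representation of this image leans more heavily on color.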
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Parts of Speech-Grounded Subspaces in Vision-Language Models [32.497303059356334]
We propose to separate representations of different visual modalities in CLIP's joint vision-language space.
We learn subspaces capturing variability corresponding to a specific part of speech, while minimising variability to the rest.
We show the proposed model additionally facilitates learning subspaces corresponding to specific visual appearances.
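One standard way to realize "maximize variability for one part of speech while minimizing it for the rest" is a generalized eigenproblem over covariance matrices. The sketch below assumes embedding matrices `Z_pos` and `Z_rest` and is an illustration of that objective, not the paper's exact formulation.

```python
import numpy as np
from scipy.linalg import eigh

def pos_subspace(Z_pos, Z_rest, k=4, eps=1e-6):
    """Directions with high variance over one part of speech's embeddings
    and low variance over the rest (generalized-eigenvector sketch)."""
    C_pos = np.cov(Z_pos, rowvar=False)
    C_rest = np.cov(Z_rest, rowvar=False) + eps * np.eye(Z_pos.shape[1])
    w, V = eigh(C_pos, C_rest)             # solves C_pos v = w * C_rest v
    return V[:, np.argsort(w)[::-1][:k]]   # top-k directions span the subspace
```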
arXiv Detail & Related papers (2023-05-23T13:32:19Z)
- Discrete and continuous representations and processing in deep learning: Looking forward [18.28761409764605]
We argue that combining discrete and continuous representations and their processing will be essential to build systems that exhibit a general form of intelligence.
We suggest and discuss several avenues that could improve current neural networks with the inclusion of discrete elements to combine the advantages of both types of representations.
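One concrete way to add discrete elements to a continuous network, in the spirit of this proposal, is vector quantization; the sketch below is a generic illustration, not a method from the paper.

```python
import torch

class VectorQuantizer(torch.nn.Module):
    """Minimal sketch: snap continuous features to a learned discrete codebook,
    mixing discrete and continuous representations in one module."""
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = torch.nn.Embedding(num_codes, dim)

    def forward(self, z):                          # z: (batch, dim), continuous
        d = torch.cdist(z, self.codebook.weight)   # distances to all codes
        idx = d.argmin(dim=-1)                     # discrete code assignment
        z_q = self.codebook(idx)                   # quantized output
        # Straight-through estimator keeps gradients flowing to the encoder.
        return z + (z_q - z).detach(), idx
```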
arXiv Detail & Related papers (2022-01-04T16:30:18Z)
- Quantitative analysis of visual representation of sign elements in COVID-19 context [2.9409535911474967]
We propose using computer-based analysis to quantitatively study the elements used in visual creations produced in reference to the epidemic.
The images compiled in The Covid Art Museum's Instagram account are analyzed to identify the different elements used to represent subjective experiences with regard to a global event.
This research reveals the elements that are repeated in images to create narratives and the relations of association that are established in the sample.
arXiv Detail & Related papers (2021-12-15T15:54:53Z)
- Toward a Visual Concept Vocabulary for GAN Latent Space [74.12447538049537]
This paper introduces a new method for building open-ended vocabularies of primitive visual concepts represented in a GAN's latent space.
Our approach is built from three components: automatic identification of perceptually salient directions based on their layer selectivity; human annotation of these directions with free-form, compositional natural language descriptions; and decomposition of these annotated directions into a visual concept vocabulary.
Experiments show that concepts learned with our approach are reliable and composable -- generalizing across classes, contexts, and observers.
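Once a salient direction has been identified and annotated, it can be inspected by traversing it in latent space. The sketch below assumes a pretrained `generator` and a `direction` vector from whatever selection step is used; it is purely illustrative.

```python
import torch

def traverse_direction(generator, z, direction, alphas=(-3, -1, 0, 1, 3)):
    """Render images along one latent direction to visualize the concept it
    encodes (illustrative; generator and direction are assumed inputs)."""
    direction = direction / direction.norm()
    with torch.no_grad():
        return [generator(z + a * direction) for a in alphas]
```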
arXiv Detail & Related papers (2021-10-08T17:58:19Z)
- Constellation: Learning relational abstractions over objects for compositional imagination [64.99658940906917]
We introduce Constellation, a network that learns relational abstractions of static visual scenes.
This work is a first step in the explicit representation of visual relationships and using them for complex cognitive procedures.
arXiv Detail & Related papers (2021-07-23T11:59:40Z)
- The Geometry of Distributed Representations for Better Alignment, Attenuated Bias, and Improved Interpretability [9.215513608145994]
High-dimensional representations for words, text, images, knowledge graphs and other structured data are commonly used in machine learning and data mining.
These representations have different degrees of interpretability, with efficient distributed representations coming at the cost of losing the feature-to-dimension mapping.
The effects of this loss appear in many representations and tasks; one particularly problematic case is language representations, where societal biases learned from the underlying data are captured and occluded in unknown dimensions and subspaces.
This work addresses some of these problems pertaining to the transparency and interpretability of such representations.
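As a concrete illustration of a bias subspace hidden in a distributed representation, one common construction (in the spirit of hard-debiasing work, not necessarily this paper's method) estimates a bias axis from definitional word pairs:

```python
import numpy as np

def bias_direction(emb, pairs):
    """First principal component of difference vectors from definitional
    word pairs, e.g. [("he", "she"), ("man", "woman")] (illustrative).
    `emb` maps a word to its embedding vector."""
    diffs = np.stack([emb[a] - emb[b] for a, b in pairs])
    diffs -= diffs.mean(axis=0)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]  # unit vector approximating the bias axis
```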
arXiv Detail & Related papers (2020-11-25T01:04:11Z)
- Probing Contextual Language Models for Common Ground with Visual Representations [76.05769268286038]
We design a probing model that evaluates how effective text-only representations are at distinguishing between matching and non-matching visual representations.
Our findings show that language representations alone provide a strong signal for retrieving image patches from the correct object categories.
Visually grounded language models slightly outperform text-only language models in instance retrieval, but greatly underperform humans.
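A minimal stand-in for such a probe scores candidate image patches against a text-only representation; the cosine-similarity ranking below is an assumed setup, not the paper's exact probing model.

```python
import torch
import torch.nn.functional as F

def rank_patches(text_emb, patch_embs):
    """Rank image-patch embeddings by cosine similarity to a text embedding."""
    t = F.normalize(text_emb, dim=-1)        # (d,)
    p = F.normalize(patch_embs, dim=-1)      # (n, d)
    scores = p @ t                           # (n,) higher = better match
    return scores.argsort(descending=True)   # indices of best-matching patches
```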
arXiv Detail & Related papers (2020-05-01T21:28:28Z)
- Survey on Visual Sentiment Analysis [87.20223213370004]
This paper reviews pertinent publications and tries to present an exhaustive overview of the field of Visual Sentiment Analysis.
The paper also describes principles of design of general Visual Sentiment Analysis systems from three main points of view.
A formalization of the problem is discussed, considering different levels of granularity, as well as the components that can affect the sentiment toward an image in different ways.
arXiv Detail & Related papers (2020-04-24T10:15:22Z)
- Fairness by Learning Orthogonal Disentangled Representations [50.82638766862974]
We propose a novel disentanglement approach to the invariant representation problem.
We enforce the meaningful representation to be agnostic to sensitive information by entropy maximization.
The proposed approach is evaluated on five publicly available datasets.
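The entropy-maximization idea can be sketched as a regularizer that pushes a sensitive-attribute classifier toward maximal uncertainty on the target representation; this is a generic sketch under assumed shapes, not the paper's full objective.

```python
import torch
import torch.nn.functional as F

def sensitive_entropy_penalty(sensitive_logits):
    """Negative entropy of the sensitive-attribute prediction; minimizing it
    maximizes entropy, making the representation uninformative about the
    sensitive attribute (generic sketch)."""
    log_p = F.log_softmax(sensitive_logits, dim=-1)
    entropy = -(log_p.exp() * log_p).sum(dim=-1).mean()
    return -entropy
```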
arXiv Detail & Related papers (2020-03-12T11:09:15Z)
- Incorporating Visual Semantics into Sentence Representations within a Grounded Space [20.784771968813747]
We propose to transfer visual information to textual representations by learning an intermediate representation space: the grounded space.
We show that this model outperforms the previous state-of-the-art on classification and semantic relatedness tasks.
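A grounded intermediate space can be sketched as a pair of learned projections trained to align matching text and image embeddings; the dimensions and the alignment loss below are assumptions for illustration, not the paper's architecture.

```python
import torch

class GroundedSpace(torch.nn.Module):
    """Project text and image embeddings into a shared grounded space
    (minimal sketch; dimensions are illustrative)."""
    def __init__(self, text_dim=768, img_dim=2048, grounded_dim=256):
        super().__init__()
        self.text_proj = torch.nn.Linear(text_dim, grounded_dim)
        self.img_proj = torch.nn.Linear(img_dim, grounded_dim)

    def forward(self, text_emb, img_emb):
        t = self.text_proj(text_emb)
        v = self.img_proj(img_emb)
        loss = torch.nn.functional.mse_loss(t, v)  # pull matched pairs together
        return t, v, loss
```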
arXiv Detail & Related papers (2020-02-07T12:26:41Z)