Learning Compositional Representations for Effective Low-Shot
Generalization
- URL: http://arxiv.org/abs/2204.08090v1
- Date: Sun, 17 Apr 2022 21:31:11 GMT
- Title: Learning Compositional Representations for Effective Low-Shot
Generalization
- Authors: Samarth Mishra, Pengkai Zhu, Venkatesh Saligrama
- Abstract summary: We propose Recognition as Part Composition (RPC), an image encoding approach inspired by human cognition.
RPC encodes images by first decomposing them into salient parts, and then encoding each part as a mixture of a small number of prototypes.
We find that this type of learning can overcome hurdles faced by deep convolutional networks in low-shot generalization tasks.
- Score: 45.952867474500145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose Recognition as Part Composition (RPC), an image encoding approach
inspired by human cognition. It is based on the cognitive theory that humans
recognize complex objects by their components, and that they build a small,
compact vocabulary of concepts with which to represent each instance. RPC encodes images by
first decomposing them into salient parts, and then encoding each part as a
mixture of a small number of prototypes, each representing a certain concept.
We find that this type of learning inspired by human cognition can overcome
hurdles faced by deep convolutional networks in low-shot generalization tasks,
like zero-shot learning, few-shot learning and unsupervised domain adaptation.
Furthermore, we find that a classifier using an RPC image encoder is fairly robust
to adversarial attacks, to which deep neural networks are known to be prone.
Given that our image encoding principle is based on human cognition, one would
expect the encodings to be interpretable by humans, which we find to be the
case via crowd-sourcing experiments. Finally, we propose an application of
these interpretable encodings in the form of generating synthetic attribute
annotations for evaluating zero-shot learning methods on new datasets.
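The core encoding step described in the abstract — representing each salient part as a mixture over a small prototype vocabulary — can be sketched as follows. This is a minimal illustration of the idea, not the paper's exact model: the part-extraction stage is assumed to have already produced per-part feature vectors, and the distance-based softmax weighting is an assumption for illustration.

```python
import numpy as np

def encode_parts_as_mixtures(part_features, prototypes, temperature=1.0):
    """Encode each part feature as a convex mixture over a small
    prototype vocabulary (a sketch of the RPC idea; part extraction
    is assumed done upstream, and the similarity measure is illustrative).

    part_features: (P, D) array, one row per salient part.
    prototypes:    (K, D) array, K learned concept prototypes.
    Returns a (P, K) array of mixture weights, each row summing to 1.
    """
    # Negative squared distances to each prototype act as similarity logits.
    d2 = ((part_features[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    logits = -d2 / temperature
    # Softmax over prototypes yields a mixture per part.
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
parts = rng.normal(size=(4, 8))    # 4 hypothetical salient parts
protos = rng.normal(size=(6, 8))   # vocabulary of 6 prototypes
mix = encode_parts_as_mixtures(parts, protos)
```

Because each row is a distribution over a small, shared vocabulary, the resulting codes are compact and lend themselves to the human-interpretability experiments the abstract describes.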
Related papers
- Saliency Suppressed, Semantics Surfaced: Visual Transformations in Neural Networks and the Brain [0.0]
We take inspiration from neuroscience to shed light on how neural networks encode information at low (visual saliency) and high (semantic similarity) levels of abstraction.
We find that ResNets are more sensitive to saliency information than ViTs, when trained with object classification objectives.
We show that semantic encoding is a key factor in aligning AI with human visual perception, while saliency suppression is a non-brain-like strategy.
arXiv Detail & Related papers (2024-04-29T15:05:42Z) - Exploring Compressed Image Representation as a Perceptual Proxy: A Study [1.0878040851638]
We propose an end-to-end learned image compression wherein the analysis transform is jointly trained with an object classification task.
This study affirms that the compressed latent representation can predict human perceptual distance judgments with an accuracy comparable to a custom-tailored DNN-based quality metric.
arXiv Detail & Related papers (2024-01-14T04:37:17Z) - Human-imperceptible, Machine-recognizable Images [76.01951148048603]
A major conflict is exposed for software engineers between developing better AI systems and keeping their distance from sensitive training data.
This paper proposes an efficient privacy-preserving learning paradigm in which images are encrypted to become "human-imperceptible, machine-recognizable".
We show that the proposed paradigm can ensure the encrypted images have become human-imperceptible while preserving machine-recognizable information.
arXiv Detail & Related papers (2023-06-06T13:41:37Z) - A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture.
The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses.
We aim to learn a shared latent representation that encodes enough information to jointly perform semantic segmentation and content reconstruction, along with coarse-to-fine adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z) - Interactive Disentanglement: Learning Concepts by Interacting with their
Prototype Representations [15.284688801788912]
We show the advantages of prototype representations for understanding and revising the latent space of neural concept learners.
For this purpose, we introduce interactive Concept Swapping Networks (iCSNs).
iCSNs learn to bind conceptual information to specific prototype slots by swapping the latent representations of paired images.
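The slot-swapping idea in this summary can be sketched as follows. This is an illustrative toy, not iCSN's actual API: the flat latent vectors, slot indexing, and shapes are all assumptions.

```python
import numpy as np

def swap_slot(z_a, z_b, slot, slot_dim):
    """Swap one concept slot between the latent codes of a paired image
    (a sketch of the slot-swapping idea; names and shapes are illustrative).

    z_a, z_b: flat latent vectors partitioned into equal-size slots.
    slot:     index of the slot encoding the concept shared by the pair.
    """
    a, b = z_a.copy(), z_b.copy()
    s = slice(slot * slot_dim, (slot + 1) * slot_dim)
    # Read from the untouched originals to avoid aliasing through views.
    a[s], b[s] = z_b[s].copy(), z_a[s].copy()
    return a, b

# Two toy latent codes, each with three 2-dim slots.
z_a = np.arange(6.0)
z_b = np.arange(6.0) + 10
a, b = swap_slot(z_a, z_b, slot=1, slot_dim=2)
```

Training would then require each swapped code to still reconstruct or match its image, which pressures the shared concept into the designated slot.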
arXiv Detail & Related papers (2021-12-04T09:25:40Z) - Neural Photofit: Gaze-based Mental Image Reconstruction [25.67771238116104]
We propose a novel method that leverages human fixations to visually decode the image a person has in mind into a photofit (facial composite).
Our method combines three neural networks: An encoder, a scoring network, and a decoder.
We show that our method significantly outperforms a mean baseline predictor and report on a human study that shows that we can decode photofits that are visually plausible and close to the observer's mental image.
arXiv Detail & Related papers (2021-08-17T09:11:32Z) - Controlled Caption Generation for Images Through Adversarial Attacks [85.66266989600572]
We study adversarial examples for vision and language models, which typically adopt a Convolutional Neural Network (CNN) for image feature extraction and a Recurrent Neural Network (RNN) for caption generation.
In particular, we investigate attacks on the visual encoder's hidden layer that is fed to the subsequent recurrent network.
We propose a GAN-based algorithm for crafting adversarial examples for neural image captioning that mimics the internal representation of the CNN.
arXiv Detail & Related papers (2021-07-07T07:22:41Z) - Fast Concept Mapping: The Emergence of Human Abilities in Artificial
Neural Networks when Learning Embodied and Self-Supervised [0.0]
We introduce a setup in which an artificial agent first learns in a simulated world through self-supervised exploration.
We use a method we call fast concept mapping which uses correlated firing patterns of neurons to define and detect semantic concepts.
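Detecting concepts from correlated firing patterns, as this summary describes, can be sketched as a simple correlation screen. This is an assumption-laden illustration, not the paper's exact procedure: the Pearson-correlation criterion and threshold are placeholders.

```python
import numpy as np

def concept_units(activations, concept_present, threshold=0.5):
    """Find units whose firing correlates with a concept's presence
    (a sketch of correlation-based concept mapping; the actual method
    may differ).

    activations:     (T, N) unit activations over T time steps.
    concept_present: (T,) binary indicator for the concept.
    Returns indices of units whose Pearson correlation exceeds threshold.
    """
    a = activations - activations.mean(axis=0)
    m = concept_present - concept_present.mean()
    denom = np.linalg.norm(a, axis=0) * np.linalg.norm(m) + 1e-12
    corr = (a * m[:, None]).sum(axis=0) / denom
    return np.flatnonzero(corr > threshold)

rng = np.random.default_rng(1)
present = (rng.random(200) > 0.5).astype(float)
acts = rng.normal(size=(200, 5)) * 0.1
acts[:, 2] += present              # unit 2 fires when the concept is present
units = concept_units(acts, present)
```

Once such units are identified from a few labeled examples, their joint firing can be used to detect the concept in new observations, which is what makes the mapping "fast".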
arXiv Detail & Related papers (2021-02-03T17:19:49Z) - Understanding the Role of Individual Units in a Deep Neural Network [85.23117441162772]
We present an analytic framework to systematically identify hidden units within image classification and image generation networks.
First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts.
Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes.
arXiv Detail & Related papers (2020-09-10T17:59:10Z) - Gradient-Induced Co-Saliency Detection [81.54194063218216]
Co-saliency detection (Co-SOD) aims to segment the common salient foreground in a group of relevant images.
In this paper, inspired by human behavior, we propose a gradient-induced co-saliency detection method.
arXiv Detail & Related papers (2020-04-28T08:40:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.