Concept Generalization in Visual Representation Learning
- URL: http://arxiv.org/abs/2012.05649v1
- Date: Thu, 10 Dec 2020 13:13:22 GMT
- Title: Concept Generalization in Visual Representation Learning
- Authors: Mert Bulent Sariyildiz, Yannis Kalantidis, Diane Larlus, Karteek
Alahari
- Abstract summary: We argue that semantic relationships between seen and unseen concepts affect generalization performance.
We propose ImageNet-CoG, a novel benchmark on the ImageNet dataset that enables measuring concept generalization in a principled way.
- Score: 39.32868843527767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Measuring concept generalization, i.e., the extent to which models trained on
a set of (seen) visual concepts can be used to recognize a new set of (unseen)
concepts, is a popular way of evaluating visual representations, especially
when they are learned with self-supervised learning. Nonetheless, the choice of
which unseen concepts to use is usually made arbitrarily, and independently
from the seen concepts used to train representations, thus ignoring any
semantic relationships between the two. In this paper, we argue that semantic
relationships between seen and unseen concepts affect generalization
performance and propose ImageNet-CoG, a novel benchmark on the ImageNet dataset
that enables measuring concept generalization in a principled way. Our
benchmark leverages expert knowledge that comes from WordNet in order to define
a sequence of unseen ImageNet concept sets that are semantically more and more
distant from the ImageNet-1K subset, a ubiquitous training set. This allows us
to benchmark visual representations learned on ImageNet-1K out-of-the-box: we
analyse a number of such models from supervised, semi-supervised and
self-supervised approaches under the prism of concept generalization, and show
how our benchmark is able to uncover a number of interesting insights. We will
provide resources for the benchmark at
https://europe.naverlabs.com/cog-benchmark.
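The core idea of the benchmark (ranking candidate unseen concepts by their semantic distance from the seen training concepts in an is-a hierarchy) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the toy taxonomy and concept names below stand in for WordNet and the ImageNet synsets.

```python
# Toy "is-a" hierarchy (child -> parent); WordNet plays this role in the paper.
TAXONOMY = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
    "oak": "tree", "tree": "plant", "plant": "organism",
    "animal": "organism",
}

def ancestors(concept):
    """Return the path from a concept up to the taxonomy root."""
    path = [concept]
    while concept in TAXONOMY:
        concept = TAXONOMY[concept]
        path.append(concept)
    return path

def distance(a, b):
    """Edge-count distance between two concepts via their lowest common ancestor."""
    pa, pb = ancestors(a), ancestors(b)
    common = set(pa) & set(pb)
    return min(pa.index(c) + pb.index(c) for c in common)

def rank_unseen(seen, unseen):
    """Order unseen concepts by their minimum distance to any seen concept."""
    return sorted(unseen, key=lambda u: min(distance(u, s) for s in seen))

print(rank_unseen({"dog"}, ["oak", "cat", "wolf"]))  # -> ['wolf', 'cat', 'oak']
```

Slicing such a ranking into bands of increasing distance yields a sequence of unseen concept sets that are semantically farther and farther from the training set, which is the structure ImageNet-CoG builds on top of WordNet.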
Related papers
- Explainable Concept Generation through Vision-Language Preference Learning [7.736445799116692]
Concept-based explanations have become a popular choice for explaining deep neural networks post-hoc.
We devise a reinforcement learning-based preference optimization algorithm that fine-tunes the vision-language generative model.
In addition to showing the efficacy and reliability of our method, we show how our method can be used as a diagnostic tool for analyzing neural networks.
arXiv Detail & Related papers (2024-08-24T02:26:42Z)
- Restyling Unsupervised Concept Based Interpretable Networks with Generative Models [14.604305230535026]
We propose a novel method that relies on mapping the concept features to the latent space of a pretrained generative model.
We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts.
arXiv Detail & Related papers (2024-07-01T14:39:41Z)
- Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations remains elusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z)
- CEIR: Concept-based Explainable Image Representation Learning [0.4198865250277024]
We introduce Concept-based Explainable Image Representation (CEIR) to derive high-quality representations without label dependency.
Our method exhibits state-of-the-art unsupervised clustering performance on benchmarks such as CIFAR10, CIFAR100, and STL10.
CEIR can seamlessly extract the related concept from open-world images without fine-tuning.
arXiv Detail & Related papers (2023-12-17T15:37:41Z)
- Cross-Modal Concept Learning and Inference for Vision-Language Models [31.463771883036607]
In existing fine-tuning methods, the class-specific text description is matched against the whole image.
We develop a new method called cross-modal concept learning and inference (CCLI).
Our method automatically learns a large set of distinctive visual concepts from images using a set of semantic text concepts.
arXiv Detail & Related papers (2023-07-28T10:26:28Z)
- Does Visual Pretraining Help End-to-End Reasoning? [81.4707017038019]
We investigate whether end-to-end learning of visual reasoning can be achieved with general-purpose neural networks.
We propose a simple and general self-supervised framework which "compresses" each video frame into a small set of tokens.
We observe that pretraining is essential to achieve compositional generalization for end-to-end visual reasoning.
arXiv Detail & Related papers (2023-07-17T14:08:38Z)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving the compositionality which existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
- SegDiscover: Visual Concept Discovery via Unsupervised Semantic Segmentation [29.809900593362844]
SegDiscover is a novel framework that discovers semantically meaningful visual concepts from imagery datasets with complex scenes without supervision.
Our method generates concept primitives from raw images, discovers concepts by clustering in the latent space of a self-supervised pretrained encoder, and refines concepts via neural network smoothing.
arXiv Detail & Related papers (2022-04-22T20:44:42Z)
- Unsupervised Learning of Compositional Energy Concepts [70.11673173291426]
We propose COMET, which discovers and represents concepts as separate energy functions.
COMET represents both global concepts as well as objects under a unified framework.
arXiv Detail & Related papers (2021-11-04T17:46:12Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.