CLiC: Concept Learning in Context
- URL: http://arxiv.org/abs/2311.17083v1
- Date: Tue, 28 Nov 2023 01:33:18 GMT
- Title: CLiC: Concept Learning in Context
- Authors: Mehdi Safaee, Aryan Mikaeili, Or Patashnik, Daniel Cohen-Or, Ali Mahdavi-Amiri
- Abstract summary: This paper builds upon recent advancements in visual concept learning.
It involves acquiring a visual concept from a source image and subsequently applying it to an object in a target image.
To localize the concept learning, we employ soft masks that contain both the concept within the mask and the surrounding image area.
- Score: 54.81654147248919
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper addresses the challenge of learning a local visual pattern of an
object from one image, and generating images depicting objects with that
pattern. Learning a localized concept and placing it on an object in a target
image is a nontrivial task, as the objects may have different orientations and
shapes. Our approach builds upon recent advancements in visual concept
learning. It involves acquiring a visual concept (e.g., an ornament) from a
source image and subsequently applying it to an object (e.g., a chair) in a
target image. Our key idea is to perform in-context concept learning, acquiring
the local visual concept within the broader context of the objects they belong
to. To localize the concept learning, we employ soft masks that contain both
the concept within the mask and the surrounding image area. We demonstrate our
approach through object generation within an image, showcasing plausible
embedding of in-context learned concepts. We also introduce methods for
directing acquired concepts to specific locations within target images,
employing cross-attention mechanisms, and establishing correspondences between
source and target objects. The effectiveness of our method is demonstrated
through quantitative and qualitative experiments, along with comparisons
against baseline techniques.
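The abstract describes the soft-mask localization only at a high level. The snippet below is a minimal, illustrative sketch in plain PyTorch (the function name `soft_masked_denoise_loss` and the `context_weight` knob are assumptions, not the authors' API) of how a soft mask that covers both the concept and its surrounding context could re-weight a standard diffusion noise-prediction loss, so that the concept region dominates while the context still provides some supervision, in the spirit of the in-context concept learning described above.
```python
# Illustrative sketch only: a soft-mask-weighted noise-prediction loss for
# personalization-style concept learning. CLiC's exact objective is not given
# in the abstract; this simply shows how a soft mask that covers the concept
# and (with lower weight) its surrounding context can localize the loss.
import torch
import torch.nn.functional as F


def soft_masked_denoise_loss(noise_pred, noise, soft_mask, context_weight=0.25):
    """Weighted MSE between predicted and true noise.

    noise_pred, noise : (B, C, H, W) denoiser output and target noise.
    soft_mask         : (B, 1, H, W), values in [0, 1]; ~1 on the concept,
                        decaying smoothly into the surrounding object/context.
    context_weight    : supervision strength outside the concept region
                        (hypothetical hyperparameter; keeps the learned
                        concept anchored to its context).
    """
    per_pixel = F.mse_loss(noise_pred, noise, reduction="none")   # (B, C, H, W)
    weights = soft_mask + context_weight * (1.0 - soft_mask)      # blend in/out
    weights = weights.expand_as(per_pixel)
    return (weights * per_pixel).sum() / weights.sum().clamp_min(1e-8)


if __name__ == "__main__":
    # Toy tensors standing in for latent-space noise predictions.
    b, c, h, w = 2, 4, 64, 64
    noise_pred = torch.randn(b, c, h, w)
    noise = torch.randn(b, c, h, w)
    # A blurred box as a stand-in for a soft mask around, e.g., an ornament.
    mask = torch.zeros(b, 1, h, w)
    mask[:, :, 16:48, 16:48] = 1.0
    mask = F.avg_pool2d(mask, kernel_size=9, stride=1, padding=4)
    print(soft_masked_denoise_loss(noise_pred, noise, mask).item())
```
A larger `context_weight` ties the learned concept more tightly to the appearance of its host object, while setting it to zero would reduce this to a hard-masked objective that ignores the surrounding context.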
Related papers
- Explainable Concept Generation through Vision-Language Preference Learning [7.736445799116692]
Concept-based explanations have become a popular choice for explaining deep neural networks post-hoc.
We devise a reinforcement learning-based preference optimization algorithm that fine-tunes the vision-language generative model.
In addition to showing the efficacy and reliability of our method, we show how our method can be used as a diagnostic tool for analyzing neural networks.
arXiv Detail & Related papers (2024-08-24T02:26:42Z)
- Learning Scene Context Without Images [2.8184014933789365]
We introduce a novel approach to teach scene contextual knowledge to machines using an attention mechanism.
A distinctive aspect of the proposed approach is its reliance solely on labels from image datasets to teach scene context.
We show how scene-wide relationships among different objects can be learned using a self-attention mechanism.
arXiv Detail & Related papers (2023-11-18T07:27:25Z)
- Text-to-Image Generation for Abstract Concepts [76.32278151607763]
We propose a framework of Text-to-Image generation for Abstract Concepts (TIAC).
The abstract concept is clarified into a clear intent with a detailed definition to avoid ambiguity.
The concept-dependent form is retrieved from an LLM-extracted form pattern set.
arXiv Detail & Related papers (2023-09-26T02:22:39Z)
- Cross-Modal Concept Learning and Inference for Vision-Language Models [31.463771883036607]
In existing fine-tuning methods, the class-specific text description is matched against the whole image.
We develop a new method called cross-modal concept learning and inference (CCLI).
Our method automatically learns a large set of distinctive visual concepts from images using a set of semantic text concepts.
arXiv Detail & Related papers (2023-07-28T10:26:28Z)
- ConceptBed: Evaluating Concept Learning Abilities of Text-to-Image Diffusion Models [79.10890337599166]
We introduce ConceptBed, a large-scale dataset that consists of 284 unique visual concepts and 33K composite text prompts.
We evaluate visual concepts that are either objects, attributes, or styles, and also evaluate four dimensions of compositionality: counting, attributes, relations, and actions.
Our results point to a trade-off between learning the concepts and preserving compositionality, one that existing approaches struggle to overcome.
arXiv Detail & Related papers (2023-06-07T18:00:38Z)
- Hyperbolic Contrastive Learning for Visual Representations beyond Objects [30.618032825306187]
We focus on learning representations for objects and scenes that preserve the structure among them.
Motivated by the observation that visually similar objects are close in the representation space, we argue that the scenes and objects should instead follow a hierarchical structure.
arXiv Detail & Related papers (2022-12-01T16:58:57Z)
- Few-Shot Object Detection by Knowledge Distillation Using Bag-of-Visual-Words Representations [58.48995335728938]
We design a novel knowledge distillation framework to guide the learning of the object detector.
We first present a novel Position-Aware Bag-of-Visual-Words model for learning a representative bag of visual words.
We then perform knowledge distillation based on the fact that an image should have consistent BoVW representations in two different feature spaces.
arXiv Detail & Related papers (2022-07-25T10:40:40Z)
- Unsupervised Learning of Compositional Energy Concepts [70.11673173291426]
We propose COMET, which discovers and represents concepts as separate energy functions.
COMET represents both global concepts and objects under a unified framework.
arXiv Detail & Related papers (2021-11-04T17:46:12Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)