HCSC: Hierarchical Contrastive Selective Coding
- URL: http://arxiv.org/abs/2202.00455v1
- Date: Tue, 1 Feb 2022 15:04:40 GMT
- Title: HCSC: Hierarchical Contrastive Selective Coding
- Authors: Yuanfan Guo, Minghao Xu, Jiawen Li, Bingbing Ni, Xuanyu Zhu, Zhenbang Sun, Yi Xu
- Abstract summary: Hierarchical Contrastive Selective Coding (HCSC) is a novel contrastive learning framework.
We introduce an elaborate pair selection scheme to make image representations better fit semantic structures.
We verify the superior performance of HCSC over state-of-the-art contrastive methods.
- Score: 44.655310210531226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical semantic structures naturally exist in an image dataset, in
which several semantically relevant image clusters can be further integrated
into a larger cluster with coarser-grained semantics. Capturing such structures
with image representations can greatly benefit semantic understanding in
various downstream tasks. Existing contrastive representation learning methods
lack such an important model capability. In addition, the negative pairs used
in these methods are not guaranteed to be semantically distinct, which could
further hamper the structural correctness of learned image representations. To
tackle these limitations, we propose a novel contrastive learning framework
called Hierarchical Contrastive Selective Coding (HCSC). In this framework, a
set of hierarchical prototypes are constructed and also dynamically updated to
represent the hierarchical semantic structures underlying the data in the
latent space. To make image representations better fit such semantic
structures, we employ and further improve conventional instance-wise and
prototypical contrastive learning via an elaborate pair selection scheme. This
scheme seeks to select more diverse positive pairs with similar semantics and
more precise negative pairs with truly distinct semantics. On extensive
downstream tasks, we verify the superior performance of HCSC over
state-of-the-art contrastive methods, and the effectiveness of its major
components is verified by extensive analytical studies. Our source code and
model weights are available at https://github.com/gyfastas/HCSC.
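
To make the abstract's two key ideas more concrete, the short Python sketch below illustrates one plausible reading of them: hierarchical prototypes obtained by clustering embeddings level by level (here with k-means), and an instance-wise InfoNCE loss that drops memory-bank negatives assigned to the same prototype as the anchor. The function names, the k-means-based construction, and the single-anchor loss are illustrative assumptions, not the authors' implementation; see the linked repository for the actual HCSC code.

```python
# Illustrative sketch only (assumptions: k-means prototypes, L2-normalized
# embeddings, a single-anchor loss); not the authors' HCSC implementation.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def build_hierarchical_prototypes(embeddings, levels=(100, 20, 5)):
    """Cluster the (N, d) numpy array `embeddings`, then repeatedly cluster the
    resulting centers, giving prototypes and per-sample assignments per level."""
    prototypes, assignments = [], []
    points = embeddings
    sample_to_cluster = np.arange(len(embeddings))
    for k in levels:
        km = KMeans(n_clusters=k, n_init=10).fit(points)
        # Map every original sample to its cluster at the current level.
        sample_to_cluster = km.labels_[sample_to_cluster]
        prototypes.append(km.cluster_centers_)
        assignments.append(sample_to_cluster.copy())
        points = km.cluster_centers_  # the next level clusters these centers
    return prototypes, assignments


def selective_info_nce(anchor, positive, bank, anchor_proto, bank_protos, tau=0.2):
    """Instance-wise InfoNCE over a memory bank that keeps only negatives whose
    prototype assignment differs from the anchor's ("precise" negatives)."""
    negatives = bank[bank_protos != anchor_proto]  # drop same-prototype negatives
    logits = torch.cat([positive.unsqueeze(0), negatives]) @ anchor / tau
    target = torch.zeros(1, dtype=torch.long)  # index 0 is the positive pair
    return F.cross_entropy(logits.unsqueeze(0), target)
```

In the full method described in the abstract, selection is applied at every level of the prototype hierarchy and combined with a prototypical contrastive term; the sketch above only shows the instance-wise, single-level case.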
Related papers
- Learning Visual Hierarchies with Hyperbolic Embeddings [28.35250955426006]
We introduce a learning paradigm that can encode user-defined multi-level visual hierarchies in hyperbolic space without requiring explicit hierarchical labels.
We show significant improvements in hierarchical retrieval tasks, demonstrating the capability of our model in capturing visual hierarchies.
arXiv Detail & Related papers (2024-11-26T14:58:06Z)
- Emergent Visual-Semantic Hierarchies in Image-Text Representations [13.300199242824934]
We study the knowledge of existing foundation models, finding that they exhibit emergent understanding of visual-semantic hierarchies.
We propose the Radial Embedding (RE) framework for probing and optimizing hierarchical understanding.
arXiv Detail & Related papers (2024-07-11T14:09:42Z)
- Cross-Modal Concept Learning and Inference for Vision-Language Models [31.463771883036607]
In existing fine-tuning methods, the class-specific text description is matched against the whole image.
We develop a new method called cross-modal concept learning and inference (CCLI).
Our method automatically learns a large set of distinctive visual concepts from images using a set of semantic text concepts.
arXiv Detail & Related papers (2023-07-28T10:26:28Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- HIRL: A General Framework for Hierarchical Image Representation Learning [54.12773508883117]
We propose a general framework for Hierarchical Image Representation Learning (HIRL).
This framework aims to learn multiple semantic representations for each image, and these representations are structured to encode image semantics from fine-grained to coarse-grained.
Based on a probabilistic factorization, HIRL learns the most fine-grained semantics by an off-the-shelf image SSL approach and learns multiple coarse-grained semantics by a novel semantic path discrimination scheme.
arXiv Detail & Related papers (2022-05-26T05:13:26Z)
- Semantic Representation and Dependency Learning for Multi-Label Image Recognition [76.52120002993728]
We propose a novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category.
Specifically, we design a category-specific attentional regions (CAR) module to generate channel/spatial-wise attention matrices to guide the model.
We also design an object erasing (OE) module to implicitly learn semantic dependency among categories by erasing semantic-aware regions.
arXiv Detail & Related papers (2022-04-08T00:55:15Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Dense Semantic Contrast for Self-Supervised Visual Representation Learning [12.636783522731392]
We present Dense Semantic Contrast (DSC) for modeling semantic category decision boundaries at a dense level.
We propose a dense cross-image semantic contrastive learning framework for multi-granularity representation learning.
Experimental results show that our DSC model outperforms state-of-the-art methods when transferring to downstream dense prediction tasks.
arXiv Detail & Related papers (2021-09-16T07:04:05Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.