Unsupervised Hashing with Semantic Concept Mining
- URL: http://arxiv.org/abs/2209.11475v1
- Date: Fri, 23 Sep 2022 08:25:24 GMT
- Title: Unsupervised Hashing with Semantic Concept Mining
- Authors: Rong-Cheng Tu and Xian-Ling Mao and Kevin Qinghong Lin and Chengfei
Cai and Weize Qin and Hongfa Wang and Wei Wei and Heyan Huang
- Abstract summary: In this work, we propose a novel Unsupervised Hashing with Semantic Concept Mining method, called UHSCM, which leverages a VLP model to construct a high-quality similarity matrix.
With the semantic similarity matrix as guiding information, a novel hashing loss with a modified contrastive-loss-based regularization term is proposed to optimize the hashing network.
- Score: 37.215530006668935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, to improve the unsupervised image retrieval performance, plenty of
unsupervised hashing methods have been proposed by designing a semantic
similarity matrix, which is based on the similarities between image features
extracted by a pre-trained CNN model. However, most of these methods tend to
ignore high-level abstract semantic concepts contained in images. Intuitively,
concepts play an important role in calculating the similarity among images. In
real-world scenarios, each image is associated with some concepts, and the
similarity between two images will be larger if they share more identical
concepts. Inspired by the above intuition, in this work, we propose a novel
Unsupervised Hashing with Semantic Concept Mining, called UHSCM, which
leverages a VLP model to construct a high-quality similarity matrix.
Specifically, a set of randomly chosen concepts is first collected. Then, by
employing a vision-language pretraining (VLP) model with prompt engineering,
which has shown strong power in visual representation learning, the set of
concepts is denoised according to the training images. Next, the proposed
method UHSCM applies the VLP model with prompting again to mine the concept
distribution of each image and construct a high-quality semantic similarity
matrix based on the mined concept distributions. Finally, with the semantic
similarity matrix as guiding information, a novel hashing loss with a modified
contrastive-loss-based regularization term is proposed to optimize the hashing
network. Extensive experiments on three benchmark datasets show that the
proposed method outperforms the state-of-the-art baselines in the image
retrieval task.
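The pipeline described above (mine a per-image concept distribution with a VLP model, build a semantic similarity matrix from those distributions, then train the hashing network with a similarity-guided contrastive regularizer) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the concept distributions are assumed to come from softmaxed VLP prompt-matching scores, and the loss below is a generic similarity-weighted contrastive form, not UHSCM's exact objective.

```python
import numpy as np

def concept_similarity_matrix(concept_dists: np.ndarray) -> np.ndarray:
    """Cosine similarity between per-image concept distributions.

    concept_dists has shape (N, C): one row per image, assumed to be
    obtained by scoring the image against C concept prompts with a
    VLP model and normalizing the scores.
    """
    norms = np.linalg.norm(concept_dists, axis=1, keepdims=True)
    normed = concept_dists / np.clip(norms, 1e-12, None)
    return normed @ normed.T

def contrastive_hashing_loss(codes: np.ndarray, sim: np.ndarray,
                             tau: float = 0.5) -> float:
    """Generic similarity-weighted contrastive regularizer (a sketch,
    not UHSCM's exact loss).

    codes has shape (N, K): relaxed hash codes in [-1, 1] (e.g. tanh
    outputs of the hashing network); sim is the (N, N) semantic
    similarity matrix used as soft pairwise targets.
    """
    c = codes / np.clip(np.linalg.norm(codes, axis=1, keepdims=True),
                        1e-12, None)
    code_sim = c @ c.T                      # pairwise cosine similarity of codes
    logits = np.exp(code_sim / tau)
    np.fill_diagonal(logits, 0.0)           # exclude self-pairs
    probs = logits / logits.sum(axis=1, keepdims=True)
    w = sim.copy()
    np.fill_diagonal(w, 0.0)
    # semantically similar pairs are pushed toward high probability
    return float(-(w * np.log(probs + 1e-12)).sum() / max(w.sum(), 1e-12))
```

In this sketch, pairs with a higher semantic similarity contribute more weight to the contrastive term, which is one simple way to let the mined concept distributions guide hash-code learning.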
Related papers
- DEMO: A Statistical Perspective for Efficient Image-Text Matching [32.256725860652914]
We introduce Distribution-based Structure Mining with Consistency Learning (DEMO) for efficient image-text matching.
DEMO characterizes each image using multiple augmented views, which are considered as samples drawn from its intrinsic semantic distribution.
In addition, we introduce collaborative consistency learning which not only preserves the similarity structure in the Hamming space but also encourages consistency between retrieval distribution from different directions.
arXiv Detail & Related papers (2024-05-19T09:38:56Z)
- Weighted Point Cloud Embedding for Multimodal Contrastive Learning Toward Optimal Similarity Metric [44.95433989446052]
We show the benefit of our proposed method through a new understanding of the contrastive loss of CLIP.
We show that our proposed similarity based on weighted point clouds consistently achieves the optimal similarity.
arXiv Detail & Related papers (2024-04-30T03:15:04Z)
- Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations remains elusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- ACSeg: Adaptive Conceptualization for Unsupervised Semantic Segmentation [17.019848796027485]
Self-supervised visual pre-training models have shown great promise in representing pixel-level semantic relationships.
In this work, we investigate pixel-level semantic aggregation in self-supervised models used as image encoders, and design concepts accordingly.
We propose the Adaptive Concept Generator (ACG) which adaptively maps these prototypes to informative concepts for each image.
arXiv Detail & Related papers (2022-10-12T06:16:34Z)
- Weighted Contrastive Hashing [11.14153532458873]
The development of unsupervised hashing has been hampered by insufficient data-similarity mining based on global-only image representations.
We introduce a novel mutual attention module to alleviate the problem of information asymmetry in network features caused by the missing image structure.
The aggregated weighted similarities, which reflect the deep image relations, are distilled to facilitate the hash codes learning with a distillation loss.
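The distillation step mentioned above can be illustrated generically: precomputed weighted similarities are matched against the pairwise similarities of the relaxed hash codes. This is a sketch under assumptions, not the paper's exact distillation loss; the target matrix is assumed to hold values in [-1, 1].

```python
import numpy as np

def similarity_distillation_loss(codes: np.ndarray,
                                 target_sim: np.ndarray) -> float:
    """Match pairwise cosine similarities of relaxed hash codes to a
    precomputed target similarity matrix via mean squared error.
    A generic sketch, not this paper's exact formulation."""
    c = codes / np.clip(np.linalg.norm(codes, axis=1, keepdims=True),
                        1e-12, None)
    return float(np.mean((c @ c.T - target_sim) ** 2))
```

Minimizing this term pulls the Hamming-space (here, relaxed cosine) structure of the codes toward the mined image relations.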
arXiv Detail & Related papers (2022-09-28T13:47:33Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto standard, Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- Deep Variational Network Toward Blind Image Restoration [60.45350399661175]
Blind image restoration is a common yet challenging problem in computer vision.
We propose a novel blind image restoration method, aiming to integrate the advantages of both.
Experiments on two typical blind IR tasks, namely image denoising and super-resolution, demonstrate that the proposed method achieves superior performance over current state-of-the-art methods.
arXiv Detail & Related papers (2020-08-25T03:30:53Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.