Transformer-based Clipped Contrastive Quantization Learning for
Unsupervised Image Retrieval
- URL: http://arxiv.org/abs/2401.15362v1
- Date: Sat, 27 Jan 2024 09:39:11 GMT
- Title: Transformer-based Clipped Contrastive Quantization Learning for
Unsupervised Image Retrieval
- Authors: Ayush Dubey, Shiv Ram Dubey, Satish Kumar Singh, Wei-Ta Chu
- Abstract summary: Unsupervised image retrieval aims to learn the important visual characteristics without any given labels, in order to retrieve similar images for a given query image.
In this paper, we propose the TransClippedCLR model, which encodes the global context of an image using a Transformer while capturing local context through patch-based processing.
Results using the proposed clipped contrastive learning are greatly improved on all datasets compared to the same backbone network with vanilla contrastive learning.
- Score: 15.982022297570108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised image retrieval aims to learn the important visual
characteristics without any given labels, in order to retrieve similar images
for a given query image. Convolutional Neural Network (CNN)-based approaches
have been extensively exploited with self-supervised contrastive learning for
image hashing. However, the existing approaches suffer from the ineffective
utilization of global features by CNNs and from the bias created by false
negative pairs in contrastive learning. In this paper, we propose the
TransClippedCLR model, which encodes the global context of an image using a
Transformer while capturing local context through patch-based processing,
generates hash codes through product quantization, and avoids potential false
negative pairs through clipped contrastive learning. The proposed model shows
superior performance for unsupervised image retrieval on benchmark datasets,
including CIFAR10, NUS-Wide and Flickr25K, compared to recent state-of-the-art
deep models. Results using the proposed clipped contrastive learning are
greatly improved on all datasets compared to the same backbone network with
vanilla contrastive learning.
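The abstract names two mechanisms that are easy to make concrete. First, the clipping rule: the abstract does not spell it out, so the sketch below assumes one plausible reading, namely that for each anchor the few highest-similarity negatives are treated as potential false negatives and dropped from the InfoNCE denominator. The function name, `clip_k` and the temperature are illustrative, not the paper's actual hyper-parameters.

```python
import torch
import torch.nn.functional as F

def clipped_contrastive_loss(z1, z2, temperature=0.2, clip_k=2):
    """NT-Xent-style loss over two augmented views of a batch, with the
    clip_k highest-similarity negatives per anchor discarded as potential
    false negatives. A hypothetical sketch of the clipping idea, not the
    paper's exact rule."""
    b = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D)
    sim = z @ z.t() / temperature                        # (2B, 2B)

    # Never treat a sample's similarity to itself as positive or negative.
    idx = torch.arange(2 * b, device=z.device)
    sim[idx, idx] = float('-inf')

    # The positive for anchor i is its other augmented view.
    pos_idx = (idx + b) % (2 * b)
    pos = sim[idx, pos_idx]

    # Clip: mask out the top-k most similar remaining negatives.
    neg = sim.clone()
    neg[idx, pos_idx] = float('-inf')
    if clip_k > 0:
        topk = neg.topk(clip_k, dim=1).indices
        neg.scatter_(1, topk, float('-inf'))

    # InfoNCE with the clipped denominator (positive + surviving negatives).
    denom = torch.logsumexp(torch.cat([pos.unsqueeze(1), neg], dim=1), dim=1)
    return (denom - pos).mean()
```

Second, the hash codes come from product quantization. Below is a generic soft product-quantization head of the kind commonly paired with contrastive hashing; the dimensions, codebook sizes and softmax sharpness `alpha` are again assumptions.

```python
class SoftProductQuantizer(torch.nn.Module):
    """Generic soft product quantization head (illustrative dimensions):
    split the embedding into M sub-vectors and replace each with a
    softmax-weighted mixture of K learnable codewords; at indexing time
    the argmax codeword indices form the compact code."""
    def __init__(self, dim=256, num_books=4, num_words=256, alpha=10.0):
        super().__init__()
        assert dim % num_books == 0
        self.alpha = alpha
        self.codebooks = torch.nn.Parameter(
            torch.randn(num_books, num_words, dim // num_books))

    def forward(self, z):                                       # z: (B, dim)
        parts = z.view(z.size(0), self.codebooks.size(0), -1)   # (B, M, d)
        logits = torch.einsum('bmd,mkd->bmk', parts, self.codebooks)
        w = F.softmax(self.alpha * logits, dim=-1)              # soft assignments
        q = torch.einsum('bmk,mkd->bmd', w, self.codebooks)
        return q.reshape(z.size(0), -1)                         # soft-quantized z
```

In quantization-based hashing schemes of this kind, the contrastive loss typically compares the raw embedding of one view with the soft-quantized embedding of the other, so the codebooks train end-to-end with the backbone.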
Related papers
- Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment [49.36799270585947]
No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference.
We propose a novel contrastive pre-training framework tailored for PCQA (CoPA).
Our method outperforms the state-of-the-art PCQA methods on popular benchmarks.
arXiv Detail & Related papers (2024-03-15T07:16:07Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on a Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Feedback Assisted Adversarial Learning to Improve the Quality of Cone-beam CT Images [2.179313476241343]
We propose adversarial learning with a feedback mechanism from a discriminator to improve the quality of CBCT images.
This framework employs a U-net as the discriminator, which outputs a probability map representing the local discrimination results; a sketch of such a local adversarial loss follows the entry link.
arXiv Detail & Related papers (2022-10-23T00:31:51Z)
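The entry only states that the U-net discriminator emits a per-location probability map; below is a minimal sketch of a local adversarial loss built on that idea. The `disc` interface, the BCE form and the shapes are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def local_adversarial_losses(disc, real, fake):
    """Per-pixel adversarial losses (sketch): `disc` maps an image to a
    logit map of shape (B, 1, H, W), so every spatial location is judged
    real/fake, giving the generator localized feedback."""
    real_map = disc(real)
    fake_map = disc(fake.detach())                 # detach: update D only
    d_loss = (F.binary_cross_entropy_with_logits(
                  real_map, torch.ones_like(real_map))
              + F.binary_cross_entropy_with_logits(
                  fake_map, torch.zeros_like(fake_map)))
    g_map = disc(fake)                             # gradient flows to G
    g_loss = F.binary_cross_entropy_with_logits(g_map, torch.ones_like(g_map))
    return d_loss, g_loss
```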
- Deep Semantic Statistics Matching (D2SM) Denoising Network [70.01091467628068]
We introduce the Deep Semantic Statistics Matching (D2SM) Denoising Network.
It exploits the semantic features of pretrained classification networks and implicitly matches the probabilistic distribution of clear images in the semantic feature space.
By learning to preserve the semantic distribution of denoised images, we empirically find our method significantly improves the denoising capabilities of networks.
arXiv Detail & Related papers (2022-07-19T14:35:42Z)
- Robust Cross-Modal Representation Learning with Progressive Self-Distillation [7.676408770854477]
The learning objective of the vision-language approach of CLIP does not effectively account for the noisy many-to-many correspondences found in web-harvested image captioning datasets.
We introduce a novel training framework based on cross-modal contrastive learning that uses progressive self-distillation and soft image-text alignments to more efficiently learn robust representations from noisy data; a sketch of a soft-alignment contrastive loss follows the entry link.
arXiv Detail & Related papers (2022-04-10T03:28:18Z)
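A minimal sketch of contrastive learning with soft image-text alignments. How the paper actually builds the soft targets (progressive self-distillation from the model's own earlier alignments) is only gestured at here, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def soft_alignment_loss(img_emb, txt_emb, soft_targets, temperature=0.07):
    """Cross-modal contrastive loss where each image's target over the
    batch of texts is a soft distribution (e.g., distilled from an earlier
    model state) rather than a one-hot match, softening noisy pairs."""
    logits = F.normalize(img_emb, dim=1) @ F.normalize(txt_emb, dim=1).t()
    log_probs = F.log_softmax(logits / temperature, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()
```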
- Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve an auxiliary problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
arXiv Detail & Related papers (2021-10-25T21:01:00Z)
- Diffusion-Based Representation Learning [65.55681678004038]
We augment the denoising score matching framework to enable representation learning without any supervised signal.
In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective.
Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements over state-of-the-art models on semi-supervised image classification; a sketch of the underlying denoising score matching objective follows the entry link.
arXiv Detail & Related papers (2021-05-29T09:26:02Z)
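For reference, the vanilla denoising score matching objective that the paper reformulates looks roughly like this (a generic sketch; the paper's representation-learning variant additionally conditions the score network on a learned encoder code).

```python
import torch

def denoising_score_matching_loss(score_net, x, sigma=0.1):
    """Plain denoising score matching (sketch): perturb x with Gaussian
    noise and regress the score of the perturbation kernel,
    grad log q(x_tilde | x) = (x - x_tilde) / sigma^2 = -noise / sigma^2."""
    noise = torch.randn_like(x) * sigma
    x_tilde = x + noise
    target = -noise / sigma ** 2
    return ((score_net(x_tilde) - target) ** 2).flatten(1).sum(dim=1).mean()
```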
- Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification [31.647639786095993]
We propose a novel hybrid network structure composed of a supervised contrastive loss to learn image representations and a cross-entropy loss to learn classifiers.
Experiments on three long-tailed classification datasets demonstrate the advantage of the proposed contrastive learning based hybrid networks in long-tailed classification; a sketch of such a hybrid objective follows the entry link.
arXiv Detail & Related papers (2021-03-26T05:22:36Z)
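A minimal sketch of the hybrid objective described above: a supervised contrastive term on the representation branch plus cross-entropy on the classifier branch. The balancing weight `lam` and the plain SupCon form are assumptions; the paper may use prototypical or memory-based variants.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(features, logits, labels, lam=0.5, temperature=0.1):
    """lam * supervised-contrastive + (1 - lam) * cross-entropy (sketch)."""
    ce = F.cross_entropy(logits, labels)

    f = F.normalize(features, dim=1)
    sim = (f @ f.t()) / temperature
    n = labels.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=f.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye

    sim = sim.masked_fill(eye, float('-inf'))        # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Average log-probability of each anchor's same-class positives.
    per_anchor = (log_prob.masked_fill(~pos_mask, 0.0).sum(1)
                  / pos_mask.sum(1).clamp(min=1))
    scl = -per_anchor[pos_mask.any(1)].mean()        # anchors with positives

    return lam * scl + (1.0 - lam) * ce
```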
- SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses [82.56853587380168]
We propose a novel self-supervised image rectification (SIR) method based on an important insight: the rectified results of distorted images of the same scene from different lenses should be the same.
We leverage a differentiable warping module to generate the rectified images and re-distorted images from the distortion parameters; a minimal warping sketch follows the entry link.
Our method achieves comparable or even better performance than the supervised baseline method and representative state-of-the-art methods.
arXiv Detail & Related papers (2020-11-30T08:23:25Z)
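The warping module itself is not specified in this summary; below is a minimal differentiable warping sketch built on `grid_sample` (the flow parameterization and the L1 consistency term are assumptions, not the paper's exact losses).

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Differentiably resample `image` at grid positions displaced by a
    predicted `flow` field. image: (B, C, H, W); flow: (B, H, W, 2) in
    normalized [-1, 1] offset units."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=image.device),
        torch.linspace(-1, 1, w, device=image.device), indexing='ij')
    base = torch.stack([xs, ys], dim=-1).expand(b, -1, -1, -1)
    return F.grid_sample(image, base + flow, align_corners=True)

def rectification_consistency(rect_a, rect_b):
    """Self-supervision (sketch): rectified views of the same scene from
    different lenses should agree; an L1 photometric term is one choice."""
    return (rect_a - rect_b).abs().mean()
```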
- Low-Dose CT Image Denoising Using Parallel-Clone Networks [9.318613261995406]
We propose a parallel-clone neural network method that exploits the benefits of parallel input, parallel-output loss, and clone-to-clone feature transfer.
The proposed model keeps a similar or smaller number of unknown network weights compared to conventional models but can accelerate the learning process significantly.
arXiv Detail & Related papers (2020-05-14T05:21:33Z)