Related papers: NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval

URL: http://arxiv.org/abs/2504.04339v2
Date: Mon, 28 Apr 2025 03:08:42 GMT
Title: NCL-CIR: Noise-aware Contrastive Learning for Composed Image Retrieval
Authors: Peng Gao, Yujian Lee, Zailong Chen, Hui Zhang, Xubo Liu, Yiyang Hu, Guquang Jing
Abstract summary: Composed Image Retrieval (CIR) seeks to find a target image using a multi-modal query, which combines an image with modification text to pinpoint the target. Pairs are often partially or completely mismatched due to issues like inaccurate modification texts, low-quality target images, and annotation errors. We propose the Noise-aware Contrastive Learning for CIR (NCL-CIR), comprising two key components: the Weight Compensation Block (WCB) and the Noise-pair Filter Block (NFB).
Score: 16.460121977322224
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Composed Image Retrieval (CIR) seeks to find a target image using a multi-modal query, which combines an image with modification text to pinpoint the target. While recent CIR methods have shown promise, they mainly focus on exploring relationships between the query pairs (image and text) through data augmentation or model design. These methods often assume perfect alignment between queries and target images, an idealized scenario rarely encountered in practice. In reality, pairs are often partially or completely mismatched due to issues like inaccurate modification texts, low-quality target images, and annotation errors. Ignoring these mismatches leaves numerous False Positive Pairs (FPPs), denoted as noise pairs, in the dataset, causing the model to overfit and ultimately reducing its performance. To address this problem, we propose the Noise-aware Contrastive Learning for CIR (NCL-CIR), comprising two key components: the Weight Compensation Block (WCB) and the Noise-pair Filter Block (NFB). The WCB, coupled with diverse weight maps, can ensure more stable token representations of multi-modal queries and target images. Meanwhile, the NFB, in conjunction with the Gaussian Mixture Model (GMM), predicts noise pairs by evaluating loss distributions and generates soft labels correspondingly, allowing for the design of the soft-label based Noise Contrastive Estimation (NCE) loss function. Consequently, the overall architecture helps to mitigate the influence of mismatched and partially matched samples, with experimental results demonstrating that NCL-CIR achieves exceptional performance on the benchmark datasets.
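Illustrative sketch: the abstract describes fitting a GMM to per-pair losses to produce soft labels, which then feed a soft-label NCE loss. The Python below is a minimal sketch of that general idea and is not the authors' implementation. The function names, the two-component 1-D GMM fitted with plain EM, the temperature value, and the choice to use the soft labels as per-pair weights on an InfoNCE term are all assumptions made for illustration; the paper's actual loss may differ.

```python
import numpy as np


def fit_two_component_gmm(losses, n_iter=50, eps=1e-6):
    """Fit a 1-D, two-component GMM to per-pair losses with plain EM.

    Returns, for each pair, the posterior probability of belonging to the
    low-mean ("clean") component. Hypothetical helper, not from the paper.
    """
    x = np.asarray(losses, dtype=np.float64)
    # Initialise the two means at the extremes of the loss range.
    mu = np.array([x.min(), x.max()])
    var = np.full(2, x.var() + eps)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: log-density of each loss under each component.
        log_p = (
            np.log(pi + eps)
            - 0.5 * np.log(2.0 * np.pi * var)
            - 0.5 * (x[:, None] - mu) ** 2 / var
        )
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means, and variances.
        nk = resp.sum(axis=0) + eps
        pi = nk / nk.sum()
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + eps
    clean = int(np.argmin(mu))  # the low-loss component is treated as clean
    return resp[:, clean]


def soft_label_nce(query, target, soft_labels, temperature=0.07):
    """InfoNCE over a batch, with each pair's term weighted by its soft label.

    query, target: (batch, dim) embeddings; row i of each forms a pair.
    Assumed weighting scheme for illustration only.
    """
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    logits = q @ t.T / temperature
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_pair = -np.diag(log_prob)  # loss of matching query i to target i
    w = np.asarray(soft_labels, dtype=np.float64)
    return float((w * per_pair).sum() / (w.sum() + 1e-8))


# Usage: eight low-loss pairs and three high-loss (likely noisy) pairs.
demo_losses = [0.10, 0.12, 0.15, 0.11, 0.13, 0.14, 0.09, 0.16, 2.0, 2.1, 1.9]
demo_labels = fit_two_component_gmm(demo_losses)
rng = np.random.default_rng(0)
demo_q = rng.normal(size=(11, 16))
demo_t = rng.normal(size=(11, 16))
print(np.round(demo_labels, 3))
print(soft_label_nce(demo_q, demo_t, demo_labels))
```

In this sketch, pairs whose loss falls in the high-mean component receive a soft label near zero and therefore contribute little to the batch loss, which is one simple way to limit the influence of mismatched pairs.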

Related papers

Transformer-based Clipped Contrastive Quantization Learning for Unsupervised Image Retrieval [15.982022297570108]
Unsupervised image retrieval aims to learn the important visual characteristics without any given labels in order to retrieve images similar to a given query image. In this paper, we propose a TransClippedCLR model that encodes the global context of an image using a Transformer while capturing local context through patch-based processing. Results using the proposed clipped contrastive learning are greatly improved on all datasets compared to the same backbone network with vanilla contrastive learning.
arXiv Detail & Related papers (2024-01-27T09:39:11Z)
Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation [15.411325887412413]
This paper proposes a novel model named "Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment" (FSA-CDM). FSA-CDM introduces contrastive positive/negative samples into the diffusion model to boost performance for markup-to-image generation. Experiments are conducted on four benchmark datasets from different domains.
arXiv Detail & Related papers (2023-08-02T13:43:03Z)
Advancing Unsupervised Low-light Image Enhancement: Noise Estimation, Illumination Interpolation, and Self-Regulation [55.07472635587852]
Low-Light Image Enhancement (LLIE) techniques have made notable advancements in preserving image details and enhancing contrast. These approaches encounter persistent challenges in efficiently mitigating dynamic noise and accommodating diverse low-light scenarios. We first propose a method for estimating the noise level in low light images in a quick and accurate way. We then devise a Learnable Illumination Interpolator (LII) to satisfy general constraints between illumination and input.
arXiv Detail & Related papers (2023-05-17T13:56:48Z)
Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment [53.401889855278704]
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel fine-grained categories with the help of limited available samples. We propose a two-stage background suppression and foreground alignment framework, which is composed of a background activation suppression (BAS) module, a foreground object alignment (FOA) module, and a local to local (L2L) similarity metric. Experiments conducted on multiple popular fine-grained benchmarks demonstrate that our method outperforms the existing state-of-the-art by a large margin.
arXiv Detail & Related papers (2022-10-04T07:54:40Z)
Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets [18.19216557948184]
Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset. Their main drawback remains the proportion of incorrect (noisy) samples retrieved. We propose a two-stage algorithm starting with a detection step where we use unsupervised contrastive feature learning. We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere.
arXiv Detail & Related papers (2022-07-04T16:51:56Z)
Treatment Learning Causal Transformer for Noisy Image Classification [62.639851972495094]
In this work, we incorporate this binary information of "existence of noise" as treatment into image classification tasks to improve prediction accuracy. Motivated by causal variational inference, we propose a transformer-based architecture that uses a latent generative model to estimate robust feature representations for noisy image classification. We also create new noisy image datasets incorporating a wide range of noise factors for performance benchmarking.
arXiv Detail & Related papers (2022-03-29T13:07:53Z)
Surrogate-based cross-correlation for particle image velocimetry [4.306143768014157]
This paper presents a novel surrogate-based cross-correlation (SBCC) framework to improve the correlation performance for practical particle image velocimetry (PIV).
arXiv Detail & Related papers (2021-12-10T02:45:42Z)
Residual Contrastive Learning for Joint Demosaicking and Denoising [49.81596361351967]
We present a novel contrastive learning approach on RAW images, residual contrastive learning (RCL). Our work is built on the assumption that noise contained in each RAW image is signal-dependent. We set a new benchmark for unsupervised JDD tasks with unknown (random) noise variance.
arXiv Detail & Related papers (2021-06-18T11:37:05Z)
Understanding Adversarial Examples from the Mutual Influence of Images and Perturbations [83.60161052867534]
We analyze adversarial examples by disentangling the clean images and adversarial perturbations, and analyze their influence on each other. Our results suggest a new perspective towards the relationship between images and universal perturbations. We are the first to achieve the challenging task of a targeted universal attack without utilizing original training data.
arXiv Detail & Related papers (2020-07-13T05:00:09Z)
Fully Unsupervised Diversity Denoising with Convolutional Variational Autoencoders [81.30960319178725]
We propose DivNoising, a denoising approach based on fully convolutional variational autoencoders (VAEs). First, we introduce a principled way of formulating the unsupervised denoising problem within the VAE framework by explicitly incorporating imaging noise models into the decoder. We show that such a noise model can either be measured, bootstrapped from noisy data, or co-learned during training.
arXiv Detail & Related papers (2020-06-10T21:28:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.