Related papers: Dynamic Uncertainty Learning with Noisy Correspondence for Text-Based Person Search

Dynamic Uncertainty Learning with Noisy Correspondence for Text-Based Person Search

URL: http://arxiv.org/abs/2505.06566v1
Date: Sat, 10 May 2025 08:35:36 GMT
Title: Dynamic Uncertainty Learning with Noisy Correspondence for Text-Based Person Search
Authors: Zequn Xie, Haoming Ji, Lingwei Meng,
Abstract summary: Large-scale text-image datasets are created from co-occurrences online.<n>Existing methods often focus on negative samples, amplifying the noise.<n>We propose Dynamic Uncertainty and Alignment framework, which includes Key Feature Selector (KFS) and a new loss function, Dynamic Softmax Hinge Loss (DSH-Loss)<n>Our experiments show that the method offers strong noise resistance and improves retrieval performance in both low- and high-noise scenarios.
Score: 2.3099448395832956
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image person search aims to identify an individual based on a text description. To reduce data collection costs, large-scale text-image datasets are created from co-occurrence pairs found online. However, this can introduce noise, particularly mismatched pairs, which degrade retrieval performance. Existing methods often focus on negative samples, amplifying this noise. To address these issues, we propose the Dynamic Uncertainty and Relational Alignment (DURA) framework, which includes the Key Feature Selector (KFS) and a new loss function, Dynamic Softmax Hinge Loss (DSH-Loss). KFS captures and models noise uncertainty, improving retrieval reliability. The bidirectional evidence from cross-modal similarity is modeled as a Dirichlet distribution, enhancing adaptability to noisy data. DSH adjusts the difficulty of negative samples to improve robustness in noisy environments. Our experiments on three datasets show that the method offers strong noise resistance and improves retrieval performance in both low- and high-noise scenarios.

Related papers

Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models [1.0579965347526206]
Large language models (LLMs) often produce inaccurate or misleading content-hallucinations.<n>Noise-Augmented Fine-Tuning (NoiseFiT) is a novel framework that leverages adaptive noise injection to enhance model robustness.<n>NoiseFiT selectively perturbs layers identified as either high-SNR (more robust) or low-SNR (potentially under-regularized) using a dynamically scaled Gaussian noise.
arXiv Detail & Related papers (2025-04-04T09:27:19Z)
TSVC:Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval [11.874979105806243]
Cross-modal retrieval maps data under different modality via semantic relevance.<n>Existing approaches implicitly assume that data pairs are well-aligned and ignore the widely existing annotation noise.<n>We introduce a Tripartite learning with Semantic Variation Consistency (TSVC) for robust image-text retrieval.
arXiv Detail & Related papers (2025-01-19T04:05:08Z)
AdapNet: Adaptive Noise-Based Network for Low-Quality Image Retrieval [41.9012882549912]
We propose an Adaptive Noise-Based Network (AdapNet) to learn robust abstract representations. AdapNet surpasses state-of-the-art methods on the Noise Revisited Oxford and Noise Revisited Paris benchmarks.
arXiv Detail & Related papers (2024-05-28T00:25:41Z)
Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC) NPC consists of a detection module and a correction module. We conduct experiments on text-retrieval benchmarks Natural Question and TriviaQA, code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z)
Improving the Robustness of Summarization Systems with Dual Augmentation [68.53139002203118]
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input. We first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise. We propose a SummAttacker, which is an efficient approach to generating adversarial samples based on language models.
arXiv Detail & Related papers (2023-06-01T19:04:17Z)
Advancing Unsupervised Low-light Image Enhancement: Noise Estimation, Illumination Interpolation, and Self-Regulation [55.07472635587852]
Low-Light Image Enhancement (LLIE) techniques have made notable advancements in preserving image details and enhancing contrast. These approaches encounter persistent challenges in efficiently mitigating dynamic noise and accommodating diverse low-light scenarios. We first propose a method for estimating the noise level in low light images in a quick and accurate way. We then devise a Learnable Illumination Interpolator (LII) to satisfy general constraints between illumination and input.
arXiv Detail & Related papers (2023-05-17T13:56:48Z)
Confidence-based Reliable Learning under Dual Noises [46.45663546457154]
Deep neural networks (DNNs) have achieved remarkable success in a variety of computer vision tasks. Yet, the data collected from the open world are unavoidably polluted by noise, which may significantly undermine the efficacy of the learned models. Various attempts have been made to reliably train DNNs under data noise, but they separately account for either the noise existing in the labels or that existing in the images. This work provides a first, unified framework for reliable learning under the joint (image, label)-noise.
arXiv Detail & Related papers (2023-02-10T07:50:34Z)
NLIP: Noise-robust Language-Image Pre-training [95.13287735264937]
We propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion. Our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way.
arXiv Detail & Related papers (2022-12-14T08:19:30Z)
Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets [18.19216557948184]
Using search engines for web image retrieval is a tempting alternative to manual curation when creating an image dataset. Their main drawback remains the proportion of incorrect (noisy) samples retrieved. We propose a two stage algorithm starting with a detection step where we use unsupervised contrastive feature learning. We find that the alignment and uniformity principles of contrastive learning allow OOD samples to be linearly separated from ID samples on the unit hypersphere.
arXiv Detail & Related papers (2022-07-04T16:51:56Z)
Towards Adversarially Robust Deep Image Denoising [199.2458715635285]
This work systematically investigates the adversarial robustness of deep image denoisers (DIDs) We propose a novel adversarial attack, namely Observation-based Zero-mean Attack (sc ObsAtk) to craft adversarial zero-mean perturbations on given noisy images. To robustify DIDs, we propose hybrid adversarial training (sc HAT) that jointly trains DIDs with adversarial and non-adversarial noisy data.
arXiv Detail & Related papers (2022-01-12T10:23:14Z)
Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a textitgap between clean data training and real-world inference. We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedding into similar vector space. Experiments on the widely-used dataset, Snips, and large scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on real-world (noisy) corpus but also enhances the robustness, that is, it produces high-quality results under a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.