Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning
- URL: http://arxiv.org/abs/2601.11393v2
- Date: Thu, 22 Jan 2026 11:13:19 GMT
- Title: Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning
- Authors: Haomiao Tang, Jinpeng Wang, Minyi Zhao, Guanghao Meng, Ruisheng Luo, Long Chen, Shu-Tao Xia
- Abstract summary: Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text. Intrinsic noise in CIR triplets incurs intrinsic uncertainty and threatens the model's robustness. This paper introduces a Heterogeneous Uncertainty-Guided (HUG) paradigm to overcome these limitations.
- Score: 49.28548464288051
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text. Intrinsic noise in CIR triplets incurs intrinsic uncertainty and threatens the model's robustness. Probabilistic learning approaches have shown promise in addressing such issues; however, they fall short for CIR due to their instance-level holistic modeling and homogeneous treatment of queries and targets. This paper introduces a Heterogeneous Uncertainty-Guided (HUG) paradigm to overcome these limitations. HUG utilizes a fine-grained probabilistic learning framework, where queries and targets are represented by Gaussian embeddings that capture detailed concepts and uncertainties. We customize heterogeneous uncertainty estimations for multi-modal queries and uni-modal targets. Given a query, we capture uncertainties not only regarding uni-modal content quality but also multi-modal coordination, followed by a provable dynamic weighting mechanism to derive comprehensive query uncertainty. We further design uncertainty-guided objectives, including query-target holistic contrast and fine-grained contrasts with comprehensive negative sampling strategies, which effectively enhance discriminative learning. Experiments on benchmarks demonstrate HUG's effectiveness beyond state-of-the-art baselines, with faithful analysis justifying the technical contributions.
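The core idea in the abstract can be sketched concretely: represent each query and target as a Gaussian embedding (a mean vector plus a log-variance capturing uncertainty), and down-weight the contrastive loss of high-uncertainty queries. The toy code below is an illustrative assumption, not the paper's implementation; the function name, the per-query uncertainty reduction, and the exact loss form are all hypothetical simplifications of the objectives described above.

```python
import numpy as np

def uncertainty_weighted_infonce(q_mu, q_logvar, t_mu, temperature=0.07):
    """Toy uncertainty-weighted contrastive loss (hypothetical form).

    q_mu:     (B, D) query Gaussian means, L2-normalized
    q_logvar: (B, D) query log-variances (the estimated query uncertainty)
    t_mu:     (B, D) target Gaussian means; row i is the positive for query i
    """
    # Standard InfoNCE: positives on the diagonal of the similarity matrix
    logits = q_mu @ t_mu.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_query_nll = -np.diag(log_probs)

    # Collapse per-dimension log-variances to one scalar uncertainty per query
    u = q_logvar.mean(axis=1)
    # Noisier query -> smaller contribution to the loss
    weight = np.exp(-u)
    # Adding u back as a regularizer stops the model from inflating
    # uncertainty just to zero out hard examples
    return float((weight * per_query_nll + u).mean())
```

With all log-variances at zero the weights are 1 and this reduces to plain InfoNCE; as uncertainty grows, the matching term is discounted while the regularizer keeps the uncertainty estimates honest.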
Related papers
- Implicit Neural Representation-Based Continuous Single Image Super Resolution: An Empirical Study [50.15623093332659]
Implicit neural representation (INR) has become the standard approach for arbitrary-scale image super-resolution (ASSR). We compare existing techniques across diverse settings and present aggregated performance results on multiple image quality metrics. We examine a new loss function that penalizes intensity variations while preserving edges, textures, and finer details during training.
arXiv Detail & Related papers (2026-01-25T07:09:20Z) - LLM-Centric RAG with Multi-Granular Indexing and Confidence Constraints [5.2604064919135896]
This paper addresses the issues of insufficient coverage, unstable results, and limited reliability in retrieval-augmented generation under complex knowledge environments. It proposes a confidence control method that integrates multi-granularity memory indexing with uncertainty estimation. The results show that the method achieves superior performance over existing models in QA accuracy, retrieval recall, ranking quality, and factual consistency.
arXiv Detail & Related papers (2025-10-30T23:48:37Z) - Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning [28.15997901023315]
Recall is a novel adversarial framework designed to compromise the robustness of unlearned IGMs. It consistently outperforms existing baselines in terms of adversarial effectiveness, computational efficiency, and semantic fidelity with the original prompt. These findings reveal critical vulnerabilities in current unlearning mechanisms and underscore the need for more robust solutions.
arXiv Detail & Related papers (2025-07-09T02:59:01Z) - UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification [26.770271366177603]
We propose a robust approach named Uncertainty-Guided Graph model for multi-modal object ReID (UGG-ReID). UGG-ReID is designed to mitigate noise interference and facilitate effective multi-modal fusion. Experimental results show that the proposed method achieves excellent performance on all datasets.
arXiv Detail & Related papers (2025-07-07T03:41:08Z) - Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models [0.026861992804651083]
This paper introduces a robust framework for identifying the most trustworthy SR sample from a diffusion-generated set. We propose a novel Trustworthiness Score (TWS), a hybrid metric that quantifies SR reliability based on semantic similarity. By aligning outputs with human expectations and semantic correctness, this work sets a new benchmark for trustworthiness in generative SR.
arXiv Detail & Related papers (2025-06-25T21:00:44Z) - A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation [0.0]
This review consolidates and contextualizes foundational concepts in uncertainty modeling. We identify challenges, such as strong assumptions in spatial aggregation and lack of standardized benchmarks. We propose directions for advancing uncertainty-aware segmentation in deep learning.
arXiv Detail & Related papers (2024-11-25T13:26:09Z) - Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z) - PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation [50.556961575275345]
We propose a perception-aware fusion framework to promote segmentation robustness in adversarial scenes.
We show that our scheme substantially enhances the robustness, with gains of 15.3% mIOU, compared with advanced competitors.
arXiv Detail & Related papers (2023-08-08T01:55:44Z) - Trusted Multi-View Classification with Dynamic Evidential Fusion [73.35990456162745]
We propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC).
TMC provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.
arXiv Detail & Related papers (2022-04-25T03:48:49Z) - Uncertainty-Aware Few-Shot Image Classification [118.72423376789062]
Few-shot image classification learns to recognize new categories from limited labelled data.
We propose Uncertainty-Aware Few-Shot framework for image classification.
arXiv Detail & Related papers (2020-10-09T12:26:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.