Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning
- URL: http://arxiv.org/abs/2601.11393v2
- Date: Thu, 22 Jan 2026 11:13:19 GMT
- Title: Heterogeneous Uncertainty-Guided Composed Image Retrieval with Fine-Grained Probabilistic Learning
- Authors: Haomiao Tang, Jinpeng Wang, Minyi Zhao, Guanghao Meng, Ruisheng Luo, Long Chen, Shu-Tao Xia
- Abstract summary: Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text. Intrinsic noise in CIR triplets incurs intrinsic uncertainty and threatens the model's robustness. This paper introduces a Heterogeneous Uncertainty-Guided (HUG) paradigm to overcome these limitations.
- Score: 49.28548464288051
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Composed Image Retrieval (CIR) enables image search by combining a reference image with modification text. Intrinsic noise in CIR triplets incurs intrinsic uncertainty and threatens the model's robustness. Probabilistic learning approaches have shown promise in addressing such issues; however, they fall short for CIR due to their instance-level holistic modeling and homogeneous treatment of queries and targets. This paper introduces a Heterogeneous Uncertainty-Guided (HUG) paradigm to overcome these limitations. HUG utilizes a fine-grained probabilistic learning framework, where queries and targets are represented by Gaussian embeddings that capture detailed concepts and uncertainties. We customize heterogeneous uncertainty estimations for multi-modal queries and uni-modal targets. Given a query, we capture uncertainties not only regarding uni-modal content quality but also multi-modal coordination, followed by a provable dynamic weighting mechanism to derive comprehensive query uncertainty. We further design uncertainty-guided objectives, including query-target holistic contrast and fine-grained contrasts with comprehensive negative sampling strategies, which effectively enhance discriminative learning. Experiments on benchmarks demonstrate HUG's effectiveness beyond state-of-the-art baselines, with faithful analysis justifying the technical contributions.
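The core idea in the abstract can be sketched concretely: represent each query and target as a Gaussian embedding (a mean vector plus a log-variance capturing uncertainty), and down-weight the contrastive loss of high-uncertainty queries. The toy code below is an illustrative assumption, not the paper's implementation; the function name, the per-query uncertainty reduction, and the exact loss form are all hypothetical simplifications of the objectives described above.

```python
import numpy as np

def uncertainty_weighted_infonce(q_mu, q_logvar, t_mu, temperature=0.07):
    """Toy uncertainty-weighted contrastive loss (hypothetical form).

    q_mu:     (B, D) query Gaussian means, L2-normalized
    q_logvar: (B, D) query log-variances (the estimated query uncertainty)
    t_mu:     (B, D) target Gaussian means; row i is the positive for query i
    """
    # Standard InfoNCE: positives on the diagonal of the similarity matrix
    logits = q_mu @ t_mu.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_query_nll = -np.diag(log_probs)

    # Collapse per-dimension log-variances to one scalar uncertainty per query
    u = q_logvar.mean(axis=1)
    # Noisier query -> smaller contribution to the loss
    weight = np.exp(-u)
    # Adding u back as a regularizer stops the model from inflating
    # uncertainty just to zero out hard examples
    return float((weight * per_query_nll + u).mean())
```

With all log-variances at zero the weights are 1 and this reduces to plain InfoNCE; as uncertainty grows, the matching term is discounted while the regularizer keeps the uncertainty estimates honest.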
Related papers
- Implicit Neural Representation-Based Continuous Single Image Super Resolution: An Empirical Study [50.15623093332659]
Implicit neural representation (INR) has become the standard approach for arbitrary-scale image super-resolution (ASSR). We compare existing techniques across diverse settings and present aggregated performance results on multiple image quality metrics. We examine a new loss function that penalizes intensity variations while preserving edges, textures, and finer details during training.
arXiv Detail & Related papers (2026-01-25T07:09:20Z) - LLM-Centric RAG with Multi-Granular Indexing and Confidence Constraints [5.2604064919135896]
This paper addresses the issues of insufficient coverage, unstable results, and limited reliability in retrieval-augmented generation under complex knowledge environments. It proposes a confidence control method that integrates multi-granularity memory indexing with uncertainty estimation. The results show that the method achieves superior performance over existing models in QA accuracy, retrieval recall, ranking quality, and factual consistency.
arXiv Detail & Related papers (2025-10-30T23:48:37Z) - Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning [28.15997901023315]
Recall is a novel adversarial framework designed to compromise the robustness of unlearned IGMs. It consistently outperforms existing baselines in terms of adversarial effectiveness, computational efficiency, and semantic fidelity with the original prompt. These findings reveal critical vulnerabilities in current unlearning mechanisms and underscore the need for more robust solutions.
arXiv Detail & Related papers (2025-07-09T02:59:01Z) - UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification [26.770271366177603]
We propose a robust approach named Uncertainty-Guided Graph model for multi-modal object ReID (UGG-ReID). UGG-ReID is designed to mitigate noise interference and facilitate effective multi-modal fusion. Experimental results show that the proposed method achieves excellent performance on all datasets.
arXiv Detail & Related papers (2025-07-07T03:41:08Z) - Leveraging Vision-Language Models to Select Trustworthy Super-Resolution Samples Generated by Diffusion Models [0.026861992804651083]
This paper introduces a robust framework for identifying the most trustworthy SR sample from a diffusion-generated set. We propose a novel Trustworthiness Score (TWS), a hybrid metric that quantifies SR reliability based on semantic similarity. By aligning outputs with human expectations and semantic correctness, this work sets a new benchmark for trustworthiness in generative SR.
arXiv Detail & Related papers (2025-06-25T21:00:44Z) - A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation [0.0]
This review consolidates and contextualizes foundational concepts in uncertainty modeling. We identify challenges, such as strong assumptions in spatial aggregation and lack of standardized benchmarks. We propose directions for advancing uncertainty-aware segmentation in deep learning.
arXiv Detail & Related papers (2024-11-25T13:26:09Z) - Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z) - PAIF: Perception-Aware Infrared-Visible Image Fusion for Attack-Tolerant Semantic Segmentation [50.556961575275345]
We propose a perception-aware fusion framework to promote segmentation robustness in adversarial scenes.
We show that our scheme substantially enhances the robustness, with gains of 15.3% mIOU, compared with advanced competitors.
arXiv Detail & Related papers (2023-08-08T01:55:44Z) - Trusted Multi-View Classification with Dynamic Evidential Fusion [73.35990456162745]
We propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC).
TMC provides a new paradigm for multi-view learning by dynamically integrating different views at an evidence level.
Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.
arXiv Detail & Related papers (2022-04-25T03:48:49Z) - Uncertainty-Aware Few-Shot Image Classification [118.72423376789062]
Few-shot image classification learns to recognize new categories from limited labelled data.
We propose Uncertainty-Aware Few-Shot framework for image classification.
arXiv Detail & Related papers (2020-10-09T12:26:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.