Composed Image Retrieval with Text Feedback via Multi-grained
Uncertainty Regularization
- URL: http://arxiv.org/abs/2211.07394v6
- Date: Tue, 30 Jan 2024 05:37:24 GMT
- Title: Composed Image Retrieval with Text Feedback via Multi-grained
Uncertainty Regularization
- Authors: Yiyang Chen, Zhedong Zheng, Wei Ji, Leigang Qu, Tat-Seng Chua
- Abstract summary: We introduce a unified learning approach to simultaneously model coarse- and fine-grained retrieval.
The proposed method achieves Recall@50 gains of +4.03%, +3.38%, and +2.40% over a strong baseline.
- Score: 73.04187954213471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate composed image retrieval with text feedback. Users gradually
look for the target of interest by moving from coarse to fine-grained feedback.
However, existing methods merely focus on the latter, i.e., fine-grained
search, by harnessing positive and negative pairs during training. This
pair-based paradigm only considers the one-to-one distance between a pair of
specific points, which is not aligned with the one-to-many coarse-grained
retrieval process and compromises the recall rate. In an attempt to fill this
gap, we introduce a unified learning approach to simultaneously model
coarse- and fine-grained retrieval by considering multi-grained
uncertainty. The key idea underpinning the proposed method is to integrate
fine- and coarse-grained retrieval as matching data points with small and large
fluctuations, respectively. Specifically, our method contains two modules:
uncertainty modeling and uncertainty regularization. (1) The uncertainty
modeling simulates the multi-grained queries by introducing identically
distributed fluctuations in the feature space. (2) Based on the uncertainty
modeling, we further introduce uncertainty regularization to adapt the matching
objective according to the fluctuation range. Compared with existing methods,
the proposed strategy explicitly prevents the model from pushing away potential
candidates in the early stage, and thus improves the recall rate. On three
public datasets, i.e., FashionIQ, Fashion200k, and Shoes, the proposed method
achieves Recall@50 gains of +4.03%, +3.38%, and +2.40% over a strong
baseline, respectively.
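The two modules described in the abstract can be illustrated with a toy sketch: coarse queries are simulated by adding identically distributed Gaussian fluctuations to the query features, and the matching objective is relaxed as the fluctuation range grows. The noise model, margin schedule, and function names here are our assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(features, sigma):
    """Uncertainty modeling (sketch): simulate a multi-grained query by
    adding identically distributed Gaussian noise in the feature space."""
    return features + rng.normal(0.0, sigma, size=features.shape)

def adaptive_margin_loss(query, target, sigma, base_margin=0.2):
    """Uncertainty regularization (sketch): soften the matching margin as
    the fluctuation range grows, so coarse (noisy) queries do not push
    potential candidates away in the early stage."""
    noisy = perturb(query, sigma)
    dist = np.linalg.norm(noisy - target)
    margin = base_margin / (1.0 + sigma)  # larger sigma -> softer objective
    return max(0.0, dist - margin)

q = rng.normal(size=8)
t = q + 0.01  # a near-duplicate target
fine = adaptive_margin_loss(q, t, sigma=0.0)    # fine-grained: strict match
coarse = adaptive_margin_loss(q, t, sigma=1.0)  # coarse-grained: tolerant match
```

With zero fluctuation the near-duplicate pair incurs no loss, while the large-fluctuation case measures distance against a perturbed query with a softer margin, which is the behavior the paper's coarse-to-fine view calls for.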
Related papers
- Cycle-Consistency Uncertainty Estimation for Visual Prompting based One-Shot Defect Segmentation [0.0]
Industrial defect detection traditionally relies on supervised learning models trained on fixed datasets of known defect types.
Recent advances in visual prompting offer a solution by allowing models to adaptively infer novel categories based on provided visual cues.
We propose a solution to estimate uncertainty of the visual prompting process by cycle-consistency.
arXiv Detail & Related papers (2024-09-21T02:25:32Z)
- NubbleDrop: A Simple Way to Improve Matching Strategy for Prompted One-Shot Segmentation [2.2559617939136505]
We propose a simple and training-free method to enhance the validity and robustness of the matching strategy.
The core concept involves randomly dropping feature channels (setting them to zero) during the matching process.
This technique mimics discarding pathological nubbles, and it can be seamlessly applied to other similarity computing scenarios.
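The channel-dropping idea above is simple enough to sketch: randomly zero a fraction of feature channels before computing similarity. This is a hypothetical re-implementation of the stated idea; the function name, drop rate, and cosine similarity choice are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def channel_drop_similarity(a, b, drop_rate=0.3):
    """Training-free robust matching (sketch): randomly zero a fraction of
    feature channels in both vectors, then compute cosine similarity on
    the surviving channels."""
    keep = rng.random(a.shape[-1]) >= drop_rate  # shared channel mask
    a_m, b_m = a * keep, b * keep
    denom = np.linalg.norm(a_m) * np.linalg.norm(b_m) + 1e-8
    return float(a_m @ b_m / denom)

x = rng.normal(size=128)
y = x + 0.1 * rng.normal(size=128)  # a slightly perturbed copy of x
sim = channel_drop_similarity(x, y)
```

Averaging the similarity over several random masks would further reduce the influence of any single pathological channel, which is the intuition behind dropping them in the first place.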
arXiv Detail & Related papers (2024-05-19T08:00:38Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Modeling Multimodal Aleatoric Uncertainty in Segmentation with Mixture of Stochastic Expert [24.216869988183092]
We focus on capturing the data-inherent uncertainty (aka aleatoric uncertainty) in segmentation, typically when ambiguities exist in input images.
We propose a novel mixture of experts (MoSE) model, where each expert network estimates a distinct mode of aleatoric uncertainty.
We develop a Wasserstein-like loss that directly minimizes the distribution distance between the MoSE and ground truth annotations.
arXiv Detail & Related papers (2022-12-14T16:48:21Z)
- Uncertainty Quantification of Collaborative Detection for Self-Driving [12.590332512097698]
Sharing information between connected and autonomous vehicles (CAVs) improves the performance of collaborative object detection for self-driving.
However, CAVs still face uncertainty in object detection due to practical challenges.
Our work is the first to estimate the uncertainty of collaborative object detection.
arXiv Detail & Related papers (2022-09-16T20:30:45Z)
- Residual Overfit Method of Exploration [78.07532520582313]
We propose an approximate exploration methodology based on fitting only two point estimates, one tuned and one overfit.
The approach drives exploration towards actions where the overfit model exhibits the most overfitting compared to the tuned model.
We compare ROME against a set of established contextual bandit methods on three datasets and find it to be one of the best performing.
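The two-point-estimate idea lends itself to a short sketch: score each action by the tuned model's estimate plus the disagreement between the overfit and tuned models, and explore where disagreement is largest. The scoring rule below is our illustrative assumption, not ROME's exact formula.

```python
import numpy as np

def select_action(tuned_preds, overfit_preds):
    """Exploration sketch in the spirit of ROME: treat the residual
    between an overfit model and a tuned model as an uncertainty proxy,
    and bias action selection toward high-residual actions."""
    disagreement = np.abs(overfit_preds - tuned_preds)
    return int(np.argmax(tuned_preds + disagreement))

tuned = np.array([0.50, 0.60, 0.55])
overfit = np.array([0.50, 0.60, 0.95])
action = select_action(tuned, overfit)  # action 2 has the largest residual
```

Fitting only two point estimates keeps the method far cheaper than maintaining a full posterior, which is the practical appeal described above.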
arXiv Detail & Related papers (2021-10-06T17:05:33Z)
- Diverse Knowledge Distillation for End-to-End Person Search [81.4926655119318]
Person search aims to localize and identify a specific person from a gallery of images.
Recent methods can be categorized into two groups, i.e., two-step and end-to-end approaches.
We propose a simple yet strong end-to-end network with diverse knowledge distillation to break the bottleneck.
arXiv Detail & Related papers (2020-12-21T09:04:27Z)
- Uncertainty-Aware Few-Shot Image Classification [118.72423376789062]
Few-shot image classification learns to recognize new categories from limited labelled data.
We propose Uncertainty-Aware Few-Shot framework for image classification.
arXiv Detail & Related papers (2020-10-09T12:26:27Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.