Indicative Image Retrieval: Turning Blackbox Learning into Grey
- URL: http://arxiv.org/abs/2201.11898v1
- Date: Fri, 28 Jan 2022 02:21:09 GMT
- Title: Indicative Image Retrieval: Turning Blackbox Learning into Grey
- Authors: Xulu Zhang (1), Zhenqun Yang (2), Hao Tian (1), Qing Li (3), Xiaoyong
Wei (1 and 3) ((1) Sichuan University, (2) Chinese University of Hong Kong,
(3) Hong Kong Polytechnic University)
- Abstract summary: This paper revisits the importance of relevance/matching modeling in the deep learning era.
It shows that it is possible to skip representation learning and model the matching evidence directly.
It sets a new record of 97.77% on Oxford-5k (97.81% on Paris-6k) without extracting any deep features.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning became the game changer for image retrieval soon after it was
introduced. It promotes feature extraction (by representation learning) as the
core of image retrieval, degenerating relevance/matching evaluation into simple
similarity metrics. In many applications, however, we need the matching evidence
to be indicated rather than just having a ranked list (e.g., the locations of
the target proteins/cells/lesions in medical images), much as matched words are
highlighted in search engines. This is not easy to implement without explicit
relevance/matching modeling, and deep representation learning models are not
feasible for the task because of their blackbox nature. In this paper, we
revisit the importance of relevance/matching modeling in the deep learning era
with an indicative retrieval setting. The study shows that it is possible to
skip representation learning and model the matching evidence directly. By
removing the dependency on pre-trained models, it avoids many related issues
(e.g., the domain gap between classification and retrieval, the detail
diffusion caused by convolution, and so on). More importantly, the study
demonstrates that the matching can be explicitly modeled and backtracked later
to generate matching evidence indications, which improves the explainability of
deep inference. Our method obtains the best performance in the literature on
both Oxford-5k and Paris-6k, setting a new record of 97.77% on Oxford-5k
(97.81% on Paris-6k) without extracting any deep features.
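The abstract's core claim is that explicit matching, unlike blackbox embedding similarity, can be backtracked to indicate *where* a match occurred. A minimal sketch of that idea (not the paper's actual method, which is not detailed here) is classic normalized cross-correlation: the per-location score map is explicit, so the best-scoring position doubles as the matching evidence indication.

```python
import numpy as np

def match_and_indicate(query_patch, image):
    """Slide the query patch over a grayscale image, scoring normalized
    correlation at each location. The explicit score map lets us backtrack
    the best-matching location as matching evidence, instead of returning
    only an opaque similarity score."""
    qh, qw = query_patch.shape
    ih, iw = image.shape
    q = (query_patch - query_patch.mean()) / (query_patch.std() + 1e-8)
    scores = np.full((ih - qh + 1, iw - qw + 1), -np.inf)
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            w = image[y:y + qh, x:x + qw]
            w = (w - w.mean()) / (w.std() + 1e-8)
            scores[y, x] = (q * w).mean()
    best = np.unravel_index(np.argmax(scores), scores.shape)
    return scores.max(), best  # ranking score + evidence location

# Toy example: a bright square hidden in a noisy image.
rng = np.random.default_rng(0)
img = rng.normal(0.0, 0.1, (32, 32))
img[10:14, 20:24] += 1.0
patch = img[10:14, 20:24].copy()
score, loc = match_and_indicate(patch, img)
print(loc)  # (10, 20): the matched region is indicated, not just scored
```

This is only an analogy for the indicative setting; the paper's contribution is doing such explicit matching at retrieval scale without deep features.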
Related papers
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
arXiv Detail & Related papers (2024-02-28T07:54:50Z) - Match me if you can: Semi-Supervised Semantic Correspondence Learning with Unpaired Images [76.47980643420375]
This paper builds on the hypothesis that learning semantic correspondences is inherently data-hungry.
We demonstrate that a simple machine annotator reliably enriches paired keypoints via machine supervision.
Our models surpass current state-of-the-art models on semantic correspondence learning benchmarks like SPair-71k, PF-PASCAL, and PF-WILLOW.
arXiv Detail & Related papers (2023-11-30T13:22:15Z) - Active Mining Sample Pair Semantics for Image-text Matching [6.370886833310617]
This paper proposes a novel image-text matching model, called Active Mining Sample Pair Semantics image-text matching model (AMSPS)
Unlike the single-semantic learning mode of commonsense learning models trained with a triplet loss, AMSPS adopts an active learning approach.
arXiv Detail & Related papers (2023-11-09T15:03:57Z) - HomE: Homography-Equivariant Video Representation Learning [62.89516761473129]
We propose a novel method for representation learning of multi-view videos.
Our method learns an implicit mapping between different views, culminating in a representation space that maintains the homography relationship between neighboring views.
On action classification, our method obtains 96.4% 3-fold accuracy on the UCF101 dataset, better than most state-of-the-art self-supervised learning methods.
arXiv Detail & Related papers (2023-06-02T15:37:43Z) - Ranking Loss and Sequestering Learning for Reducing Image Search Bias in
Histopathology [0.6595290783361959]
This paper proposes two novel ideas to improve image search performance.
First, we use a ranking loss function to guide feature extraction toward the matching-oriented nature of the search.
Second, we introduce the concept of sequestering learning to enhance the generalization of feature extraction.
arXiv Detail & Related papers (2023-04-15T03:38:09Z) - Attribute-Guided Multi-Level Attention Network for Fine-Grained Fashion Retrieval [27.751399400911932]
We introduce an attribute-guided multi-level attention network (AG-MAN) for fine-grained fashion retrieval.
Specifically, we first enhance the pre-trained feature extractor to capture multi-level image embedding.
Then, we propose a classification scheme where images with the same attribute, albeit with different values, are categorized into the same class.
arXiv Detail & Related papers (2022-12-27T05:28:38Z) - Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases [62.54519787811138]
We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues.
We rank images within their classes based on spuriosity, proxied via deep neural features of an interpretable network.
Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.
arXiv Detail & Related papers (2022-12-05T23:15:43Z) - Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [66.15398165275926]
We propose a method that can automatically detect and ignore dataset-specific patterns, which we call dataset biases.
Our method trains a lower capacity model in an ensemble with a higher capacity model.
We show improvement in all settings, including a 10 point gain on the visual question answering dataset.
arXiv Detail & Related papers (2020-11-07T22:20:03Z) - Sharing Matters for Generalization in Deep Metric Learning [22.243744691711452]
This work investigates how to learn characteristics that separate classes without the need for annotations or training data.
By formulating our approach as a novel triplet sampling strategy, it can be easily applied on top of recent ranking loss frameworks.
arXiv Detail & Related papers (2020-04-12T10:21:15Z) - Rethinking Few-Shot Image Classification: a Good Embedding Is All You
Need? [72.00712736992618]
We show that a simple baseline: learning a supervised or self-supervised representation on the meta-training set, outperforms state-of-the-art few-shot learning methods.
An additional boost can be achieved through the use of self-distillation.
We believe that our findings motivate a rethinking of few-shot image classification benchmarks and the associated role of meta-learning algorithms.
arXiv Detail & Related papers (2020-03-25T17:58:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.