Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based
Image Retrieval
- URL: http://arxiv.org/abs/2312.10320v1
- Date: Sat, 16 Dec 2023 04:50:34 GMT
- Title: Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based
Image Retrieval
- Authors: Decheng Liu, Xu Luo, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao
- Abstract summary: This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
It aims to use sketches from unseen categories as queries to match images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment (SBKA) method for zero-shot sketch-based image retrieval.
- Score: 69.46139774646308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the problem of zero-shot sketch-based image retrieval
(ZS-SBIR), which aims to use sketches from unseen categories as queries to
match images of the same category. Due to the large cross-modality
discrepancy, ZS-SBIR remains a challenging task that mimics realistic
zero-shot scenarios. The key is to leverage transferable knowledge from a
pre-trained model to improve generalizability. Existing methods typically rely
on simple fine-tuning, or on knowledge distillation from a teacher model with
fixed parameters, and therefore lack the efficient bidirectional knowledge
alignment between student and teacher models that better generalization calls
for. In this paper, we propose a novel Symmetrical Bidirectional Knowledge
Alignment (SBKA) method for zero-shot sketch-based image retrieval. The
symmetrical bidirectional knowledge alignment learning framework lets the
teacher and student models learn rich discriminative information from each
other to achieve mutual knowledge alignment (see the first sketch below).
Instead of the conventional one-to-one cross-modality matching at test time, a
one-to-many cluster cross-modality matching method is proposed that leverages
the inherent relationships among intra-class images to reduce the adverse
effect of the modality gap (see the second sketch below). Experiments on
several representative ZS-SBIR datasets (Sketchy Ext, TU-Berlin Ext, and
QuickDraw Ext) show that the proposed algorithm achieves superior performance
compared with state-of-the-art methods.
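
To make the first idea concrete, here is a minimal PyTorch sketch of a symmetric (bidirectional) distillation loss, assuming logit-level alignment with a softmax temperature. The function name, the temperature `tau`, and the choice of KL divergence are illustrative assumptions, not the authors' published objective.

```python
import torch.nn.functional as F

def symmetric_alignment_loss(student_logits, teacher_logits, tau=4.0):
    """Hypothetical mutual-distillation loss: knowledge flows in both
    directions, so the teacher is also nudged toward the student
    (unlike distillation from a teacher with frozen parameters)."""
    log_p_s = F.log_softmax(student_logits / tau, dim=-1)
    log_p_t = F.log_softmax(teacher_logits / tau, dim=-1)
    # KL(teacher || student): pulls the student toward the teacher.
    s_term = F.kl_div(log_p_s, log_p_t, reduction="batchmean", log_target=True)
    # KL(student || teacher): lets the teacher adapt to the student.
    t_term = F.kl_div(log_p_t, log_p_s, reduction="batchmean", log_target=True)
    # tau**2 keeps gradient magnitudes comparable across temperatures.
    return (s_term + t_term) * tau ** 2
```

In a training loop, both sets of logits would come from trainable networks, with this term added to each model's task loss; a frozen-teacher setup would instead detach `teacher_logits`.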
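The one-to-many cluster matching can be sketched similarly: rank gallery images by the similarity between the sketch query and each image's cluster centroid, so intra-class images reinforce one another. This is a hedged sketch under stated assumptions: cosine similarity, mean centroids, and broadcasting a cluster's score to all of its members are illustrative choices (the cluster labels could come from, e.g., k-means on gallery features), not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def cluster_matching_scores(query, gallery, cluster_ids):
    """Score every gallery image by the cosine similarity between the
    sketch query and that image's cluster centroid (one-to-many
    matching), instead of the raw one-to-one query-image similarity.

    query:       (d,)   embedded sketch
    gallery:     (n, d) embedded gallery images
    cluster_ids: (n,)   integer cluster label per gallery image
    """
    q = F.normalize(query, dim=-1)
    g = F.normalize(gallery, dim=-1)
    ids = cluster_ids.unique()
    # One centroid per cluster, re-normalised onto the unit sphere.
    centroids = F.normalize(
        torch.stack([g[cluster_ids == i].mean(dim=0) for i in ids]), dim=-1)
    centroid_sim = centroids @ q  # (num_clusters,) similarities
    # Broadcast each cluster's score back to its member images.
    score_of = {int(i): s for i, s in zip(ids.tolist(), centroid_sim)}
    return torch.stack([score_of[int(i)] for i in cluster_ids.tolist()])
```

Retrieval then sorts the gallery by these scores in descending order; because members of a cluster share one score, a well-matched cluster surfaces as a group, which is the intended one-to-many effect.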
Related papers
- Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling [11.129453244307369]
FG-SBIR aims to minimize the distance between sketches and their corresponding images in the embedding space.
We propose an effective approach to narrow the gap between the two domains.
It mainly facilitates unified mutual information sharing both intra- and inter-sample.
arXiv Detail & Related papers (2024-06-17T13:49:12Z)
- Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [120.49126407479717]
This paper explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR).
We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos.
arXiv Detail & Related papers (2024-03-12T00:02:03Z)
- Modality-Aware Representation Learning for Zero-shot Sketch-based Image Retrieval [10.568851068989973]
Zero-shot learning offers an efficient way for a machine learning model to handle unseen categories.
We propose a novel framework that indirectly aligns sketches and photos by contrasting them through texts.
With an explicit modality encoding learned from data, our approach disentangles modality-agnostic semantics from modality-specific information.
arXiv Detail & Related papers (2024-01-10T00:39:03Z)
- Data-Free Sketch-Based Image Retrieval [56.96186184599313]
We propose Data-Free (DF)-SBIR, where pre-trained, single-modality classification models have to be leveraged to learn a cross-modal metric space for retrieval without access to any training data.
We present a methodology for DF-SBIR that can leverage knowledge from models independently trained to perform classification on photos and sketches.
Our method achieves mAPs competitive with data-dependent approaches while requiring no training data.
arXiv Detail & Related papers (2023-03-14T10:34:07Z)
- S2-Net: Self-supervision Guided Feature Representation Learning for Cross-Modality Images [0.0]
Existing approaches to cross-modality image pairs often fail to bring the feature representations of correspondences as close together as possible.
In this letter, we design a cross-modality feature representation learning network, S2-Net, based on the recently successful detect-and-describe pipeline.
We introduce self-supervised learning with a well-designed loss function to guide the training without discarding the original advantages.
arXiv Detail & Related papers (2022-03-28T08:47:49Z)
- BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR [52.78253400327191]
BDA-SketRet is a novel framework that performs bi-level domain adaptation to align the spatial and semantic features of the visual data pairs.
Experimental results on the extended Sketchy, TU-Berlin, and QuickDraw datasets show sharp improvements over the literature.
arXiv Detail & Related papers (2022-01-17T18:45:55Z)
- Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task.
We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR.
Our approach notably outperforms state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z)
- Towards Unsupervised Sketch-based Image Retrieval [126.77787336692802]
We introduce a novel framework that simultaneously performs unsupervised representation learning and sketch-photo domain alignment.
Our framework achieves excellent performance in the new unsupervised setting, and performs comparably to or better than the state of the art in the zero-shot setting.
arXiv Detail & Related papers (2021-05-18T02:38:22Z)
- CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based Image Retrieval [30.249581102239645]
We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR).
While we define a cross-modal triplet loss to ensure the discriminative nature of the shared space, an innovative cross-modal attention learning strategy is also proposed to guide feature extraction from the image domain.
arXiv Detail & Related papers (2021-04-20T12:11:12Z)