Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval
- URL: http://arxiv.org/abs/2305.05144v3
- Date: Wed, 9 Aug 2023 14:12:34 GMT
- Title: Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval
- Authors: Shiyin Dong, Mingrui Zhu, Nannan Wang, Xinbo Gao
- Abstract summary: The cross-domain nature of sketch-based image retrieval makes it challenging.
We present an effective ``Adapt and Align'' approach to address the key challenges.
Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes.
- Score: 85.39613457282107
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot sketch-based image retrieval (ZS-SBIR) is challenging due to the
cross-domain nature of sketches and photos, as well as the semantic gap between
seen and unseen image distributions. Previous methods fine-tune pre-trained
models with various side information and learning strategies to learn a compact
feature space that is shared between the sketch and photo domains and bridges
seen and unseen classes. However, these efforts are inadequate in adapting
domains and transferring knowledge from seen to unseen classes. In this paper,
we present an effective ``Adapt and Align'' approach to address the key
challenges. Specifically, we insert simple and lightweight domain adapters to
learn new abstract concepts of the sketch domain and improve cross-domain
representation capabilities. Inspired by recent advances in image-text
foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the
learned image embedding with a more semantic text embedding to achieve the
desired knowledge transfer from seen to unseen classes. Extensive experiments
on three benchmark datasets and two popular backbones demonstrate the
superiority of our method in terms of retrieval accuracy and flexibility.
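The core alignment step described in the abstract, embedding sketches and photos into one shared space and ranking candidates by similarity to a query, can be illustrated with a minimal retrieval sketch. The embeddings, dimensionality, and function names below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale rows of x to unit L2 norm so dot products become cosines."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def zero_shot_retrieve(sketch_emb, photo_embs, k=3):
    """Rank photo embeddings by cosine similarity to a sketch embedding
    in a shared (hypothetical) embedding space; return top-k indices."""
    s = l2_normalize(sketch_emb[None, :])
    p = l2_normalize(photo_embs)
    sims = (p @ s.T).ravel()    # cosine similarity per photo
    order = np.argsort(-sims)   # descending similarity
    return order[:k], sims[order[:k]]

# Toy example: 4 photos in an 8-dim "shared" space; the query is a
# slightly perturbed copy of photo 2, so photo 2 should rank first.
rng = np.random.default_rng(0)
photos = rng.normal(size=(4, 8))
query = photos[2] + 0.05 * rng.normal(size=8)
top, scores = zero_shot_retrieve(query, photos, k=2)
```

In the paper's setting the embeddings would come from adapter-augmented backbones aligned against text embeddings; here they are random vectors used only to show the ranking mechanics.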
Related papers
- AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization [57.34659640776723]
We propose an end-to-end framework named AddressCLIP to solve the problem with more semantics.
We have built three datasets from Pittsburgh and San Francisco on different scales specifically for the IAL problem.
arXiv Detail & Related papers (2024-07-11T03:18:53Z)
- Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
It aims to use sketches from unseen categories as queries to match the images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment for zero-shot sketch-based image retrieval (SBKA).
arXiv Detail & Related papers (2023-12-16T04:50:34Z)
- Three-Stream Joint Network for Zero-Shot Sketch-Based Image Retrieval [15.191262439963221]
The Zero-Shot Sketch-based Image Retrieval (ZS-SBIR) is a challenging task because of the large domain gap between sketches and natural images.
We propose a novel Three-Stream Joint Training Network (3JOIN) for the ZS-SBIR task.
arXiv Detail & Related papers (2022-04-12T09:52:17Z)
- Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches [100.3966994660079]
We present a framework that infuses (i) gradient consensus for domain invariant learning, (ii) knowledge distillation for preserving old class information, and (iii) graph attention networks for message passing between old and novel classes.
We experimentally show that sketches are better class support than text in the context of FSCIL.
arXiv Detail & Related papers (2022-03-28T15:35:33Z)
- Zero-Shot Sketch Based Image Retrieval using Graph Transformer [18.00165431469872]
We propose a novel graph transformer based zero-shot sketch-based image retrieval (GTZSR) framework for solving ZS-SBIR tasks.
To bridge the domain gap between the visual features, we propose minimizing the Wasserstein distance between images and sketches in a learned domain-shared space.
We also propose a novel compatibility loss that further aligns the two visual domains by bridging the domain gap of one class with respect to the domain gap of all other classes in the training set.
arXiv Detail & Related papers (2022-01-25T09:02:39Z)
- ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval [28.022137537238425]
We propose an Approaching-and-Centralizing Network (termed ``ACNet'') to jointly optimize sketch-to-photo synthesis and image retrieval.
The retrieval module guides the synthesis module to generate large amounts of diverse photo-like images which gradually approach the photo domain.
Our approach achieves state-of-the-art performance on two widely used ZS-SBIR datasets and surpasses previous methods by a large margin.
arXiv Detail & Related papers (2021-11-24T19:36:10Z)
- Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task.
We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR.
Our approach notably outperforms the state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z)
- Towards Unsupervised Sketch-based Image Retrieval [126.77787336692802]
We introduce a novel framework that simultaneously performs unsupervised representation learning and sketch-photo domain alignment.
Our framework achieves excellent performance in the new unsupervised setting, and performs comparably or better than state-of-the-art in the zero-shot setting.
arXiv Detail & Related papers (2021-05-18T02:38:22Z)
- CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based Image Retrieval [30.249581102239645]
We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR).
We define a cross-modal triplet loss to ensure the discriminative nature of the shared space, and propose an innovative cross-modal attention learning strategy to guide feature extraction from the image domain.
arXiv Detail & Related papers (2021-04-20T12:11:12Z)
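Several entries above (e.g. CrossATNet) rely on a cross-modal triplet loss to make the shared sketch-photo space discriminative. A minimal sketch of that loss follows; the toy embeddings and the margin value are illustrative assumptions, not taken from any of the papers:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Cross-modal triplet loss: pull a sketch embedding (anchor)
    toward a photo of the same class (positive) and push it away
    from a photo of a different class (negative), up to a margin."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to positive
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to negative
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 1.0])  # sketch embedding (anchor)
p = np.array([0.0, 1.1])  # same-class photo: already close, loss is 0
n = np.array([1.0, 0.0])  # other-class photo: already far away
loss = triplet_loss(a, p, n)
```

When the positive is closer than the negative by more than the margin, the loss is zero and the triplet contributes no gradient; otherwise the loss grows linearly with the violation.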
This list is automatically generated from the titles and abstracts of the papers in this site.