Three-Stream Joint Network for Zero-Shot Sketch-Based Image Retrieval
- URL: http://arxiv.org/abs/2204.05666v1
- Date: Tue, 12 Apr 2022 09:52:17 GMT
- Title: Three-Stream Joint Network for Zero-Shot Sketch-Based Image Retrieval
- Authors: Yu-Wei Zhan, Xin Luo, Yongxin Wang, Zhen-Duo Chen, Xin-Shun Xu
- Abstract summary: Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a challenging task because of the large domain gap between sketches and natural images.
We propose a novel Three-Stream Joint Training Network (3JOIN) for the ZS-SBIR task.
- Score: 15.191262439963221
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a challenging task
because of the large domain gap between sketches and natural images, as well as
the semantic inconsistency between seen and unseen categories. Previous
literature bridges seen and unseen categories by semantic embedding, which
requires prior knowledge of the exact class names and additional extraction
effort. Moreover, most works reduce the domain gap by mapping sketches and
natural images into a common high-level space using constructed sketch-image
pairs, which ignores the unpaired information between images and sketches. To
address these issues, we propose a novel Three-Stream Joint Training Network
(3JOIN) for the ZS-SBIR task. To narrow the domain gap between sketches and
images, we extract edge maps from natural images and treat them as a bridge
between the two domains: edge maps share their content with natural images and
their style with sketches. To fully exploit the combination of sketches,
natural images, and edge maps, we propose a novel three-stream joint training
network. In addition, we use a teacher network to extract the implicit
semantics of samples without the aid of auxiliary semantic information, and we
transfer the learned knowledge to unseen classes. Extensive experiments on two
real-world datasets demonstrate the superiority of the proposed method.
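As a rough illustration of the three ideas above (edge maps as a bridge modality, three jointly trained streams sharing weights, and teacher-guided implicit semantics), here is a minimal PyTorch-style sketch. This is not the authors' code: the Canny edge detector, the shared ResNet-50 backbone, and the KL-divergence distillation loss are stand-in assumptions, and every name (`edge_map`, `ThreeStreamEncoder`, `distillation_loss`) is hypothetical.

```python
# Minimal sketch of the 3JOIN ideas (not the authors' implementation).
# Assumptions: OpenCV Canny for edge maps, one ResNet-50 shared by all
# three streams, and KL-divergence distillation against a teacher.
import cv2
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


def edge_map(image_bgr):
    """Turn a natural image (uint8 BGR array) into an edge map:
    image-like content rendered in a sketch-like style."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, threshold1=100, threshold2=200)
    return 255 - edges  # dark strokes on a white background, like a sketch


class ThreeStreamEncoder(nn.Module):
    """One backbone shared by the sketch, edge-map, and image streams;
    inputs are assumed to be 3-channel tensors (grayscale replicated)."""

    def __init__(self, embed_dim=512):
        super().__init__()
        backbone = models.resnet50(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, embed_dim)
        self.backbone = backbone

    def forward(self, sketch, edge, image):
        # All three modalities pass through the same weights, so the
        # edge-map stream pulls sketches and images toward one space.
        return (self.backbone(sketch),
                self.backbone(edge),
                self.backbone(image))


def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label KL distillation: the teacher supplies implicit semantics
    without requiring class names or other auxiliary information."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```

In this reading, sharing one backbone across the streams is what makes the edge-map stream a bridge: gradients from sketch-like and image-like inputs update the same weights.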
Related papers
- Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval [85.39613457282107]
The cross-domain nature of sketch-based image retrieval makes it challenging.
We present an effective "Adapt and Align" approach to address the key challenges.
Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes.
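A minimal sketch of the kind of image-to-text alignment this summary describes follows; the cosine-similarity logits, temperature, and cross-entropy form are illustrative assumptions (the text embeddings are assumed to come from a frozen model such as CLIP), not the paper's exact formulation.

```python
# Illustrative image-to-text embedding alignment (not the paper's exact loss).
# `text_emb` is assumed to hold one frozen (e.g., CLIP-derived) embedding per
# seen class; `labels` holds each image's class index.
import torch.nn.functional as F


def alignment_loss(image_emb, text_emb, labels, temperature=0.07):
    """Cross-entropy over cosine similarities between every image
    embedding (B, D) and every class text embedding (C, D)."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (B, C) similarities
    return F.cross_entropy(logits, labels)
```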
arXiv Detail & Related papers (2023-05-09T03:10:15Z)
- WAD-CMSN: Wasserstein Distance based Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval [1.4180331276028657]
Zero-shot sketch-based image retrieval (ZSSBIR) is a widely studied branch of computer vision.
We propose a Wasserstein distance based cross-modal semantic network (WAD-CMSN) for ZSSBIR.
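The summary names the distance but not its form; the sliced-Wasserstein approximation below is a common, cheap instance of a Wasserstein criterion between two embedding batches, offered only as an illustration (WAD-CMSN's actual formulation may differ).

```python
# Sliced-Wasserstein distance between two equal-sized embedding batches;
# an illustrative stand-in for the Wasserstein criterion named above.
import torch


def sliced_wasserstein(x, y, n_projections=64):
    """Approximate W2 between point clouds x, y of shape (N, D) by
    averaging 1-D Wasserstein distances over random projections."""
    theta = torch.randn(x.size(1), n_projections, device=x.device)
    theta = theta / theta.norm(dim=0, keepdim=True)  # unit directions
    # Sorting the projections yields the optimal 1-D transport coupling.
    px = torch.sort(x @ theta, dim=0).values
    py = torch.sort(y @ theta, dim=0).values
    return ((px - py) ** 2).mean().sqrt()
```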
arXiv Detail & Related papers (2022-02-11T05:56:30Z)
- Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task.
We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR.
Our approach notably outperforms state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z)
- CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based Image Retrieval [30.249581102239645]
We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR).
We define a cross-modal triplet loss to ensure the discriminative nature of the shared space, and propose an innovative cross-modal attention learning strategy to guide feature extraction from the image domain.
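A minimal sketch of such a cross-modal triplet loss, assuming the anchor comes from the sketch domain and the positive/negative from the image domain; the margin value and Euclidean distance are illustrative choices, not necessarily CrossATNet's.

```python
# Illustrative cross-modal triplet loss: sketch anchor, image positive/negative.
import torch.nn.functional as F


def cross_modal_triplet(sketch_anchor, image_pos, image_neg, margin=0.2):
    """Pull the matching image embedding toward the sketch anchor and
    push the non-matching one at least `margin` farther away."""
    d_pos = F.pairwise_distance(sketch_anchor, image_pos)
    d_neg = F.pairwise_distance(sketch_anchor, image_neg)
    return F.relu(d_pos - d_neg + margin).mean()
```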
arXiv Detail & Related papers (2021-04-20T12:11:12Z)
- Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval [147.24102408745247]
We study a further trait of sketches that has been overlooked to date: they are hierarchical in their levels of detail.
In this paper, we design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match sketch with photo at corresponding hierarchical levels.
arXiv Detail & Related papers (2020-07-29T20:50:25Z)
- Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based Image Retrieval [55.29233996427243]
Low-shot sketch-based image retrieval is an emerging task in computer vision.
In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks.
For solving these tasks, we propose a semantically aligned cycle-consistent generative adversarial network (SEM-PCYC).
Our results demonstrate a significant boost in any-shot performance over the state-of-the-art on the extended versions of the Sketchy, TU-Berlin, and QuickDraw datasets.
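For orientation, a cycle-consistency term of the kind SEM-PCYC builds on can be sketched as below; the mapping networks `F_s2i` and `G_i2s` and the L1 penalty are hypothetical illustrations, not the paper's full objective (which, per its title, also includes adversarial and semantic-alignment components).

```python
# Illustrative cycle-consistency term (one piece of a CycleGAN-style objective).
import torch


def cycle_consistency_loss(F_s2i, G_i2s, sketch_feat: torch.Tensor,
                           image_feat: torch.Tensor):
    """||G(F(s)) - s||_1 + ||F(G(i)) - i||_1: mapping sketch features to
    the image side and back (and vice versa) should be near the identity."""
    loss_s = (G_i2s(F_s2i(sketch_feat)) - sketch_feat).abs().mean()
    loss_i = (F_s2i(G_i2s(image_feat)) - image_feat).abs().mean()
    return loss_s + loss_i
```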
arXiv Detail & Related papers (2020-06-20T22:43:53Z)
- Deep Self-Supervised Representation Learning for Free-Hand Sketch [51.101565480583304]
We tackle the problem of self-supervised representation learning for free-hand sketches.
The key to the success of our self-supervised learning paradigm lies in our sketch-specific designs.
We show that the proposed approach outperforms the state-of-the-art unsupervised representation learning methods.
arXiv Detail & Related papers (2020-02-03T16:28:29Z)
- SketchDesc: Learning Local Sketch Descriptors for Multi-view Correspondence [68.63311821718416]
We study the problem of multi-view sketch correspondence, where we take as input multiple freehand sketches with different views of the same object.
This problem is challenging since the visual features of corresponding points at different views can be very different.
We take a deep learning approach and learn a novel local sketch descriptor from data.
arXiv Detail & Related papers (2020-01-16T11:31:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.