WAD-CMSN: Wasserstein Distance based Cross-Modal Semantic Network for
Zero-Shot Sketch-Based Image Retrieval
- URL: http://arxiv.org/abs/2202.05465v1
- Date: Fri, 11 Feb 2022 05:56:30 GMT
- Title: WAD-CMSN: Wasserstein Distance based Cross-Modal Semantic Network for
Zero-Shot Sketch-Based Image Retrieval
- Authors: Guanglong Xu, Zhensheng Hu, Jia Cai
- Abstract summary: Zero-shot sketch-based image retrieval (ZSSBIR) is a widely studied branch of computer vision.
We propose a Wasserstein distance based cross-modal semantic network (WAD-CMSN) for ZSSBIR.
- Score: 1.4180331276028657
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-shot sketch-based image retrieval (ZSSBIR), a widely studied branch
of computer vision, has attracted considerable attention recently. Unlike sketch-based
image retrieval (SBIR), the main aim of ZSSBIR is to retrieve natural images given
free hand-drawn sketches whose categories may not appear during training. Previous
approaches relied on semantically aligned sketch-image pairs or employed a
memory-expensive fusion layer to project the visual information into a low-dimensional
subspace, thereby ignoring the significant heterogeneous cross-domain discrepancy
between highly abstract sketches and their relevant images, which may yield poor
performance in the training phase. To tackle this issue, we propose a Wasserstein
distance based cross-modal semantic network (WAD-CMSN) for ZSSBIR. Specifically, it
first projects the visual information of each branch (sketch, image) into a common
low-dimensional semantic subspace via the Wasserstein distance in an adversarial
training manner. Furthermore, an identity matching loss is employed to select useful
features, which not only captures complete semantic knowledge but also alleviates
over-fitting of the WAD-CMSN model. Experimental results on the challenging Sketchy
(Extended) and TU-Berlin (Extended) datasets demonstrate the effectiveness of the
proposed WAD-CMSN model over several competitors.
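As an illustration of the core idea, the following is a minimal, hedged sketch (not the authors' code) of Wasserstein-distance-based adversarial alignment: two projectors map pre-extracted sketch and image backbone features into a shared low-dimensional semantic subspace, while a critic estimates the Wasserstein distance between the two embedding distributions (here with a WGAN-GP style gradient penalty). All module names, dimensions, and hyperparameters are illustrative assumptions; the identity matching loss described in the abstract would be added on top of this alignment objective.

```python
# Hedged sketch, not the authors' implementation: adversarial alignment of
# sketch and image features in a shared semantic subspace via a Wasserstein
# critic (WGAN-GP style). Dimensions and hyperparameters are assumptions.
import torch
import torch.nn as nn


class Projector(nn.Module):
    """Maps backbone features (e.g., CNN outputs) to a low-dimensional semantic space."""

    def __init__(self, in_dim=2048, sem_dim=300):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(), nn.Linear(1024, sem_dim)
        )

    def forward(self, x):
        return self.net(x)


class Critic(nn.Module):
    """Scores embeddings; the gap in its mean scores estimates the Wasserstein distance."""

    def __init__(self, sem_dim=300):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(sem_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, z):
        return self.net(z)


def gradient_penalty(critic, a, b):
    """WGAN-GP penalty on interpolates between two embedding batches."""
    eps = torch.rand(a.size(0), 1, device=a.device)
    inter = (eps * a + (1 - eps) * b).requires_grad_(True)
    grad = torch.autograd.grad(critic(inter).sum(), inter, create_graph=True)[0]
    return ((grad.norm(2, dim=1) - 1.0) ** 2).mean()


def train_step(sketch_feat, image_feat, f_sk, f_im, critic, opt_enc, opt_critic, lam=10.0):
    """One adversarial step: the critic widens the modality-gap estimate, the projectors shrink it."""
    # critic step: maximize E[c(image)] - E[c(sketch)] on detached embeddings
    z_sk, z_im = f_sk(sketch_feat).detach(), f_im(image_feat).detach()
    opt_critic.zero_grad()
    wd_est = critic(z_im).mean() - critic(z_sk).mean()
    (-wd_est + lam * gradient_penalty(critic, z_im, z_sk)).backward()
    opt_critic.step()

    # projector step: minimize the estimated distance so the two distributions align
    opt_enc.zero_grad()
    align_loss = critic(f_im(image_feat)).mean() - critic(f_sk(sketch_feat)).mean()
    align_loss.backward()
    opt_enc.step()
    return wd_est.item(), align_loss.item()
```

In practice the adversarial term alone can collapse the embeddings, so it is typically combined with a supervised loss (such as the identity matching loss mentioned in the abstract) that anchors the shared space to category semantics.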
Related papers
- Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based
Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
It aims to use sketches from unseen categories as queries to match images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment for zero-shot sketch-based image retrieval (SBKA).
arXiv Detail & Related papers (2023-12-16T04:50:34Z)
- Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval [85.39613457282107]
The cross-domain nature of sketch-based image retrieval is challenging.
We present an effective "Adapt and Align" approach to address the key challenges.
Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes (a minimal sketch of this text-embedding alignment is given after this list).
arXiv Detail & Related papers (2023-05-09T03:10:15Z)
- Three-Stream Joint Network for Zero-Shot Sketch-Based Image Retrieval [15.191262439963221]
Zero-Shot Sketch-based Image Retrieval (ZS-SBIR) is a challenging task because of the large domain gap between sketches and natural images.
We propose a novel Three-Stream Joint Training Network (3JOIN) for the ZS-SBIR task.
arXiv Detail & Related papers (2022-04-12T09:52:17Z)
- BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR [52.78253400327191]
BDA-SketRet is a novel framework performing a bi-level domain adaptation for aligning the spatial and semantic features of the visual data pairs.
Experimental results on the extended Sketchy, TU-Berlin, and QuickDraw datasets exhibit sharp improvements over the literature.
arXiv Detail & Related papers (2022-01-17T18:45:55Z)
- Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task.
We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR.
Our approach notably outperforms state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z)
- CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based Image Retrieval [30.249581102239645]
We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR).
While we define a cross-modal triplet loss to ensure the discriminative nature of the shared space, an innovative cross-modal attention learning strategy is also proposed to guide feature extraction from the image domain (a minimal sketch of such a triplet loss is given after this list).
arXiv Detail & Related papers (2021-04-20T12:11:12Z)
- Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based Image Retrieval [55.29233996427243]
Low-shot sketch-based image retrieval is an emerging task in computer vision.
In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks.
To solve these tasks, we propose a semantically aligned cycle-consistent generative adversarial network (SEM-PCYC).
Our results demonstrate a significant boost in any-shot performance over the state-of-the-art on the extended version of the Sketchy, TU-Berlin and QuickDraw datasets.
arXiv Detail & Related papers (2020-06-20T22:43:53Z)
- Progressive Domain-Independent Feature Decomposition Network for Zero-Shot Sketch-Based Image Retrieval [15.955284712628444]
We propose a Progressive Domain-independent Feature Decomposition (PDFD) network for ZS-SBIR.
Specifically, PDFD decomposes visual features into domain features and semantic ones, and then the semantic features are projected into common space as retrieval features for ZS-SBIR.
arXiv Detail & Related papers (2020-03-22T12:07:23Z)
- Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval [203.2520862597357]
Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user's query sketch.
We reformulate the conventional FG-SBIR framework to tackle these challenges.
We propose an on-the-fly design that starts retrieving as soon as the user starts drawing.
arXiv Detail & Related papers (2020-02-24T15:36:02Z)
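Referenced from the "Adapt and Align" entry above: a minimal, hedged sketch of aligning a learned image embedding with a CLIP text embedding of its class name, so that class semantics can transfer to unseen categories. This is not the cited paper's method; the prompt template, projector, and cosine loss are illustrative assumptions, and it requires the openai/CLIP package.

```python
# Hedged sketch (not the cited paper's method): pull each learned image
# embedding toward the CLIP text embedding of its class name.
import clip
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)


def class_text_embeddings(class_names):
    """CLIP text embeddings for simple prompts built from the class names."""
    tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    with torch.no_grad():
        return F.normalize(clip_model.encode_text(tokens).float(), dim=-1)


def text_alignment_loss(image_emb, labels, text_emb):
    """Cosine-based loss pulling each normalized image embedding toward its class's text embedding."""
    image_emb = F.normalize(image_emb, dim=-1)
    return (1.0 - (image_emb * text_emb[labels]).sum(dim=-1)).mean()
```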
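Referenced from the CrossATNet entry above: a minimal, hedged sketch of what a cross-modal triplet loss can look like, with a sketch embedding as the anchor, a same-category image as the positive, and a different-category image as the negative. The margin and distance function are illustrative choices, not taken from the cited paper.

```python
# Hedged illustration of a cross-modal triplet loss (anchor = sketch embedding,
# positive = image of the same class, negative = image of a different class).
import torch.nn.functional as F


def cross_modal_triplet_loss(sketch_emb, pos_img_emb, neg_img_emb, margin=0.2):
    """Pull matching sketch-image pairs together, push mismatched pairs apart."""
    d_pos = F.pairwise_distance(sketch_emb, pos_img_emb)  # anchor-positive distance
    d_neg = F.pairwise_distance(sketch_emb, neg_img_emb)  # anchor-negative distance
    return F.relu(d_pos - d_neg + margin).mean()
```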