CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based
Image Retrieval
- URL: http://arxiv.org/abs/2104.09918v1
- Date: Tue, 20 Apr 2021 12:11:12 GMT
- Title: CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based
Image Retrieval
- Authors: Ushasi Chaudhuri, Biplab Banerjee, Avik Bhattacharya, Mihai Datcu
- Abstract summary: We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR).
We define a cross-modal triplet loss to ensure the discriminative nature of the shared space, and propose a cross-modal attention learning strategy that guides feature extraction from the image domain.
- Score: 30.249581102239645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel framework for cross-modal zero-shot learning (ZSL) in the
context of sketch-based image retrieval (SBIR). Conventionally, the SBIR schema
mainly considers simultaneous mappings between the two image views and the
semantic side information. It is therefore desirable to handle fine-grained
classes, particularly in the sketch domain, using a highly discriminative and
semantically rich feature space. However, the existing deep generative
modeling-based SBIR approaches focus mainly on bridging the gap between the
seen and unseen classes by generating pseudo-unseen-class samples. Besides
violating the ZSL protocol by utilizing unseen-class information during
training, such techniques pay no explicit attention to modeling the
discriminative nature of the shared space. We also note that learning a
unified feature space for both visual modalities is a challenging task given
the significant domain difference between sketches and color images. As a
remedy, we introduce a novel framework for zero-shot SBIR. We define a
cross-modal triplet loss to ensure the discriminative nature of the shared
space, and propose a cross-modal attention learning strategy that guides
feature extraction from the image domain by exploiting information from the
corresponding sketch. To preserve the semantic consistency of the shared
space, we employ a graph-CNN-based module that propagates the semantic class
topology to the shared space. To improve response time during inference, we
further explore representing the shared space in terms of hash codes.
Experimental results on the benchmark TU-Berlin and Sketchy datasets confirm
the superiority of CrossATNet in yielding state-of-the-art results.
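To make the moving parts concrete, below is a minimal PyTorch-style sketch of three ingredients named in the abstract: a cross-modal attention module in which the sketch embedding guides pooling of image features, the cross-modal triplet loss over the shared space, and sign-based hashing for fast retrieval. Module names, dimensions, and the sign-hashing choice are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    """Use the sketch embedding as a query to attend over image feature maps,
    so sketch information guides feature extraction from the image domain."""
    def __init__(self, dim=512):
        super().__init__()
        self.query = nn.Linear(dim, dim)     # sketch vector -> attention query
        self.key = nn.Conv2d(dim, dim, 1)    # image feature map -> keys
        self.value = nn.Conv2d(dim, dim, 1)  # image feature map -> values

    def forward(self, sketch_feat, image_map):
        # sketch_feat: (B, C); image_map: (B, C, H, W)
        B, C, H, W = image_map.shape
        q = self.query(sketch_feat).unsqueeze(1)              # (B, 1, C)
        k = self.key(image_map).flatten(2).transpose(1, 2)    # (B, HW, C)
        v = self.value(image_map).flatten(2).transpose(1, 2)  # (B, HW, C)
        attn = F.softmax(q @ k.transpose(1, 2) / C ** 0.5, dim=-1)  # (B, 1, HW)
        return (attn @ v).squeeze(1)  # attended image feature, (B, C)

def cross_modal_triplet(sketch, pos_image, neg_image, margin=0.2):
    """Pull a sketch towards its matching image, push a mismatched image away."""
    d_pos = F.pairwise_distance(sketch, pos_image)
    d_neg = F.pairwise_distance(sketch, neg_image)
    return F.relu(d_pos - d_neg + margin).mean()

def to_hash(feat):
    """Binarise shared-space features so retrieval can use Hamming distance."""
    return (feat > 0).to(torch.uint8)
```

The graph-CNN branch that propagates the semantic class topology would supply class-level targets for this shared space; it is omitted here for brevity.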
Related papers
- Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based
Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
It aims to use sketches from unseen categories as queries to match images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment approach for zero-shot sketch-based image retrieval (SBKA).
arXiv Detail & Related papers (2023-12-16T04:50:34Z)
- Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval [85.39613457282107]
The cross-domain nature of sketch-based image retrieval makes it a challenging task.
We present an effective "Adapt and Align" approach to address the key challenges.
Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes.
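As a rough illustration of the image-text alignment this summary describes, one could contrastively align image embeddings with class-name text embeddings from a frozen model such as CLIP. The loss below is a generic InfoNCE-style sketch under that assumption, not the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def text_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Align each image embedding with the text embedding of its class
    (rows of text_emb assumed in one-to-one correspondence with images)."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # cosine similarities
    targets = torch.arange(image_emb.size(0), device=image_emb.device)
    return F.cross_entropy(logits, targets)
```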
arXiv Detail & Related papers (2023-05-09T03:10:15Z)
- Zero-Shot Sketch Based Image Retrieval using Graph Transformer [18.00165431469872]
We propose a novel graph-transformer-based zero-shot sketch-based image retrieval (GTZSR) framework for solving ZS-SBIR tasks.
To bridge the domain gap between the visual features, we propose minimizing the Wasserstein distance between images and sketches in a learned domain-shared space.
We also propose a novel compatibility loss that further aligns the two visual domains by bridging the domain gap of one class with respect to the domain gap of all other classes in the training set.
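For intuition on the Wasserstein term mentioned above, a standard entropy-regularised Sinkhorn approximation between batches of sketch and image features could look like the following. This is a generic sketch, not GTZSR's implementation.

```python
import torch

def sinkhorn_wasserstein(x, y, eps=0.1, iters=50):
    """x: (n, d), y: (m, d) feature batches; returns an approximate
    entropy-regularised Wasserstein distance between them."""
    cost = torch.cdist(x, y, p=2)                    # (n, m) pairwise cost
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n, device=x.device)  # uniform marginals
    nu = torch.full((m,), 1.0 / m, device=x.device)
    K = torch.exp(-cost / eps)
    u = torch.ones_like(mu)
    for _ in range(iters):                           # Sinkhorn iterations
        v = nu / (K.t() @ u)
        u = mu / (K @ v)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)       # transport plan
    return (plan * cost).sum()
```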
arXiv Detail & Related papers (2022-01-25T09:02:39Z)
- BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR [52.78253400327191]
BDA-SketRet is a novel framework that performs bi-level domain adaptation to align the spatial and semantic features of the visual data pairs.
Experimental results on the extended Sketchy, TU-Berlin, and QuickDraw datasets show sharp improvements over the literature.
arXiv Detail & Related papers (2022-01-17T18:45:55Z)
- Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task.
We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR.
Our approach notably outperforms the state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z)
- Towards Unsupervised Sketch-based Image Retrieval [126.77787336692802]
We introduce a novel framework that simultaneously performs unsupervised representation learning and sketch-photo domain alignment.
Our framework achieves excellent performance in the new unsupervised setting, and performs comparably to or better than the state of the art in the zero-shot setting.
arXiv Detail & Related papers (2021-05-18T02:38:22Z)
- Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based Image Retrieval [55.29233996427243]
Low-shot sketch-based image retrieval is an emerging task in computer vision.
In this paper, we address any-shot, i.e., zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks.
To solve these tasks, we propose a semantically aligned cycle-consistent generative adversarial network (SEM-PCYC).
Our results demonstrate a significant boost in any-shot performance over the state-of-the-art on the extended version of the Sketchy, TU-Berlin and QuickDraw datasets.
arXiv Detail & Related papers (2020-06-20T22:43:53Z)
- Progressive Domain-Independent Feature Decomposition Network for Zero-Shot Sketch-Based Image Retrieval [15.955284712628444]
We propose a Progressive Domain-independent Feature Decomposition (PDFD) network for ZS-SBIR.
Specifically, PDFD decomposes visual features into domain features and semantic features, and then projects the semantic features into a common space as retrieval features for ZS-SBIR.
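As a schematic of this decomposition idea (illustrative only; layer shapes and names are assumptions, not PDFD's code), a backbone feature can be split by two heads, with retrieval using the semantic one:

```python
import torch.nn as nn

class FeatureDecomposer(nn.Module):
    """Split a backbone feature into a domain part and a semantic part."""
    def __init__(self, in_dim=2048, out_dim=512):
        super().__init__()
        self.domain_head = nn.Linear(in_dim, out_dim)    # sketch-vs-photo style
        self.semantic_head = nn.Linear(in_dim, out_dim)  # class content

    def forward(self, feat):
        domain = self.domain_head(feat)
        semantic = self.semantic_head(feat)  # used as the retrieval feature
        return domain, semantic
```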
arXiv Detail & Related papers (2020-03-22T12:07:23Z)