Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based
Image Retrieval
- URL: http://arxiv.org/abs/2006.11397v1
- Date: Sat, 20 Jun 2020 22:43:53 GMT
- Title: Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based
Image Retrieval
- Authors: Anjan Dutta and Zeynep Akata
- Abstract summary: Low-shot sketch-based image retrieval is an emerging task in computer vision.
In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks.
For solving these tasks, we propose a semantically aligned paired cycle-consistent generative adversarial network (SEM-PCYC).
Our results demonstrate a significant boost in any-shot performance over the state-of-the-art on the extended version of the Sketchy, TU-Berlin and QuickDraw datasets.
- Score: 55.29233996427243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-shot sketch-based image retrieval is an emerging task in computer
vision that retrieves natural images relevant to hand-drawn sketch queries
rarely seen during training. Prior works either require aligned sketch-image
pairs, which are costly to obtain, or rely on an inefficient memory fusion
layer for mapping visual information to a semantic space. In this
paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image
retrieval (SBIR) tasks, where we introduce the few-shot setting for SBIR. For
solving these tasks, we propose a semantically aligned paired cycle-consistent
generative adversarial network (SEM-PCYC) for any-shot SBIR, where each branch
of the generative adversarial network maps the visual information from sketch
and image to a common semantic space via adversarial training. Each of these
branches maintains cycle consistency that only requires supervision at the
category level and avoids the need for aligned sketch-image pairs. A
classification criterion on the generators' outputs ensures that the
visual-to-semantic space mapping is class-specific. Furthermore, we propose to
combine textual and hierarchical side information via an auto-encoder that
selects discriminating side information within the same end-to-end model. Our results
demonstrate a significant boost in any-shot SBIR performance over the
state-of-the-art on the extended version of the challenging Sketchy, TU-Berlin
and QuickDraw datasets.
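As a rough illustration of the objective the abstract describes, the following minimal PyTorch sketch combines the three terms named above: an adversarial loss on the shared semantic space, a cycle-consistency loss back to the visual space, and a classification criterion on the generators' outputs. All module names, dimensions, and loss weights here are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal, illustrative SEM-PCYC-style objective (all names, dimensions and
# loss weights are assumptions, not the paper's exact configuration).
import torch
import torch.nn as nn

def mlp(d_in, d_out):
    return nn.Sequential(nn.Linear(d_in, 512), nn.ReLU(), nn.Linear(512, d_out))

D_VIS, D_SEM, N_CLS = 2048, 300, 100               # assumed feature sizes

G_sk, G_im = mlp(D_VIS, D_SEM), mlp(D_VIS, D_SEM)  # visual -> semantic generators
F_sk, F_im = mlp(D_SEM, D_VIS), mlp(D_SEM, D_VIS)  # semantic -> visual (cycle back)
D_sem = mlp(D_SEM, 1)                              # discriminator on semantic space
classifier = nn.Linear(D_SEM, N_CLS)               # classification criterion

bce, l1, ce = nn.BCEWithLogitsLoss(), nn.L1Loss(), nn.CrossEntropyLoss()

def generator_loss(x_sk, x_im, labels):
    s_sk, s_im = G_sk(x_sk), G_im(x_im)
    real = torch.ones(x_sk.size(0), 1)
    # Adversarial term: both branches should produce semantic vectors that
    # the discriminator accepts as real.
    l_adv = bce(D_sem(s_sk), real) + bce(D_sem(s_im), real)
    # Cycle consistency: reconstruct the visual features from the semantic
    # space; this needs only category-level supervision, not aligned pairs.
    l_cyc = l1(F_sk(s_sk), x_sk) + l1(F_im(s_im), x_im)
    # Classification criterion keeps the visual-to-semantic mapping
    # class-specific.
    l_cls = ce(classifier(s_sk), labels) + ce(classifier(s_im), labels)
    return l_adv + 10.0 * l_cyc + l_cls            # placeholder loss weights

x_sk, x_im = torch.randn(8, D_VIS), torch.randn(8, D_VIS)
labels = torch.randint(0, N_CLS, (8,))
generator_loss(x_sk, x_im, labels).backward()
```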
Related papers
- Query-guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch [17.63475613154152]
Given a crude hand-drawn sketch of an object, the goal is to localize all instances of the same object in the target image.
This problem is difficult due to the abstract nature of hand-drawn sketches, variations in sketch style and quality, and the large domain gap between sketches and natural images.
We propose a sketch-guided vision transformer encoder that uses cross-attention after each block of the transformer-based image encoder to learn query-conditioned image features.
arXiv Detail & Related papers (2023-03-15T17:26:17Z)
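A minimal sketch of the query-conditioned cross-attention idea summarized in the entry above, assuming a standard pre-norm transformer stage built from PyTorch's nn.MultiheadAttention; the dimensions, block layout, and fusion scheme are assumptions for illustration, not the paper's exact encoder.

```python
# Minimal query-conditioned cross-attention stage (pre-norm layout, sizes and
# fusion scheme are illustrative assumptions, not the paper's exact encoder).
import torch
import torch.nn as nn

class QueryGuidedBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, img_tokens, sketch_tokens):
        # Standard self-attention over image patch tokens.
        h = self.norm1(img_tokens)
        img_tokens = img_tokens + self.self_attn(h, h, h)[0]
        # Cross-attention: image tokens attend to the sketch query, yielding
        # query-conditioned image features.
        h = self.norm2(img_tokens)
        img_tokens = img_tokens + self.cross_attn(h, sketch_tokens, sketch_tokens)[0]
        return img_tokens

img = torch.randn(2, 196, 256)      # assumed 14x14 patch tokens
sketch = torch.randn(2, 196, 256)   # assumed sketch-query tokens
out = QueryGuidedBlock()(img, sketch)
```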
- BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval [61.803481264081036]
Content-Based Image Retrieval (CIR) aims to search for a target image by concurrently comprehending the composition of an example image and a complementary text.
We tackle this task with a novel Bottom-up crOss-modal Semantic compoSition (BOSS) framework with Hybrid Counterfactual Training.
arXiv Detail & Related papers (2022-07-09T07:14:44Z)
- Zero-Shot Sketch Based Image Retrieval using Graph Transformer [18.00165431469872]
We propose a novel graph transformer based zero-shot sketch-based image retrieval (GTZSR) framework for solving ZS-SBIR tasks.
To bridge the domain gap between the visual features, we propose minimizing the Wasserstein distance between images and sketches in a learned domain-shared space.
We also propose a novel compatibility loss that further aligns the two visual domains by bridging the domain gap of one class with respect to the domain gap of all other classes in the training set.
arXiv Detail & Related papers (2022-01-25T09:02:39Z)
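The Wasserstein term in the entry above can be illustrated with a sliced-Wasserstein estimator, which approximates the distance via random 1-D projections; this particular estimator and all shapes below are assumptions, since the summary does not specify the exact formulation.

```python
# Sliced-Wasserstein estimator as one illustrative way to minimize a
# Wasserstein distance between sketch and image embeddings (assumption:
# the paper's exact formulation may differ).
import torch

def sliced_wasserstein(x, y, n_proj=64):
    # Random unit directions; 1-D optimal transport reduces to sorting.
    proj = torch.randn(x.size(1), n_proj, device=x.device)
    proj = proj / proj.norm(dim=0, keepdim=True)
    x_sorted, _ = torch.sort(x @ proj, dim=0)
    y_sorted, _ = torch.sort(y @ proj, dim=0)
    # Assumes both sets contain the same number of samples.
    return ((x_sorted - y_sorted) ** 2).mean()

sketch_feats = torch.randn(32, 256)   # assumed shared-space embeddings
image_feats = torch.randn(32, 256)
loss_w = sliced_wasserstein(sketch_feats, image_feats)
```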
- BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR [52.78253400327191]
BDA-SketRet is a novel framework that performs bi-level domain adaptation to align the spatial and semantic features of the visual data pairs.
Experimental results on the extended Sketchy, TU-Berlin, and QuickDraw datasets show sharp improvements over the literature.
arXiv Detail & Related papers (2022-01-17T18:45:55Z)
- Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task.
We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR.
Our approach notably outperforms the state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z)
- Compositional Sketch Search [91.84489055347585]
We present an algorithm for searching image collections using free-hand sketches.
We exploit drawings as a concise and intuitive representation for specifying entire scene compositions.
arXiv Detail & Related papers (2021-06-15T09:38:09Z)
- CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based Image Retrieval [30.249581102239645]
We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR).
While we define a cross-modal triplet loss to ensure the discriminative nature of the shared space, an innovative cross-modal attention learning strategy is also proposed to guide feature extraction from the image domain.
arXiv Detail & Related papers (2021-04-20T12:11:12Z)
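A minimal sketch of a cross-modal triplet loss of the kind the entry above describes: a sketch anchor is pulled toward an image of its own class and pushed away from an image of another class in the shared space. The margin and the Euclidean distance are placeholder assumptions.

```python
# Illustrative cross-modal triplet loss: sketch anchor vs. same-class and
# different-class image embeddings (margin and distance are assumptions).
import torch
import torch.nn.functional as F

def cross_modal_triplet(sketch_anchor, pos_img, neg_img, margin=0.2):
    d_pos = F.pairwise_distance(sketch_anchor, pos_img)   # same-class distance
    d_neg = F.pairwise_distance(sketch_anchor, neg_img)   # cross-class distance
    return F.relu(d_pos - d_neg + margin).mean()

anchor = torch.randn(16, 256)   # sketch embeddings in the shared space
pos = torch.randn(16, 256)      # images of the same class
neg = torch.randn(16, 256)      # images of different classes
loss = cross_modal_triplet(anchor, pos, neg)
```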
- Cross-Modal Hierarchical Modelling for Fine-Grained Sketch Based Image Retrieval [147.24102408745247]
We study a further trait of sketches that has been overlooked to date: they are hierarchical in terms of level of detail.
In this paper, we design a novel network that is capable of cultivating sketch-specific hierarchies and exploiting them to match sketch with photo at corresponding hierarchical levels.
arXiv Detail & Related papers (2020-07-29T20:50:25Z)
- Progressive Domain-Independent Feature Decomposition Network for Zero-Shot Sketch-Based Image Retrieval [15.955284712628444]
We propose a Progressive Domain-independent Feature Decomposition (PDFD) network for ZS-SBIR.
Specifically, PDFD decomposes visual features into domain features and semantic ones, and then the semantic features are projected into common space as retrieval features for ZS-SBIR.
arXiv Detail & Related papers (2020-03-22T12:07:23Z)
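A minimal sketch of the domain/semantic feature decomposition the PDFD entry describes; the FeatureDecomposer module, its head sizes, and the auxiliary domain classifier are hypothetical choices for illustration only.

```python
# Hypothetical decomposition of backbone features into a domain part and a
# semantic part, loosely following the PDFD summary above (head sizes and
# the auxiliary domain classifier are assumptions).
import torch
import torch.nn as nn

class FeatureDecomposer(nn.Module):
    def __init__(self, d_in=2048, d_out=512, n_domains=2):
        super().__init__()
        self.domain_head = nn.Linear(d_in, d_out)    # sketch-vs-photo style
        self.semantic_head = nn.Linear(d_in, d_out)  # retrieval features
        self.domain_clf = nn.Linear(d_out, n_domains)

    def forward(self, feats):
        dom, sem = self.domain_head(feats), self.semantic_head(feats)
        return dom, sem, self.domain_clf(dom)

dom, sem, dom_logits = FeatureDecomposer()(torch.randn(4, 2048))
# `sem` would be projected into the common space as the retrieval feature.
```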