Progressive Domain-Independent Feature Decomposition Network for
Zero-Shot Sketch-Based Image Retrieval
- URL: http://arxiv.org/abs/2003.09869v2
- Date: Fri, 6 May 2022 12:07:01 GMT
- Authors: Xinxun Xu, Muli Yang, Yanhua Yang and Hao Wang
- Abstract summary: We propose a Progressive Domain-independent Feature Decomposition (PDFD) network for ZS-SBIR.
Specifically, PDFD decomposes visual features into domain features and semantic ones, and then the semantic features are projected into a common space as retrieval features for ZS-SBIR.
- Score: 15.955284712628444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Zero-shot sketch-based image retrieval (ZS-SBIR) is a specific cross-modal
retrieval task for searching natural images given free-hand sketches under the
zero-shot scenario. Most existing methods solve this problem by simultaneously
projecting visual features and semantic supervision into a low-dimensional
common space for efficient retrieval. However, such low-dimensional projection
destroys the completeness of semantic knowledge in the original semantic space, so
that useful knowledge cannot be transferred well when learning semantics from
different modalities. Moreover, the domain information and semantic information
are entangled in visual features, which is not conducive to cross-modal
matching since it hinders the reduction of the domain gap between sketch and
image. In this paper, we propose a Progressive Domain-independent Feature
Decomposition (PDFD) network for ZS-SBIR. Specifically, with the supervision of
original semantic knowledge, PDFD decomposes visual features into domain
features and semantic ones, and then the semantic features are projected into
a common space as retrieval features for ZS-SBIR. The progressive projection
strategy maintains strong semantic supervision. Besides, to ensure that the
retrieval features capture clean and complete semantic information, a
cross-reconstruction loss is introduced to encourage any combination of
retrieval features and domain features to reconstruct the visual features.
Extensive experiments demonstrate the superiority of our PDFD over
state-of-the-art competitors.
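The decomposition and cross-reconstruction idea described in the abstract can be sketched roughly as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the linear maps `W_dom`, `W_ret`, and `W_dec`, the dimensions, the mean-squared-error form of the loss, and the pairing of sketch and image rows by class are all hypothetical choices that the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: visual, domain, and retrieval feature dimensions.
D_VIS, D_DOM, D_RET = 16, 4, 6

# Hypothetical linear maps standing in for learned encoders and a decoder.
W_dom = rng.normal(scale=0.1, size=(D_VIS, D_DOM))
W_ret = rng.normal(scale=0.1, size=(D_VIS, D_RET))
W_dec = rng.normal(scale=0.1, size=(D_RET + D_DOM, D_VIS))


def decompose(x):
    """Split visual features x of shape (n, D_VIS) into (domain, retrieval) parts."""
    return x @ W_dom, x @ W_ret


def reconstruct(ret, dom):
    """Decode visual features from a (retrieval, domain) feature pair."""
    return np.concatenate([ret, dom], axis=1) @ W_dec


def cross_reconstruction_loss(x_sketch, x_image):
    """Mean squared reconstruction error over all four feature combinations.

    Assumption: row k of x_sketch and row k of x_image share a class, and the
    domain feature determines which modality's visual feature is the target.
    """
    dom_s, ret_s = decompose(x_sketch)
    dom_i, ret_i = decompose(x_image)
    combos = [
        (ret_s, dom_s, x_sketch),  # sketch semantics + sketch domain
        (ret_i, dom_s, x_sketch),  # image semantics + sketch domain
        (ret_i, dom_i, x_image),   # image semantics + image domain
        (ret_s, dom_i, x_image),   # sketch semantics + image domain
    ]
    errors = [np.mean((reconstruct(r, d) - t) ** 2) for r, d, t in combos]
    return sum(errors) / len(errors)


x_sketch = rng.normal(size=(5, D_VIS))
x_image = rng.normal(size=(5, D_VIS))
loss = cross_reconstruction_loss(x_sketch, x_image)
```

The mixed combinations are what make this a cross-reconstruction: a retrieval feature taken from one modality must still decode correctly when paired with the other modality's domain feature, which pushes domain-specific information out of the retrieval features.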
Related papers
- Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper presents an architecture with the holistic goal of maintaining spatially-precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- WAD-CMSN: Wasserstein Distance based Cross-Modal Semantic Network for Zero-Shot Sketch-Based Image Retrieval [1.4180331276028657]
Zero-shot sketch-based image retrieval (ZSSBIR) is a widely studied branch of computer vision.
We propose a Wasserstein distance based cross-modal semantic network (WAD-CMSN) for ZSSBIR.
arXiv Detail & Related papers (2022-02-11T05:56:30Z)
- Zero-Shot Sketch Based Image Retrieval using Graph Transformer [18.00165431469872]
We propose a novel graph transformer based zero-shot sketch-based image retrieval (GTZSR) framework for solving ZS-SBIR tasks.
To bridge the domain gap between the visual features, we propose minimizing the Wasserstein distance between images and sketches in a learned domain-shared space.
We also propose a novel compatibility loss that further aligns the two visual domains by bridging the domain gap of one class with respect to the domain gap of all other classes in the training set.
arXiv Detail & Related papers (2022-01-25T09:02:39Z)
- BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR [52.78253400327191]
BDA-SketRet is a novel framework performing a bi-level domain adaptation for aligning the spatial and semantic features of the visual data pairs.
Experimental results on the extended Sketchy, TU-Berlin, and QuickDraw datasets show sharp improvements over the literature.
arXiv Detail & Related papers (2022-01-17T18:45:55Z)
- Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task.
We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR.
Our approach notably outperforms the state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z)
- CrossATNet - A Novel Cross-Attention Based Framework for Sketch-Based Image Retrieval [30.249581102239645]
We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR).
While we define a cross-modal triplet loss to ensure the discriminative nature of the shared space, an innovative cross-modal attention learning strategy is also proposed to guide feature extraction from the image domain.
arXiv Detail & Related papers (2021-04-20T12:11:12Z)
- Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based Image Retrieval [55.29233996427243]
Low-shot sketch-based image retrieval is an emerging task in computer vision.
In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks.
To solve these tasks, we propose a semantically aligned cycle-consistent generative adversarial network (SEM-PCYC).
Our results demonstrate a significant boost in any-shot performance over the state-of-the-art on the extended version of the Sketchy, TU-Berlin and QuickDraw datasets.
arXiv Detail & Related papers (2020-06-20T22:43:53Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.