Modality-Aware Representation Learning for Zero-shot Sketch-based Image
Retrieval
- URL: http://arxiv.org/abs/2401.04860v1
- Date: Wed, 10 Jan 2024 00:39:03 GMT
- Title: Modality-Aware Representation Learning for Zero-shot Sketch-based Image
Retrieval
- Authors: Eunyi Lyou, Doyeon Lee, Jooeun Kim, Joonseok Lee
- Abstract summary: Zero-shot learning offers an efficient solution for a machine learning model to handle unseen categories.
We propose a novel framework that indirectly aligns sketches and photos by contrasting them through texts.
With an explicit modality encoding learned from data, our approach disentangles modality-agnostic semantics from modality-specific information.
- Score: 10.568851068989973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-shot learning offers an efficient solution for a machine learning model
to handle unseen categories, avoiding exhaustive data collection. Zero-shot
Sketch-based Image Retrieval (ZS-SBIR) simulates real-world scenarios where it
is hard and costly to collect paired sketch-photo samples. We propose a novel
framework that indirectly aligns sketches and photos by contrasting them
through texts, removing the necessity of access to sketch-photo pairs. With an
explicit modality encoding learned from data, our approach disentangles
modality-agnostic semantics from modality-specific information, bridging the
modality gap and enabling effective cross-modal content retrieval within a
joint latent space. Through comprehensive experiments, we verify the efficacy of
the proposed model on ZS-SBIR and show that it can also be applied to generalized
and fine-grained settings.
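The central mechanism — contrasting sketch and photo embeddings against shared text embeddings while factoring out a learned modality code — can be pictured with a minimal PyTorch sketch. Everything below (the subtraction-based disentanglement, dimensions, and module names) is an illustrative assumption, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

class ModalityAwareEncoder(torch.nn.Module):
    """Toy encoder: an embedding is split into modality-agnostic
    semantics plus an explicit, learned per-modality code."""
    def __init__(self, dim=256, n_modalities=2):
        super().__init__()
        self.backbone = torch.nn.Linear(512, dim)  # stand-in feature extractor
        self.modality_emb = torch.nn.Embedding(n_modalities, dim)

    def forward(self, feats, modality_id):
        z = self.backbone(feats)
        m = self.modality_emb(torch.full((feats.size(0),), modality_id))
        return F.normalize(z - m, dim=-1)  # assumed: subtracting the code leaves semantics

def info_nce(a, b, tau=0.07):
    """Symmetric InfoNCE over a batch of aligned embedding pairs."""
    logits = a @ b.t() / tau
    target = torch.arange(a.size(0))
    return (F.cross_entropy(logits, target) + F.cross_entropy(logits.t(), target)) / 2

encoder = ModalityAwareEncoder()
sketch, photo, text = (torch.randn(8, 512) for _ in range(3))
text_z = F.normalize(torch.nn.Linear(512, 256)(text), dim=-1)  # frozen text tower stand-in
# Sketches and photos never meet directly: both are contrasted against the text anchor.
loss = info_nce(encoder(sketch, 0), text_z) + info_nce(encoder(photo, 1), text_z)
```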
Related papers
- Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based
Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR),
which aims to use sketches from unseen categories as queries to match images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment (SBKA) model for ZS-SBIR.
arXiv Detail & Related papers (2023-12-16T04:50:34Z)
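One plausible reading of SBKA's "symmetrical bidirectional knowledge alignment" is mutual distillation, with each modality branch distilling the other. The KL-based version below is my illustration under that assumption, not the paper's actual losses:

```python
import torch
import torch.nn.functional as F

def symmetric_kl_alignment(sketch_logits, photo_logits, tau=4.0):
    """Bidirectional soft-label alignment: each branch distills the other.
    Detaching the 'teacher' side in each direction keeps the two
    distillation paths symmetric but independent."""
    log_p_sketch = F.log_softmax(sketch_logits / tau, dim=-1)
    log_p_photo = F.log_softmax(photo_logits / tau, dim=-1)
    s_to_p = F.kl_div(log_p_sketch, (photo_logits / tau).softmax(-1).detach(),
                      reduction="batchmean")
    p_to_s = F.kl_div(log_p_photo, (sketch_logits / tau).softmax(-1).detach(),
                      reduction="batchmean")
    return (s_to_p + p_to_s) * tau ** 2

loss = symmetric_kl_alignment(torch.randn(8, 100), torch.randn(8, 100))
```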
- Sketch-an-Anchor: Sub-epoch Fast Model Adaptation for Zero-shot Sketch-based Image Retrieval [1.52292571922932]
Sketch-an-Anchor is a novel method to train state-of-the-art Zero-shot Sketch-based Image Retrieval (ZS-SBIR) models in under an epoch.
Our fast-converging model retains single-domain performance while learning to extract similar representations from sketches.
arXiv Detail & Related papers (2023-03-29T15:00:02Z)
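Sketch-an-Anchor's name suggests anchoring sketch features to a frozen, photo-pretrained model so that only a light head needs training, which would explain the sub-epoch convergence. A speculative sketch, with every detail (the anchor source, the cosine objective) assumed:

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: class anchors come from a frozen, photo-pretrained model
# (e.g., its classifier weights); only the small sketch head is trained.
anchors = F.normalize(torch.randn(100, 256), dim=-1)  # frozen per-class anchors
sketch_head = torch.nn.Linear(512, 256)               # the only trainable part

def anchor_loss(sketch_feats, labels):
    z = F.normalize(sketch_head(sketch_feats), dim=-1)
    # Pull each sketch embedding toward its class anchor (cosine distance).
    return (1 - (z * anchors[labels]).sum(-1)).mean()

loss = anchor_loss(torch.randn(8, 512), torch.randint(0, 100, (8,)))
```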
- Data-Free Sketch-Based Image Retrieval [56.96186184599313]
We propose Data-Free (DF)-SBIR, where pre-trained, single-modality classification models are leveraged to learn a cross-modal metric space for retrieval without access to any training data.
We present a methodology for DF-SBIR, which can leverage knowledge from models independently trained to perform classification on photos and sketches.
Our method also achieves mAPs competitive with data-dependent approaches, all the while requiring no training data.
arXiv Detail & Related papers (2023-03-14T10:34:07Z)
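With no training data, whatever knowledge is available in the DF-SBIR setting lives in the pre-trained classifiers themselves. One way to picture a data-free cross-modal metric space is to align the two classifiers' class-weight rows as prototypes; this is an illustrative stand-in, not the paper's actual procedure:

```python
import torch
import torch.nn.functional as F

# Assumed: classifier weight rows act as per-class prototypes in each modality.
photo_protos = torch.randn(100, 512)     # rows of a photo classifier's last layer
sketch_protos = torch.randn(100, 512)    # rows of a sketch classifier's last layer
to_shared_p = torch.nn.Linear(512, 256)  # small projection heads, the only
to_shared_s = torch.nn.Linear(512, 256)  # parameters trained without any data

def proto_alignment_loss(tau=0.07):
    p = F.normalize(to_shared_p(photo_protos), dim=-1)
    s = F.normalize(to_shared_s(sketch_protos), dim=-1)
    logits = s @ p.t() / tau
    # Class k's sketch prototype should retrieve class k's photo prototype.
    return F.cross_entropy(logits, torch.arange(100))

loss = proto_alignment_loss()
```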
- ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval [28.022137537238425]
We propose an Approaching-and-Centralizing Network (termed "ACNet") to jointly optimize sketch-to-photo synthesis and image retrieval.
The retrieval module guides the synthesis module to generate large amounts of diverse photo-like images which gradually approach the photo domain.
Our approach achieves state-of-the-art performance on two widely used ZS-SBIR datasets and surpasses previous methods by a large margin.
arXiv Detail & Related papers (2021-11-24T19:36:10Z)
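ACNet's loop, as described, couples a synthesis module with a retrieval module that steers it toward the photo domain. A caricature of that joint objective, with placeholder modules standing in for the real networks:

```python
import torch
import torch.nn.functional as F

G = torch.nn.Linear(512, 512)  # stand-in sketch-to-photo generator
E = torch.nn.Linear(512, 256)  # shared retrieval encoder

def joint_loss(sketch, photo, tau=0.07):
    fake_photo = G(sketch)  # "approaching": synthesize photo-like features
    za = F.normalize(E(fake_photo), dim=-1)
    zp = F.normalize(E(photo), dim=-1)
    logits = za @ zp.t() / tau
    # The retrieval objective back-propagates into G, guiding synthesis
    # toward outputs the retrieval encoder treats as photos ("centralizing").
    return F.cross_entropy(logits, torch.arange(sketch.size(0)))

loss = joint_loss(torch.randn(8, 512), torch.randn(8, 512))
```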
- Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task.
We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR.
Our approach notably outperforms the state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z)
- Towards Unsupervised Sketch-based Image Retrieval [126.77787336692802]
We introduce a novel framework that simultaneously performs unsupervised representation learning and sketch-photo domain alignment.
Our framework achieves excellent performance in the new unsupervised setting, and performs comparably to or better than the state-of-the-art in the zero-shot setting.
arXiv Detail & Related papers (2021-05-18T02:38:22Z)
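Performing unsupervised representation learning and sketch-photo domain alignment simultaneously could, for instance, pair an instance-discrimination loss with a domain-confusion term; the recipe below is my guess at that combination, not the paper's architecture:

```python
import torch
import torch.nn.functional as F

enc = torch.nn.Linear(512, 256)   # shared encoder
critic = torch.nn.Linear(256, 1)  # domain discriminator

def unsup_plus_alignment(view1, view2, sketch, photo, tau=0.1):
    # Unsupervised part: two augmented views of the same items agree (InfoNCE).
    a = F.normalize(enc(view1), dim=-1)
    b = F.normalize(enc(view2), dim=-1)
    contrastive = F.cross_entropy(a @ b.t() / tau, torch.arange(a.size(0)))
    # Alignment part: match first-order domain statistics so sketch and photo
    # features look alike (a gradient-reversal adversary is the usual choice).
    confusion = (critic(enc(sketch)).mean() - critic(enc(photo)).mean()).abs()
    return contrastive + confusion

views = [torch.randn(8, 512) for _ in range(4)]
loss = unsup_plus_alignment(*views)
```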
- More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval [112.1756171062067]
We introduce a novel semi-supervised framework for cross-modal retrieval.
At the centre of our design is a sequential photo-to-sketch generation model.
We also introduce a discriminator-guided mechanism to guard against unfaithful generation.
arXiv Detail & Related papers (2021-03-25T17:27:08Z)
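The discriminator-guided mechanism above plausibly down-weights synthetic sketch-photo pairs that the discriminator judges unfaithful. A toy version, with the gating scheme assumed:

```python
import torch
import torch.nn.functional as F

G = torch.nn.Linear(512, 512)  # stand-in photo-to-sketch generator
D = torch.nn.Linear(512, 1)    # discriminator scoring sketch faithfulness
E = torch.nn.Linear(512, 256)  # retrieval encoder

def guided_semi_sup_loss(unlabeled_photos, tau=0.07):
    fake_sketch = G(unlabeled_photos)
    # Discriminator confidence gates each synthetic pair's contribution,
    # so unfaithful generations barely influence the retrieval loss.
    w = torch.sigmoid(D(fake_sketch)).squeeze(-1).detach()
    zs = F.normalize(E(fake_sketch), dim=-1)
    zp = F.normalize(E(unlabeled_photos), dim=-1)
    logits = zs @ zp.t() / tau
    per_pair = F.cross_entropy(logits, torch.arange(zs.size(0)), reduction="none")
    return (w * per_pair).mean()

loss = guided_semi_sup_loss(torch.randn(8, 512))
```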
- A Zero-Shot Sketch-based Inter-Modal Object Retrieval Scheme for Remote Sensing Images [26.48516754642218]
We propose a novel inter-modal triplet-based zero-shot retrieval scheme utilizing a sketch-based representation of RS data.
The proposed scheme performs efficiently even when the sketches are only marginally prototypical of the images.
arXiv Detail & Related papers (2020-08-12T10:51:24Z)
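An inter-modal triplet objective is concrete enough to show directly: a sketch anchor, a same-class image positive, and a different-class image negative. The margin and feature sizes below are placeholders:

```python
import torch
import torch.nn.functional as F

def inter_modal_triplet(sketch_anchor, pos_image, neg_image, margin=0.3):
    """Pull same-class image embeddings toward the sketch anchor,
    push different-class embeddings at least `margin` farther away."""
    a = F.normalize(sketch_anchor, dim=-1)
    p = F.normalize(pos_image, dim=-1)
    n = F.normalize(neg_image, dim=-1)
    d_pos = (a - p).pow(2).sum(-1)
    d_neg = (a - n).pow(2).sum(-1)
    return F.relu(d_pos - d_neg + margin).mean()

loss = inter_modal_triplet(*(torch.randn(8, 256) for _ in range(3)))
```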
- Semantically Tied Paired Cycle Consistency for Any-Shot Sketch-based Image Retrieval [55.29233996427243]
Low-shot sketch-based image retrieval is an emerging task in computer vision.
In this paper, we address any-shot, i.e. zero-shot and few-shot, sketch-based image retrieval (SBIR) tasks.
To solve these tasks, we propose a semantically aligned cycle-consistent generative adversarial network (SEM-PCYC).
Our results demonstrate a significant boost in any-shot performance over the state-of-the-art on the extended version of the Sketchy, TU-Berlin and QuickDraw datasets.
arXiv Detail & Related papers (2020-06-20T22:43:53Z)
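The cycle-consistent part of SEM-PCYC can be pictured as mapping modality features into a semantic space and back, penalizing the round trip, while the semantic side pulls embeddings toward class word vectors. Both terms below follow common conventions and are illustrative rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

to_semantic = torch.nn.Linear(512, 300)    # e.g., toward word-vector space
from_semantic = torch.nn.Linear(300, 512)  # inverse mapping

def cycle_semantic_loss(feats, class_word_vecs):
    sem = to_semantic(feats)
    # Semantic alignment: embeddings should land near their class word vectors.
    align = F.mse_loss(sem, class_word_vecs)
    # Cycle consistency: mapping to semantics and back reconstructs the input.
    cycle = F.l1_loss(from_semantic(sem), feats)
    return align + cycle

loss = cycle_semantic_loss(torch.randn(8, 512), torch.randn(8, 300))
```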