Sketch-an-Anchor: Sub-epoch Fast Model Adaptation for Zero-shot
Sketch-based Image Retrieval
- URL: http://arxiv.org/abs/2303.16769v1
- Date: Wed, 29 Mar 2023 15:00:02 GMT
- Title: Sketch-an-Anchor: Sub-epoch Fast Model Adaptation for Zero-shot
Sketch-based Image Retrieval
- Authors: Leo Sampaio Ferraz Ribeiro, Moacir Antonelli Ponti
- Abstract summary: Sketch-an-Anchor is a novel method to train state-of-the-art Zero-shot Image Retrieval (ZSSBIR) models in under an epoch.
Our fast-converging model keeps the single-domain performance while learning to extract similar representations from sketches.
- Score: 1.52292571922932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sketch-an-Anchor is a novel method to train state-of-the-art Zero-shot
Sketch-based Image Retrieval (ZSSBIR) models in under an epoch. Most studies
break down the problem of ZSSBIR into two parts: domain alignment between
images and sketches, inherited from SBIR, and generalization to unseen data,
inherent to the zero-shot protocol. We argue one of these problems can be
considerably simplified and re-frame the ZSSBIR problem around the
already-stellar yet underexplored Zero-shot Image-based Retrieval performance
of off-the-shelf models. Our fast-converging model keeps the single-domain
performance while learning to extract similar representations from sketches. To
this end we introduce our Semantic Anchors -- guiding embeddings learned from
word-based semantic spaces and features from off-the-shelf models -- and
combine them with our novel Anchored Contrastive Loss. Empirical evidence shows
we can achieve state-of-the-art performance on all benchmark datasets while
training for 100x less iterations than other methods.
Related papers
- Modality-Aware Representation Learning for Zero-shot Sketch-based Image
Retrieval [10.568851068989973]
Zero-shot learning offers an efficient solution for a machine learning model to treat unseen categories.
We propose a novel framework that indirectly aligns sketches and photos by contrasting them through texts.
With an explicit modality encoding learned from data, our approach disentangles modality-agnostic semantics from modality-specific information.
arXiv Detail & Related papers (2024-01-10T00:39:03Z) - Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based
Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR)
It aims to use sketches from unseen categories as queries to match the images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment for zero-shot sketch-based image retrieval (SBKA)
arXiv Detail & Related papers (2023-12-16T04:50:34Z) - Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval [85.39613457282107]
Cross-domain nature of sketch-based image retrieval is challenging.
We present an effective Adapt and Align'' approach to address the key challenges.
Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes.
arXiv Detail & Related papers (2023-05-09T03:10:15Z) - Sketch3T: Test-Time Training for Zero-Shot SBIR [106.59164595640704]
Zero-shot sketch-based image retrieval typically asks for a trained model to be applied as is to unseen categories.
We extend ZS-SBIR asking it to transfer to both categories and sketch distributions.
Our key contribution is a test-time training paradigm that can adapt using just one sketch.
arXiv Detail & Related papers (2022-03-28T12:44:49Z) - Zero-Shot Sketch Based Image Retrieval using Graph Transformer [18.00165431469872]
We propose a novel graph transformer based zero-shot sketch-based image retrieval (GTZSR) framework for solving ZS-SBIR tasks.
To bridge the domain gap between the visual features, we propose minimizing the Wasserstein distance between images and sketches in a learned domain-shared space.
We also propose a novel compatibility loss that further aligns the two visual domains by bridging the domain gap of one class with respect to the domain gap of all other classes in the training set.
arXiv Detail & Related papers (2022-01-25T09:02:39Z) - BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR [52.78253400327191]
BDA-SketRet is a novel framework performing a bi-level domain adaptation for aligning the spatial and semantic features of the visual data pairs.
Experimental results on the extended Sketchy, TU-Berlin, and QuickDraw exhibit sharp improvements over the literature.
arXiv Detail & Related papers (2022-01-17T18:45:55Z) - Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task.
We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR.
Our approach notably outperforms the state-of-the-art methods in both Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z) - More Photos are All You Need: Semi-Supervised Learning for Fine-Grained
Sketch Based Image Retrieval [112.1756171062067]
We introduce a novel semi-supervised framework for cross-modal retrieval.
At the centre of our design is a sequential photo-to-sketch generation model.
We also introduce a discriminator guided mechanism to guide against unfaithful generation.
arXiv Detail & Related papers (2021-03-25T17:27:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.