CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained
or Not
- URL: http://arxiv.org/abs/2303.13440v3
- Date: Tue, 28 Mar 2023 02:40:58 GMT
- Title: CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained
or Not
- Authors: Aneeshan Sain, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Subhadeep
Koley, Tao Xiang, Yi-Zhe Song
- Abstract summary: We leverage CLIP for zero-shot sketch-based image retrieval (ZS-SBIR).
We put forward novel designs on how best to achieve this synergy.
We observe significant performance gains in the region of 26.9% over previous state-of-the-art.
- Score: 109.69076457732632
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we leverage CLIP for zero-shot sketch-based image retrieval
(ZS-SBIR). We are largely inspired by recent advances in foundation models and
the unparalleled generalisation ability they seem to offer, and for the first
time tailor them to benefit the sketch community. We put forward novel designs on
how best to achieve this synergy, for both the category setting and the
fine-grained setting ("all"). At the very core of our solution is a prompt
learning setup. First, we show that just by factoring in sketch-specific prompts, we
already have a category-level ZS-SBIR system that overshoots all prior art by
a large margin (24.8%) - a strong testimony to the value of studying the CLIP and ZS-SBIR
synergy. Moving onto the fine-grained setup is however trickier, and requires a
deeper dive into this synergy. For that, we come up with two specific designs
to tackle the fine-grained matching nature of the problem: (i) an additional
regularisation loss to ensure the relative separation between sketches and
photos is uniform across categories, which is not the case for the gold
standard standalone triplet loss, and (ii) a clever patch shuffling technique
to help establish instance-level structural correspondences between
sketch-photo pairs. With these designs, we again observe significant
performance gains in the region of 26.9% over previous state-of-the-art. The
take-home message, if any, is that the proposed CLIP and prompt learning paradigm
carries great promise in tackling other sketch-related tasks (not limited to
ZS-SBIR) where data scarcity remains a great challenge. Project page:
https://aneeshan95.github.io/Sketch_LVM/
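As an illustration of the patch shuffling technique in (ii), the following is a minimal, framework-free sketch in plain Python. The function names, the grid size `n`, and the use of 2-D lists as stand-ins for images are hypothetical assumptions for illustration; the paper's actual implementation operates on image tensors inside the training pipeline.

```python
import random

def split_into_patches(img, n):
    """Split a square image (a 2-D list of pixels) into an n x n grid of
    patches, returned in row-major order."""
    size = len(img)
    p = size // n  # patch side length (assumes n divides size evenly)
    patches = []
    for gy in range(n):
        for gx in range(n):
            patch = [row[gx * p:(gx + 1) * p]
                     for row in img[gy * p:(gy + 1) * p]]
            patches.append(patch)
    return patches

def shuffled_pair(sketch, photo, n, seed=0):
    """Apply the SAME random permutation to the patches of a sketch and its
    paired photo. A model trained to match the shuffled pair must rely on
    patch-level (instance) correspondences rather than global layout."""
    rng = random.Random(seed)
    perm = list(range(n * n))
    rng.shuffle(perm)
    s_patches = split_into_patches(sketch, n)
    p_patches = split_into_patches(photo, n)
    return [s_patches[i] for i in perm], [p_patches[i] for i in perm]
```

The key design point is that the permutation is shared across the two modalities: shuffling sketch and photo independently would destroy exactly the structural correspondence the technique is meant to exploit.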
Related papers
- Do Generalised Classifiers really work on Human Drawn Sketches? [122.11670266648771]
This paper marries large foundation models with human sketch understanding.
We demonstrate what this brings -- a paradigm shift in terms of generalised sketch representation learning.
Our framework surpasses popular sketch representation learning algorithms in both zero-shot and few-shot setups.
arXiv Detail & Related papers (2024-07-04T12:37:08Z)
- Zero-Shot Everything Sketch-Based Image Retrieval, and in Explainable Style [40.112168046676125]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
The key innovation lies in the realization that such a cross-modal matching problem can be reduced to comparisons of groups of key local patches.
Experiments show our method indeed delivers superior performance across all ZS-SBIR settings.
arXiv Detail & Related papers (2023-03-25T03:52:32Z)
- Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR [103.51937218213774]
This paper advances the fine-grained sketch-based image retrieval (FG-SBIR) literature by putting forward a strong baseline that overshoots the prior state-of-the-art by 11%.
We propose a simple modification to the standard triplet loss that explicitly enforces separation amongst photo/sketch instances.
For (i) we employ an intra-modal triplet loss amongst sketches to pull sketches of the same instance closer together and away from others, and another amongst photos to push different photo instances apart.
arXiv Detail & Related papers (2023-03-24T03:34:33Z)
- Sketch3T: Test-Time Training for Zero-Shot SBIR [106.59164595640704]
Zero-shot sketch-based image retrieval typically asks for a trained model to be applied as is to unseen categories.
We extend ZS-SBIR asking it to transfer to both categories and sketch distributions.
Our key contribution is a test-time training paradigm that can adapt using just one sketch.
arXiv Detail & Related papers (2022-03-28T12:44:49Z)
- Multi-granularity Association Learning Framework for on-the-fly Fine-Grained Sketch-based Image Retrieval [7.797006835701767]
Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo given a query sketch.
In this study, we aim to retrieve the target photo with the fewest strokes possible (an incomplete sketch).
We propose a multi-granularity association learning framework that further optimizes the embedding space of all incomplete sketches.
arXiv Detail & Related papers (2022-01-13T14:38:50Z)
- ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval [28.022137537238425]
We propose an Approaching-and-Centralizing Network (termed "ACNet") to jointly optimize sketch-to-photo synthesis and image retrieval.
The retrieval module guides the synthesis module to generate large amounts of diverse photo-like images which gradually approach the photo domain.
Our approach achieves state-of-the-art performance on two widely used ZS-SBIR datasets and surpasses previous methods by a large margin.
arXiv Detail & Related papers (2021-11-24T19:36:10Z)
- More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval [112.1756171062067]
We introduce a novel semi-supervised framework for cross-modal retrieval.
At the centre of our design is a sequential photo-to-sketch generation model.
We also introduce a discriminator-guided mechanism to guard against unfaithful generation.
arXiv Detail & Related papers (2021-03-25T17:27:08Z)
- Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval [203.2520862597357]
Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user's query sketch.
We reformulate the conventional FG-SBIR framework to tackle these challenges.
We propose an on-the-fly design that starts retrieving as soon as the user starts drawing.
arXiv Detail & Related papers (2020-02-24T15:36:02Z)
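Several of the works above, as well as the main paper's regulariser in design (i), build on the standard triplet loss. A minimal sketch in plain Python follows; the function name, the squared-Euclidean distance, and the margin value 0.2 are illustrative assumptions rather than details taken from any of the papers.

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the anchor (e.g. a sketch embedding)
    towards its positive (the matching photo embedding) and push it at
    least `margin` further away from a negative (a non-matching photo)."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two embedding vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```

The loss is zero once the negative is at least `margin` (in squared distance) further from the anchor than the positive is, which is why the main paper notes that, on its own, this loss leaves the absolute sketch-photo separation free to vary across categories.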
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.