Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR
- URL: http://arxiv.org/abs/2303.13779v1
- Date: Fri, 24 Mar 2023 03:34:33 GMT
- Title: Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR
- Authors: Aneeshan Sain, Ayan Kumar Bhunia, Subhadeep Koley, Pinaki Nath
Chowdhury, Soumitri Chattopadhyay, Tao Xiang, Yi-Zhe Song
- Abstract summary: This paper advances the fine-grained sketch-based image retrieval (FG-SBIR) literature by putting forward a strong baseline that overshoots the prior state of the art by ~11%.
We propose a simple modification to the standard triplet loss that explicitly enforces separation amongst photo/sketch instances.
For (i) we employ an intra-modal triplet loss amongst sketches to pull sketches of the same instance closer while pushing others away, and another amongst photos to push apart different photo instances.
- Score: 103.51937218213774
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper advances the fine-grained sketch-based image retrieval (FG-SBIR)
literature by putting forward a strong baseline that overshoots the prior
state of the art by ~11%. This is achieved not via complicated design, but by
addressing two critical issues facing the community: (i) the gold-standard
triplet loss does not enforce holistic latent-space geometry, and (ii) there
are never enough sketches to train a high-accuracy model. For the former, we
propose a simple modification to the standard triplet loss that explicitly
enforces separation amongst photo/sketch instances. For the latter, we put
forward a novel knowledge distillation module that can leverage photo data for
model training. Both modules are then plugged into a novel plug-n-playable
training paradigm that allows for more stable training. More specifically, for
(i) we employ an intra-modal triplet loss amongst sketches to pull sketches of
the same instance closer while pushing others away, and another amongst photos
to push apart different photo instances while bringing a structurally augmented
version of the same photo closer (offering a gain of ~4-6%). To tackle (ii), we
first pre-train a teacher on the large set of unlabelled photos with the
aforementioned intra-modal photo triplet loss. We then distill the contextual
similarity amongst instances in the teacher's embedding space into the
student's embedding space, by matching the distribution over inter-feature
distances of the respective samples in both embedding spaces (delivering a
further gain of ~4-5%). Apart from significantly outperforming prior art, our
model also generalises satisfactorily to new classes. Project page:
https://aneeshan95.github.io/Sketch_PVT/
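The two components above are easy to prototype. Below is a minimal PyTorch sketch of the intra-modal triplet loss, written from the abstract rather than from the authors' code: for the sketch branch, the positive is another sketch of the same photo instance; for the photo branch, the positive is a structurally augmented view of the same photo. The margin value and the Euclidean distance are assumptions, not the paper's reported settings.

```python
import torch.nn.functional as F

def intra_modal_triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet loss computed within a single modality.

    anchor, positive, negative: (B, D) batches of embeddings.
    Sketch branch: positive = another sketch of the same instance,
                   negative = a sketch of a different instance.
    Photo branch:  positive = a structurally augmented view of the
                   same photo, negative = a different photo instance.
    """
    d_pos = F.pairwise_distance(anchor, positive)  # pull same-instance pairs together
    d_neg = F.pairwise_distance(anchor, negative)  # push different instances apart
    return F.relu(d_pos - d_neg + margin).mean()
```

The distillation module then matches, for every sample, the teacher's and the student's distributions over inter-feature distances to the other samples in a batch. The sketch below assumes those distributions are formed by a softmax over negative pairwise distances and compared with a KL divergence; the temperature `tau` and the choice of divergence are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def contextual_similarity_kd(student_emb, teacher_emb, tau=0.1):
    """Distill contextual similarity from a frozen teacher to the student.

    For each sample, build a distribution over its distances to every
    other sample in the batch, in both embedding spaces, and push the
    student's distribution towards the teacher's.
    student_emb, teacher_emb: (B, D) embeddings, assumed L2-normalised.
    """
    def distance_distribution(emb):
        d = torch.cdist(emb, emb)                            # (B, B) pairwise distances
        mask = torch.eye(emb.size(0), device=emb.device) * 1e9
        return F.softmax(-(d + mask) / tau, dim=1)           # closer => higher probability

    p_teacher = distance_distribution(teacher_emb).detach()  # teacher gives targets only
    p_student = distance_distribution(student_emb)
    # KL(teacher || student), averaged over the batch
    return F.kl_div(p_student.log(), p_teacher, reduction="batchmean")
```

In training, these terms would plausibly be added to the usual cross-modal triplet objective with scalar weights, e.g. `loss = cross_modal_triplet + a * intra_modal_triplet_loss(...) + b * contextual_similarity_kd(...)`; the weights `a` and `b` are hyperparameters here, not values from the paper.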
Related papers
- Exploring the Untouched Sweeps for Conflict-Aware 3D Segmentation Pretraining [41.145598142457686]
LiDAR-camera 3D representation pretraining has shown significant promise for 3D perception tasks and related applications.
We propose a novel Vision-Foundation-Model-driven sample exploring module to meticulously select LiDAR-Image pairs from unexplored frames.
Our method consistently outperforms existing state-of-the-art pretraining frameworks across three major public autonomous driving datasets.
arXiv Detail & Related papers (2024-07-10T08:46:29Z)
- Direct Consistency Optimization for Compositional Text-to-Image Personalization [73.94505688626651]
Text-to-image (T2I) diffusion models, when fine-tuned on a few personal images, are able to generate visuals with a high degree of consistency.
We propose to fine-tune the T2I model by maximizing consistency to reference images, while penalizing the deviation from the pretrained model.
arXiv Detail & Related papers (2024-02-19T09:52:41Z)
- Active Learning for Fine-Grained Sketch-Based Image Retrieval [1.994307489466967]
The ability to retrieve a photo by mere free-hand sketching highlights the immense potential of fine-grained sketch-based image retrieval (FG-SBIR).
We propose a novel active learning sampling technique that drastically minimises the need for drawing photo sketches.
arXiv Detail & Related papers (2023-09-15T20:07:14Z)
- A Recipe for Efficient SBIR Models: Combining Relative Triplet Loss with Batch Normalization and Knowledge Distillation [3.364554138758565]
Sketch-Based Image Retrieval (SBIR) is a crucial task in multimedia retrieval, where the goal is to retrieve a set of images that match a given sketch query.
We introduce a Relative Triplet Loss (RTL), a triplet loss adapted to overcome limitations of the standard formulation by weighting the loss according to anchor similarity (one possible reading is sketched below).
We propose a straightforward approach to train small models efficiently with a marginal loss of accuracy through knowledge distillation.
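From the one-line description above, one possible reading of RTL is a triplet hinge scaled per-triplet by how similar the anchor and its positive already are. The sketch below is that interpretation only; the function name, the cosine-based weight, and the margin are hypothetical, not taken from the paper.

```python
import torch.nn.functional as F

def relative_triplet_loss(anchor, positive, negative, margin=0.3):
    """Hypothetical 'relative' triplet loss: weight each triplet's hinge
    term by the anchor-positive similarity, making the penalty relative
    to how alike the anchor and its positive already are.
    """
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    w = (F.cosine_similarity(anchor, positive) + 1) / 2  # similarity mapped to [0, 1]
    return (w * F.relu(d_pos - d_neg + margin)).mean()
```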
arXiv Detail & Related papers (2023-05-30T12:41:04Z)
- CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not [109.69076457732632]
We leverage CLIP for zero-shot sketch-based image retrieval (ZS-SBIR).
We put forward novel designs on how best to achieve this synergy.
We observe significant performance gains in the region of 26.9% over previous state-of-the-art.
arXiv Detail & Related papers (2023-03-23T17:02:00Z)
- Adaptive Fine-Grained Sketch-Based Image Retrieval [100.90633284767205]
Recent focus on Fine-Grained Sketch-Based Image Retrieval has shifted towards generalising a model to new categories.
In real-world applications, a trained FG-SBIR model is often applied to both new categories and different human sketchers.
We introduce a novel model-agnostic meta-learning (MAML) based framework with several key modifications.
arXiv Detail & Related papers (2022-07-04T21:07:20Z)
- Sketch3T: Test-Time Training for Zero-Shot SBIR [106.59164595640704]
Zero-shot sketch-based image retrieval typically asks for a trained model to be applied as is to unseen categories.
We extend ZS-SBIR, asking it to transfer to both new categories and new sketch distributions.
Our key contribution is a test-time training paradigm that can adapt using just one sketch.
arXiv Detail & Related papers (2022-03-28T12:44:49Z)
- More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval [112.1756171062067]
We introduce a novel semi-supervised framework for cross-modal retrieval.
At the centre of our design is a sequential photo-to-sketch generation model.
We also introduce a discriminator-guided mechanism to guard against unfaithful generation.
arXiv Detail & Related papers (2021-03-25T17:27:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.