Related papers: Relation-Aware Meta-Learning for Zero-shot Sketch-Based Image Retrieval

URL: http://arxiv.org/abs/2412.00120v1
Date: Thu, 28 Nov 2024 09:35:27 GMT
Title: Relation-Aware Meta-Learning for Zero-shot Sketch-Based Image Retrieval
Authors: Yang Liu, Jiale Du, Xinbo Gao, Jungong Han
Abstract summary: Sketch-based image retrieval (SBIR) relies on free-hand sketches to retrieve natural photos within the same class. To address this limitation, the task has evolved into Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR). We propose a novel framework for ZS-SBIR that employs a pair-based relation-aware quadruplet loss to bridge feature gaps.
Score: 89.15541654536544
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Sketch-based image retrieval (SBIR) relies on free-hand sketches to retrieve natural photos within the same class. However, its practical application is limited by its inability to retrieve classes absent from the training set. To address this limitation, the task has evolved into Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR), where model performance is evaluated on unseen categories. Traditional SBIR primarily focuses on narrowing the domain gap between photo and sketch modalities. However, in the zero-shot setting, the model not only needs to address this cross-modal discrepancy but also requires a strong generalization capability to transfer knowledge to unseen categories. To this end, we propose a novel framework for ZS-SBIR that employs a pair-based relation-aware quadruplet loss to bridge feature gaps. By incorporating two negative samples from different modalities, the approach prevents positive features from becoming disproportionately distant from one modality while remaining close to another, thus enhancing inter-class separability. We also propose a Relation-Aware Meta-Learning Network (RAMLN) to obtain the margin, a hyper-parameter of cross-modal quadruplet loss, to improve the generalization ability of the model. RAMLN leverages external memory to store feature information, which it utilizes to assign optimal margin values. Experimental results obtained on the extended Sketchy and TU-Berlin datasets show a sharp improvement over existing state-of-the-art methods in ZS-SBIR.
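
The abstract describes a quadruplet loss that uses two negatives, one from each modality, but does not give the formula. The following is a minimal sketch of one plausible hinge-style form, not the authors' implementation. The function names, the Euclidean distance, the sum of two hinge terms, and the fixed margin of 0.5 are all assumptions made for illustration; in the paper the margin is assigned by RAMLN rather than fixed.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cross_modal_quadruplet_loss(anchor_sketch, pos_photo, neg_photo,
                                neg_sketch, margin=0.5):
    """Hypothetical quadruplet loss with one negative per modality.

    anchor_sketch: sketch feature used as the query
    pos_photo:     photo feature of the same class as the anchor
    neg_photo:     photo feature of a different class
    neg_sketch:    sketch feature of a different class
    margin:        fixed here (assumption); the paper learns it with RAMLN
    """
    d_pos = euclidean(anchor_sketch, pos_photo)
    d_neg_photo = euclidean(anchor_sketch, neg_photo)
    d_neg_sketch = euclidean(anchor_sketch, neg_sketch)

    # One hinge term per modality: the positive must be closer to the anchor
    # than BOTH negatives by at least `margin`, so the positive cannot be
    # well separated in one modality while staying close in the other.
    photo_term = max(0.0, d_pos - d_neg_photo + margin)
    sketch_term = max(0.0, d_pos - d_neg_sketch + margin)
    return photo_term + sketch_term


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    positive = [0.0, 1.0]
    # Both negatives are far from the anchor: no penalty.
    print(cross_modal_quadruplet_loss(anchor, positive, [3.0, 0.0], [0.0, 3.0]))
    # The sketch negative is as close as the positive: the sketch term is active.
    print(cross_modal_quadruplet_loss(anchor, positive, [3.0, 0.0], [1.0, 0.0]))
```

Under these assumptions the loss is zero only when both negatives are at least `margin` farther from the anchor than the positive, which is the behaviour the abstract attributes to using negatives from both modalities.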

Related papers

Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling [11.129453244307369]
FG-SBIR aims to minimize the distance between sketches and corresponding images in the embedding space. We propose an effective approach to narrow the gap between the two domains. It mainly facilitates unified mutual information sharing both intra- and inter-samples.
arXiv Detail & Related papers (2024-06-17T13:49:12Z)
Modality-Aware Representation Learning for Zero-shot Sketch-based Image Retrieval [10.568851068989973]
Zero-shot learning offers an efficient solution for a machine learning model to treat unseen categories. We propose a novel framework that indirectly aligns sketches and photos by contrasting them through texts. With an explicit modality encoding learned from data, our approach disentangles modality-agnostic semantics from modality-specific information.
arXiv Detail & Related papers (2024-01-10T00:39:03Z)
Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR). It aims to use sketches from unseen categories as queries to match the images of the same category. We propose a novel Symmetrical Bidirectional Knowledge Alignment for zero-shot sketch-based image retrieval (SBKA).
arXiv Detail & Related papers (2023-12-16T04:50:34Z)
Active Learning for Fine-Grained Sketch-Based Image Retrieval [1.994307489466967]
The ability to retrieve a photo by mere free-hand sketching highlights the immense potential of fine-grained sketch-based image retrieval (FG-SBIR). We propose a novel active learning sampling technique that drastically minimises the need for drawing photo sketches.
arXiv Detail & Related papers (2023-09-15T20:07:14Z)
BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR [52.78253400327191]
BDA-SketRet is a novel framework performing a bi-level domain adaptation for aligning the spatial and semantic features of the visual data pairs. Experimental results on the extended Sketchy, TU-Berlin, and QuickDraw exhibit sharp improvements over the literature.
arXiv Detail & Related papers (2022-01-17T18:45:55Z)
ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval [28.022137537238425]
We propose an Approaching-and-Centralizing Network (termed "ACNet") to jointly optimize sketch-to-photo synthesis and image retrieval. The retrieval module guides the synthesis module to generate large amounts of diverse photo-like images which gradually approach the photo domain. Our approach achieves state-of-the-art performance on two widely used ZS-SBIR datasets and surpasses previous methods by a large margin.
arXiv Detail & Related papers (2021-11-24T19:36:10Z)
Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task. We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR. Our approach notably outperforms the state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z)
Towards Unsupervised Sketch-based Image Retrieval [126.77787336692802]
We introduce a novel framework that simultaneously performs unsupervised representation learning and sketch-photo domain alignment. Our framework achieves excellent performance in the new unsupervised setting, and performs comparably or better than state-of-the-art in the zero-shot setting.
arXiv Detail & Related papers (2021-05-18T02:38:22Z)
More Photos are All You Need: Semi-Supervised Learning for Fine-Grained Sketch Based Image Retrieval [112.1756171062067]
We introduce a novel semi-supervised framework for cross-modal retrieval. At the centre of our design is a sequential photo-to-sketch generation model. We also introduce a discriminator guided mechanism to guide against unfaithful generation.
arXiv Detail & Related papers (2021-03-25T17:27:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
