An Efficient Framework for Zero-Shot Sketch-Based Image Retrieval
- URL: http://arxiv.org/abs/2102.04016v1
- Date: Mon, 8 Feb 2021 06:10:37 GMT
- Title: An Efficient Framework for Zero-Shot Sketch-Based Image Retrieval
- Authors: Osman Tursun, Simon Denman, Sridha Sridharan, Ethan Goan and Clinton
Fookes
- Abstract summary: Zero-shot Sketch-based Image Retrieval (ZS-SBIR) has attracted the attention of the computer vision community due to its real-world applications.
ZS-SBIR inherits the main challenges of multiple computer vision problems including content-based Image Retrieval (CBIR), zero-shot learning and domain adaptation.
- Score: 36.254157442709264
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Zero-shot Sketch-based Image Retrieval (ZS-SBIR) has attracted the
attention of the computer vision community due to its real-world applications
and its more realistic and challenging setting than standard SBIR. ZS-SBIR
inherits the main challenges of multiple computer vision problems including
content-based Image Retrieval (CBIR), zero-shot learning and domain adaptation.
The majority of previous studies using deep neural networks have achieved
improved results through either projecting sketch and images into a common
low-dimensional space or transferring knowledge from seen to unseen classes.
However, those approaches are trained with complex frameworks composed of
multiple deep convolutional neural networks (CNNs) and are dependent on
category-level word labels. This increases the requirements on training
resources and datasets. In comparison, we propose a simple and efficient
framework that does not require high computational training resources, and can
be trained on datasets without semantic categorical labels. Furthermore, at
training and inference stages our method only uses a single CNN. In this work,
a pre-trained ImageNet CNN (e.g., ResNet50) is fine-tuned with three proposed
learning objectives: domain-aware quadruplet loss, semantic classification loss,
and semantic knowledge preservation loss. The domain-aware quadruplet and
semantic classification losses are introduced to learn discriminative, semantic
and domain-invariant features by treating ZS-SBIR as an object detection
and verification problem. ...
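The quadruplet objective mentioned above can be sketched in a few lines. This is an illustrative, generic quadruplet loss in plain Python; the margin values, argument names, and the paper's domain-aware negative sampling are assumptions for the sketch, not the authors' implementation:

```python
import math

def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=0.4, margin2=0.2):
    """Generic quadruplet loss sketch (margins and domain-aware sampling
    are illustrative). `anchor` is a sketch embedding, `positive` a photo
    embedding of the same class; the two negatives come from other classes."""
    d = math.dist  # Euclidean distance between two embedding vectors
    # Triplet-style term: the positive must sit closer to the anchor
    # than the first negative, by at least margin1.
    term1 = max(0.0, margin1 + d(anchor, positive) - d(anchor, negative1))
    # Auxiliary term: the anchor-positive distance must also be smaller
    # than the distance between the two negatives, by at least margin2.
    term2 = max(0.0, margin2 + d(anchor, positive) - d(negative1, negative2))
    return term1 + term2
```

During fine-tuning, a loss of this shape would be summed with the semantic classification and semantic knowledge preservation losses and minimized over the embeddings of the single CNN.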
Related papers
- Which Backbone to Use: A Resource-efficient Domain Specific Comparison for Computer Vision [4.600687314645625]
Architectural backbones pre-trained on large datasets like ImageNet are commonly employed as feature extractors.
Our study systematically evaluates multiple lightweight, pre-trained CNN backbones under consistent training settings.
Our findings provide actionable insights into the performance trade-offs and effectiveness of different backbones.
arXiv Detail & Related papers (2024-06-09T02:01:25Z) - Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z) - Disruptive Autoencoders: Leveraging Low-level features for 3D Medical
Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z) - Adapt and Align to Improve Zero-Shot Sketch-Based Image Retrieval [85.39613457282107]
The cross-domain nature of sketch-based image retrieval is challenging.
We present an effective "Adapt and Align" approach to address the key challenges.
Inspired by recent advances in image-text foundation models (e.g., CLIP) on zero-shot scenarios, we explicitly align the learned image embedding with a more semantic text embedding to achieve the desired knowledge transfer from seen to unseen classes.
arXiv Detail & Related papers (2023-05-09T03:10:15Z) - A Domain Decomposition-Based CNN-DNN Architecture for Model Parallel Training Applied to Image Recognition Problems [0.0]
A novel CNN-DNN architecture is proposed that naturally supports a model parallel training strategy.
The proposed approach can significantly reduce the required training time compared to the global model.
Results show that the proposed approach can also help to improve the accuracy of the underlying classification problem.
arXiv Detail & Related papers (2023-02-13T18:06:59Z) - Visual Recognition with Deep Nearest Centroids [57.35144702563746]
We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition.
Compared with parametric counterparts, DNC performs better on image classification (CIFAR-10, ImageNet) and greatly boosts pixel-level recognition (ADE20K, Cityscapes).
arXiv Detail & Related papers (2022-09-15T15:47:31Z) - Agricultural Plantation Classification using Transfer Learning Approach
based on CNN [0.0]
The efficiency of hyper-spectral image recognition has increased significantly with deep learning.
CNNs and Multi-Layer Perceptrons (MLPs) have been demonstrated to be excellent methods for classifying images.
We propose using transfer learning to decrease the training time and reduce the dependence on a large labeled dataset.
arXiv Detail & Related papers (2022-06-19T14:43:31Z) - Exploiting the relationship between visual and textual features in
social networks for image classification with zero-shot deep learning [0.0]
In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture.
Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part.
Considering the texts associated with the images can help to improve the accuracy, depending on the goal.
arXiv Detail & Related papers (2021-07-08T10:54:59Z) - Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task.
We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR.
Our approach notably outperforms the state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z) - Joint Learning of Neural Transfer and Architecture Adaptation for Image
Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we prove that dynamically adapting network architectures tailored to each domain task, along with weight finetuning, benefits both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z) - Complementing Representation Deficiency in Few-shot Image
Classification: A Meta-Learning Approach [27.350615059290348]
We propose a meta-learning approach with complemented representations network (MCRNet) for few-shot image classification.
In particular, we embed a latent space, where latent codes are reconstructed with extra representation information to complement the representation deficiency.
Our end-to-end framework achieves the state-of-the-art performance in image classification on three standard few-shot learning datasets.
arXiv Detail & Related papers (2020-07-21T13:25:54Z)