Localizing Infinity-shaped fishes: Sketch-guided object localization in the wild
- URL: http://arxiv.org/abs/2109.11874v1
- Date: Fri, 24 Sep 2021 10:39:43 GMT
- Title: Localizing Infinity-shaped fishes: Sketch-guided object localization in the wild
- Authors: Pau Riba, Sounak Dey, Ali Furkan Biten and Josep Llados
- Abstract summary: This work investigates the problem of sketch-guided object localization.
Human sketches are used as queries to conduct object localization in natural images.
We propose a sketch-conditioned DETR architecture that avoids a hard classification step.
We experimentally demonstrate that our model and its variants significantly advance over previous state-of-the-art results.
- Score: 5.964436882344729
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This work investigates the problem of sketch-guided object localization
(SGOL), where human sketches are used as queries to conduct object
localization in natural images. In this cross-modal setting, we first
contribute a tough-to-beat baseline that, without any SGOL-specific
training, outperforms previous works on a fixed set of classes.
The baseline is useful for analyzing the performance of SGOL approaches
against available simple yet powerful methods. We advance prior art by
proposing a sketch-conditioned DETR (DEtection TRansformer) architecture
that avoids a hard classification step and alleviates the domain gap
between sketches and images to localize object instances. Although the main
goal of SGOL is object detection, we also explore its natural extension to
sketch-guided instance segmentation. This novel task moves towards
identifying objects at the pixel level, which is of key importance in
several applications. We experimentally demonstrate that our model and its
variants significantly advance over previous state-of-the-art results. All
training and testing code of our model will be released to facilitate future
research: https://github.com/priba/sgol_wild
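As a rough illustration only (not the authors' released implementation), the following PyTorch sketch shows how a DETR-style decoder could be conditioned on a sketch: the sketch embedding is added to every object query, and a similarity head replaces the fixed-class softmax so no hard classification is needed. All module names and sizes are hypothetical; see the repository above for the actual code.

```python
# Hypothetical sketch of a sketch-conditioned DETR; stand-in encoders replace
# the real backbones. Not the authors' released implementation.
import torch
import torch.nn as nn

class SketchConditionedDETR(nn.Module):
    def __init__(self, d_model=256, num_queries=100):
        super().__init__()
        # Stand-ins for the image backbone and the sketch branch.
        self.backbone = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        self.sketch_encoder = nn.Sequential(
            nn.Conv2d(1, d_model, kernel_size=16, stride=16),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.queries = nn.Embedding(num_queries, d_model)
        self.transformer = nn.Transformer(d_model, nhead=8, num_encoder_layers=2,
                                          num_decoder_layers=2, batch_first=True)
        self.bbox_head = nn.Linear(d_model, 4)         # (cx, cy, w, h), normalized
        self.match_head = nn.Linear(d_model, d_model)  # similarity, not a class softmax

    def forward(self, image, sketch):
        mem = self.backbone(image).flatten(2).transpose(1, 2)  # image tokens
        s = self.sketch_encoder(sketch)                        # (B, d_model)
        # Condition every object query on the sketch embedding.
        q = self.queries.weight.unsqueeze(0) + s.unsqueeze(1)
        h = self.transformer(mem, q)
        boxes = self.bbox_head(h).sigmoid()
        # Soft sketch-query matching score instead of a hard classification.
        scores = (self.match_head(h) * s.unsqueeze(1)).sum(-1)
        return boxes, scores

model = SketchConditionedDETR()
boxes, scores = model(torch.randn(2, 3, 256, 256), torch.randn(2, 1, 256, 256))
print(boxes.shape, scores.shape)  # torch.Size([2, 100, 4]) torch.Size([2, 100])
```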
Related papers
- Generative Location Modeling for Spatially Aware Object Insertion [35.62317512925592]
Generative models have become a powerful tool for image editing tasks, including object insertion.
In this paper, we focus on creating a location model dedicated to identifying realistic object locations.
Specifically, we train an autoregressive model that generates bounding box coordinates, conditioned on the background image and the desired object class.
This formulation makes it possible to handle sparse placement annotations effectively and to incorporate implausible locations into a preference dataset via direct preference optimization.
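A minimal sketch of this formulation, with hypothetical names and the preference-optimization stage omitted: bounding-box coordinates are quantized into discrete tokens and decoded autoregressively, conditioned on background-image features and a class embedding.

```python
# Hypothetical sketch of an autoregressive location model; the preference
# optimization stage described in the paper is omitted here.
import torch
import torch.nn as nn

NUM_BINS = 256  # each of (x1, y1, x2, y2) is quantized into 256 bins

class LocationModel(nn.Module):
    def __init__(self, d=256, num_classes=80):
        super().__init__()
        self.img_proj = nn.Conv2d(3, d, kernel_size=32, stride=32)  # stand-in image encoder
        self.cls_emb = nn.Embedding(num_classes, d)
        self.tok_emb = nn.Embedding(NUM_BINS + 1, d)  # +1 for a BOS token
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, nhead=8, batch_first=True), num_layers=2)
        self.head = nn.Linear(d, NUM_BINS)

    @torch.no_grad()
    def sample(self, image, cls_id):
        mem = self.img_proj(image).flatten(2).transpose(1, 2)
        mem = mem + self.cls_emb(cls_id).unsqueeze(1)  # condition on the desired class
        tokens = torch.full((image.size(0), 1), NUM_BINS, dtype=torch.long)  # BOS
        for _ in range(4):  # decode x1, y1, x2, y2 one token at a time
            h = self.decoder(self.tok_emb(tokens), mem)
            nxt = self.head(h[:, -1]).softmax(-1).multinomial(1)
            tokens = torch.cat([tokens, nxt], dim=1)
        return tokens[:, 1:].float() / NUM_BINS  # normalized box coordinates

model = LocationModel()
box = model.sample(torch.randn(1, 3, 256, 256), torch.tensor([3]))
print(box.shape)  # torch.Size([1, 4])
```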
arXiv Detail & Related papers (2024-10-17T14:00:41Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
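As a minimal illustration of the PCA step (the feature extractor here is a random stand-in for a self-supervised backbone), one can project patch features onto their first principal component and threshold it to obtain a coarse object region:

```python
# Minimal sketch: threshold the first principal component of patch features
# to get a coarse foreground mask, then take its bounding box.
import torch

def pca_localize(patch_feats):  # (H, W, D) patch features from a frozen backbone
    H, W, D = patch_feats.shape
    X = patch_feats.reshape(-1, D)
    X = X - X.mean(0, keepdim=True)
    _, _, Vh = torch.linalg.svd(X, full_matrices=False)  # PCA via SVD
    pc1 = (X @ Vh[0]).reshape(H, W)  # projection onto the first component
    mask = pc1 > pc1.mean()  # simple threshold; the sign of pc1 is arbitrary
    ys, xs = mask.nonzero(as_tuple=True)
    if len(ys) == 0:
        return None
    return (xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item())

feats = torch.randn(14, 14, 384)  # e.g. ViT-S patch tokens reshaped to a grid
print(pca_localize(feats))        # (x1, y1, x2, y2) in patch coordinates
```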
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- What Can Human Sketches Do for Object Detection? [127.67444974452411]
Sketches are highly expressive, inherently capturing subjective and fine-grained visual cues.
A sketch-enabled object detection framework detects based on what you sketch -- that "zebra".
We show an intuitive synergy between foundation models (e.g., CLIP) and existing sketch models built for sketch-based image retrieval (SBIR).
In particular, we first perform independent prompting on both the sketch and photo branches of the model to build highly generalisable sketch and photo encoders.
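A rough sketch of what prompting one branch could look like, assuming a frozen transformer encoder with learnable prompt tokens (all names hypothetical; the actual work prompts CLIP-based SBIR encoders):

```python
# Hypothetical prompt-tuning sketch: learnable prompt tokens are prepended to a
# frozen encoder's input; each branch (sketch, photo) is prompted independently.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, d=256, n_prompts=8, n_layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        for p in self.encoder.parameters():
            p.requires_grad = False  # frozen backbone; only the prompts adapt
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, d) * 0.02)

    def forward(self, tokens):  # (B, T, d) patch tokens of a sketch or photo
        prompts = self.prompts.expand(tokens.size(0), -1, -1)
        out = self.encoder(torch.cat([prompts, tokens], dim=1))
        return out[:, :prompts.size(1)].mean(1)  # pooled embedding

sketch_enc, photo_enc = PromptedEncoder(), PromptedEncoder()  # independent branches
emb = sketch_enc(torch.randn(2, 196, 256))
print(emb.shape)  # torch.Size([2, 256])
```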
arXiv Detail & Related papers (2023-03-27T12:33:23Z)
- Query-guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch [17.63475613154152]
Given a crude hand-drawn sketch of an object, the goal is to localize all instances of the same object on the target image.
This problem proves difficult due to the abstract nature of hand-drawn sketches, variations in sketch style and quality, and the large domain gap between sketches and natural images.
We propose a sketch-guided vision transformer encoder that uses cross-attention after each block of the transformer-based image encoder to learn query-conditioned image features.
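The following simplified sketch (hypothetical module names) illustrates the mechanism: after each self-attention block, image patch tokens cross-attend to sketch tokens, so the features become query-conditioned.

```python
# Illustrative sketch of query-guided attention: a cross-attention layer after
# each encoder block lets image tokens attend to sketch-query tokens.
import torch
import torch.nn as nn

class QueryGuidedBlock(nn.Module):
    def __init__(self, d=256, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
        self.n1, self.n2, self.n3 = nn.LayerNorm(d), nn.LayerNorm(d), nn.LayerNorm(d)

    def forward(self, img_tokens, sketch_tokens):
        x = self.n1(img_tokens)
        x = img_tokens + self.self_attn(x, x, x)[0]
        # Image tokens (queries) attend to sketch tokens (keys/values).
        x = x + self.cross_attn(self.n2(x), sketch_tokens, sketch_tokens)[0]
        return x + self.mlp(self.n3(x))

blocks = nn.ModuleList(QueryGuidedBlock() for _ in range(4))
img, sketch = torch.randn(2, 196, 256), torch.randn(2, 49, 256)
for blk in blocks:
    img = blk(img, sketch)  # query-conditioned image features
print(img.shape)  # torch.Size([2, 196, 256])
```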
arXiv Detail & Related papers (2023-03-15T17:26:17Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain.
We use a novel generative model that leverages class semantics not only to generate the features but also to discriminatively separate them.
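A hedged sketch of the idea (hypothetical dimensions): a conditional generator maps class semantics plus noise to visual features, so a classifier can also be trained on synthesized features for unseen classes.

```python
# Hypothetical feature generator: class semantics (e.g. word embeddings) plus
# noise are mapped to visual features for classes never seen in images.
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    def __init__(self, sem_dim=300, noise_dim=100, feat_dim=1024):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(sem_dim + noise_dim, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, feat_dim), nn.ReLU())  # ReLU mimics post-ReLU CNN features

    def forward(self, class_sem, n_samples=64):  # class_sem: (1, sem_dim)
        z = torch.randn(n_samples, self.noise_dim)
        sem = class_sem.expand(n_samples, -1)
        return self.net(torch.cat([sem, z], dim=1))

gen = FeatureGenerator()
unseen_feats = gen(torch.randn(1, 300))  # synthetic features for one unseen class
print(unseen_feats.shape)  # torch.Size([64, 1024])
```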
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
- Sketch-Guided Object Localization in Natural Images [16.982683600384277]
We introduce the novel problem of localizing all instances of an object (seen or unseen during training) in a natural image via sketch query.
We propose a novel cross-modal attention scheme that guides the region proposal network (RPN) to generate object proposals relevant to the sketch query.
Our method is effective with as little as a single sketch query.
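One way such cross-modal attention could be realized (a simplified, hypothetical sketch): a sketch embedding scores each spatial location of the image feature map, and the resulting attention map reweights the features fed to the RPN.

```python
# Simplified sketch of cross-modal attention ahead of a region proposal network.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, d=256):
        super().__init__()
        self.key = nn.Conv2d(d, d, kernel_size=1)  # per-location keys
        self.query = nn.Linear(d, d)               # sketch-side query

    def forward(self, img_feat, sketch_emb):  # (B, d, H, W), (B, d)
        k = self.key(img_feat)
        q = self.query(sketch_emb)[:, :, None, None]
        attn = torch.sigmoid((k * q).sum(1, keepdim=True))  # (B, 1, H, W)
        return img_feat * attn  # sketch-relevant regions emphasized for the RPN

att = CrossModalAttention()
guided = att(torch.randn(2, 256, 32, 32), torch.randn(2, 256))
print(guided.shape)  # torch.Size([2, 256, 32, 32])
```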
arXiv Detail & Related papers (2020-08-14T19:35:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.