On-the-fly Object Detection using StyleGAN with CLIP Guidance
- URL: http://arxiv.org/abs/2210.16742v1
- Date: Sun, 30 Oct 2022 04:43:01 GMT
- Title: On-the-fly Object Detection using StyleGAN with CLIP Guidance
- Authors: Yuzhe Lu, Shusen Liu, Jayaraman J. Thiagarajan, Wesam Sakla, Rushil Anirudh
- Abstract summary: We present a fully automated framework for building object detectors on satellite imagery without requiring any human intervention.
We achieve this by leveraging the combined power of modern generative models (e.g., StyleGAN) and recent advances in multi-modal learning (e.g., CLIP).
- Score: 28.25720358443378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a fully automated framework for building object detectors on
satellite imagery without requiring any human annotation or intervention. We
achieve this by leveraging the combined power of modern generative models
(e.g., StyleGAN) and recent advances in multi-modal learning (e.g., CLIP).
While deep generative models effectively encode the key semantics pertinent to
a data distribution, this information is not immediately accessible for
downstream tasks, such as object detection. In this work, we exploit CLIP's
ability to associate image features with text descriptions to identify neurons
in the generator network, which are subsequently used to build detectors
on-the-fly.
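As a rough illustration of the idea in the abstract (a minimal sketch, not the paper's actual procedure), one could rank the channels ("neurons") of an intermediate generator layer by how strongly their activations correlate with a CLIP text-image similarity score; the `generator` object, its `mid_layer` attribute, `z_dim`, and the correlation-based ranking below are hypothetical placeholders.

```python
# Illustrative sketch: rank channels of an intermediate generator layer by how
# strongly their activations correlate with CLIP's text-image similarity.
# `generator`, `generator.mid_layer`, and `generator.z_dim` are assumed placeholders
# for a pretrained StyleGAN-like model; this is not the paper's exact method.
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_channels(generator, prompt, n_samples=64):
    acts = []

    def hook(_, __, output):                      # capture per-channel mean activations
        acts.append(output.mean(dim=(2, 3)).detach())        # (B, C)

    handle = generator.mid_layer.register_forward_hook(hook)
    with torch.no_grad():
        z = torch.randn(n_samples, generator.z_dim)
        imgs = generator(z)                       # (B, 3, H, W), assumed in [-1, 1]
    handle.remove()

    # CLIP similarity between each generated image and the text prompt
    imgs_np = ((imgs.clamp(-1, 1) + 1) / 2 * 255).byte().permute(0, 2, 3, 1).cpu().numpy()
    inputs = proc(text=[prompt], images=list(imgs_np), return_tensors="pt", padding=True)
    with torch.no_grad():
        scores = clip(**inputs).logits_per_image.squeeze(1)   # (B,)

    # correlate each channel's activation with the CLIP score across samples
    a = torch.cat(acts)                                       # (B, C)
    a = (a - a.mean(0)) / (a.std(0) + 1e-8)
    s = (scores - scores.mean()) / (scores.std() + 1e-8)
    corr = (a * s[:, None]).mean(0)                           # (C,)
    return corr.argsort(descending=True)                      # channels ranked by relevance

# ranked = rank_channels(stylegan_generator, "a satellite image of an airplane")
```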
Related papers
- An Application-Agnostic Automatic Target Recognition System Using Vision Language Models [32.858386851006316]
We present a novel Automatic Target Recognition (ATR) system using open-vocabulary object detection and classification models.
A primary advantage of this approach is that target classes can be defined just before runtime by a non-technical end user.
Nuances in the desired targets can be expressed in natural language, which is useful for unique targets with little or no training data.
arXiv Detail & Related papers (2024-11-05T20:16:15Z)
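A minimal sketch of the open-vocabulary idea in the ATR paper above, assuming Hugging Face's CLIP implementation; the region crops, the prompt template, and the `classify_crops` helper are illustrative placeholders, not the paper's actual system.

```python
# Illustrative sketch: classify region crops against class names supplied in natural
# language at runtime via CLIP zero-shot scoring. Crops come from any region-proposal
# source (placeholder); this is not the paper's implementation.
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def classify_crops(crops, class_names):
    """crops: list of PIL.Image regions; class_names: user-defined target classes."""
    prompts = [f"a photo of a {name}" for name in class_names]
    inputs = proc(text=prompts, images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = clip(**inputs).logits_per_image          # (num_crops, num_classes)
    probs = logits.softmax(dim=-1)
    return [class_names[i] for i in probs.argmax(dim=-1)]

# labels = classify_crops(region_crops, ["cargo ship", "oil tanker", "small boat"])
```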
- Spatio-Temporal Context Prompting for Zero-Shot Action Detection [13.22912547389941]
We propose a method that effectively leverages the rich knowledge of visual-language models to perform Person-Context Interaction.
To address the challenge of recognizing distinct actions by multiple people at the same timestamp, we design the Interest Token Spotting mechanism.
Our method achieves superior results compared to previous approaches and can be further extended to multi-action videos.
arXiv Detail & Related papers (2024-08-28T17:59:05Z)
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
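A bare-bones sketch of the grouping/updating alternation summarized in the FEC entry above, written as a simple k-means-style loop over per-pixel features; the blending step and all names are illustrative assumptions, not the paper's architecture.

```python
# Simplified sketch of "select representatives, then update pixel features":
# a k-means-style alternation over per-pixel deep features. Not the FEC network design.
import torch

def cluster_and_update(feats, k=8, iters=3, blend=0.5):
    """feats: (N, D) per-pixel deep features for one image (flattened H*W)."""
    centers = feats[torch.randperm(feats.size(0))[:k]].clone()   # init representatives
    for _ in range(iters):
        # assignment step: nearest representative per pixel
        assign = torch.cdist(feats, centers).argmin(dim=1)       # (N,)
        # update step: recompute representatives as cluster means
        for c in range(k):
            mask = assign == c
            if mask.any():
                centers[c] = feats[mask].mean(dim=0)
    # update pixel features by blending them with their cluster representative
    return (1 - blend) * feats + blend * centers[assign]

# new_feats = cluster_and_update(backbone_features.flatten(2).transpose(1, 2)[0])
```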
- Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery [17.156864650143678]
We develop a few-shot object detector based on a traditional two-stage architecture.
A large-scale pre-trained model is used to build class-reference embeddings or prototypes.
We perform evaluations on two remote sensing datasets containing challenging and rare objects.
arXiv Detail & Related papers (2024-03-08T15:20:27Z)
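A minimal sketch of the class-reference prototype idea described in the few-shot detection entry above, assuming some large pre-trained image encoder (`encode` is a placeholder); the thresholding and scoring choices are illustrative, not the paper's detector.

```python
# Illustrative sketch: average support-crop embeddings into class prototypes,
# then label proposals by cosine similarity. `encode` is a placeholder for any
# pre-trained image encoder; not the paper's full two-stage detector.
import torch
import torch.nn.functional as F

def build_prototypes(encode, support):
    """support: {class_name: [image, ...]}; encode: image -> (D,) embedding."""
    return {c: F.normalize(torch.stack([encode(x) for x in xs]).mean(dim=0), dim=0)
            for c, xs in support.items()}

def classify_proposals(encode, proposals, prototypes, threshold=0.25):
    names = list(prototypes)
    protos = torch.stack([prototypes[c] for c in names])                      # (C, D)
    feats = F.normalize(torch.stack([encode(p) for p in proposals]), dim=1)   # (N, D)
    sims = feats @ protos.T                                                   # cosine similarity
    labels = []
    for i, j in enumerate(sims.argmax(dim=1).tolist()):
        score = sims[i, j].item()
        labels.append((names[j], score) if score > threshold else ("background", score))
    return labels
```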
- InstaGen: Enhancing Object Detection by Training on Synthetic Dataset [59.445498550159755]
We present a novel paradigm to enhance the capability of an object detector, e.g., by expanding categories or improving detection performance.
We integrate an instance-level grounding head into a pre-trained generative diffusion model to augment it with the ability to localise instances in the generated images.
We conduct thorough experiments to show that this enhanced version of the diffusion model, termed InstaGen, can serve as a data synthesizer.
arXiv Detail & Related papers (2024-02-08T18:59:53Z)
- Enhancing Novel Object Detection via Cooperative Foundational Models [75.30243629533277]
We present a novel approach to transform existing closed-set detectors into open-set detectors.
We surpass the current state-of-the-art by a margin of 7.2 $\text{AP}_{50}$ for novel classes.
arXiv Detail & Related papers (2023-11-19T17:28:28Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
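An illustrative sketch of the PCA-based localization step mentioned in the object discovery entry above: project per-patch features onto their first principal component and threshold it to get a coarse foreground mask; the feature source and threshold are assumptions, not the paper's pipeline.

```python
# Illustrative sketch: localize object regions by thresholding the projection of
# per-patch features onto their first principal component. The feature extractor
# is a placeholder; this is not the paper's full method.
import torch

def pca_foreground_mask(patch_feats, grid_hw):
    """patch_feats: (N, D) features for N = H*W patches; grid_hw: (H, W)."""
    x = patch_feats - patch_feats.mean(dim=0, keepdim=True)
    # first principal direction via SVD of the centered feature matrix
    _, _, vh = torch.linalg.svd(x, full_matrices=False)
    pc1 = x @ vh[0]                                  # (N,) projection onto PC1
    mask = pc1 > pc1.median()                        # threshold; sign of PC1 is arbitrary
    return mask.reshape(grid_hw)                     # coarse object/background split

# mask = pca_foreground_mask(vit_patch_tokens[0], (14, 14))
```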
- No Token Left Behind: Explainability-Aided Image Classification and Generation [79.4957965474334]
We present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input.
Our method yields an improvement in the recognition rate, without additional training or fine-tuning.
arXiv Detail & Related papers (2022-04-11T07:16:39Z)
- Self-Supervised Object Detection via Generative Image Synthesis [106.65384648377349]
We present the first end-to-end analysis-by-synthesis framework with controllable GANs for the task of self-supervised object detection.
We use collections of real world images without bounding box annotations to learn to synthesize and detect objects.
Our work advances the field of self-supervised object detection by introducing a successful new paradigm of controllable GAN-based image synthesis.
arXiv Detail & Related papers (2021-10-19T11:04:05Z)
- Detective: An Attentive Recurrent Model for Sparse Object Detection [25.5804429439316]
Detective is an attentive object detector that identifies objects in images in a sequential manner.
Detective is a sparse object detector that generates a single bounding box per object instance.
We propose a training mechanism based on the Hungarian algorithm and a loss that balances the localization and classification tasks.
arXiv Detail & Related papers (2020-04-25T17:41:52Z)
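A minimal sketch of the Hungarian-matching step in the training mechanism described in the Detective entry above, using SciPy's `linear_sum_assignment`; the cost terms and weights are illustrative assumptions, not the paper's exact loss.

```python
# Illustrative sketch: one-to-one matching of predicted boxes to ground-truth boxes
# by minimizing a combined localization + classification cost (Hungarian algorithm).
# Cost weights are assumed, not the paper's.
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(pred_boxes, pred_class_probs, gt_boxes, gt_labels, w_box=5.0, w_cls=1.0):
    """pred_boxes: (P, 4), pred_class_probs: (P, C), gt_boxes: (G, 4), gt_labels: (G,) ints."""
    # L1 distance between every prediction and every ground-truth box: (P, G)
    box_cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    # classification cost: negative probability of the ground-truth class: (P, G)
    cls_cost = -pred_class_probs[:, gt_labels]
    cost = w_box * box_cost + w_cls * cls_cost
    pred_idx, gt_idx = linear_sum_assignment(cost)   # optimal one-to-one assignment
    return list(zip(pred_idx.tolist(), gt_idx.tolist()))

# matches = hungarian_match(boxes, probs, gt_boxes, gt_labels)
```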
This list is automatically generated from the titles and abstracts of the papers listed on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.