Learning Gaussian Data Augmentation in Feature Space for One-shot Object Detection in Manga
- URL: http://arxiv.org/abs/2410.05935v1
- Date: Tue, 8 Oct 2024 11:38:13 GMT
- Title: Learning Gaussian Data Augmentation in Feature Space for One-shot Object Detection in Manga
- Authors: Takara Taniguchi, Ryosuke Furuta
- Abstract summary: The rising global popularity of Japanese manga has made the object detection of character faces increasingly important.
New characters appear every time a new volume of manga is released, making it impractical to re-train object detectors each time.
One-shot object detection, where only a single query (reference) image is required to detect a new character, is an essential task in the manga industry.
- Score: 2.800768893804362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle one-shot object detection in Japanese Manga. The rising global popularity of Japanese manga has made the object detection of character faces increasingly important, with potential applications such as automatic colorization. However, obtaining sufficient data for training conventional object detectors is challenging due to copyright restrictions. Additionally, new characters appear every time a new volume of manga is released, making it impractical to re-train object detectors each time to detect these new characters. Therefore, one-shot object detection, where only a single query (reference) image is required to detect a new character, is an essential task in the manga industry. One challenge with one-shot object detection in manga is the large variation in the poses and facial expressions of characters in target images, despite having only one query image as a reference. Another challenge is that the frequency of character appearances follows a long-tail distribution. To overcome these challenges, we propose a data augmentation method in feature space to increase the variation of the query. The proposed method augments the feature from the query by adding Gaussian noise, with the noise variance at each channel learned during training. The experimental results show that the proposed method improves the performance for both seen and unseen classes, surpassing data augmentation methods in image space.
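The augmentation described in the abstract is simple enough to sketch: the query feature is perturbed with zero-mean Gaussian noise whose per-channel variance is a learnable parameter trained jointly with the detector. The PyTorch module below is a minimal sketch based only on the abstract; the module name, the log-variance parameterization, and the train-only gating are assumptions rather than details from the paper.

```python
import torch
import torch.nn as nn

class LearnedGaussianFeatureAugment(nn.Module):
    """Adds zero-mean Gaussian noise to a query feature map, with the noise
    variance learned separately for each channel (sketch of the idea in the
    abstract; names and parameterization are illustrative assumptions)."""

    def __init__(self, num_channels: int):
        super().__init__()
        # Parameterize the log-variance so the variance stays positive.
        self.log_var = nn.Parameter(torch.zeros(num_channels))

    def forward(self, query_feat: torch.Tensor) -> torch.Tensor:
        # query_feat: (B, C, H, W) feature extracted from the single query image.
        if not self.training:
            return query_feat  # assume augmentation is applied only during training
        std = torch.exp(0.5 * self.log_var).view(1, -1, 1, 1)
        noise = torch.randn_like(query_feat) * std
        return query_feat + noise

# Usage: perturb the query feature before matching it against target-image features.
augment = LearnedGaussianFeatureAugment(num_channels=256)
augment.train()
query_feat = torch.randn(2, 256, 16, 16)   # placeholder query feature
augmented = augment(query_feat)            # same shape, per-channel Gaussian noise added
```

Because the variance is a trainable parameter, the detection loss itself determines how strongly each channel is perturbed, which is what distinguishes this from hand-tuned augmentation in image space.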
Related papers
- Extracting Human Attention through Crowdsourced Patch Labeling [18.947126675569667]
In image classification, a significant problem arises from bias in the datasets.
One approach to mitigate such biases is to direct the model's attention toward the target object's location.
We propose a novel patch-labeling method that integrates AI assistance with crowdsourcing to capture human attention from images.
arXiv Detail & Related papers (2024-03-22T07:57:27Z)
- Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection [157.18560601328534]
RichSem is a robust method to learn rich semantics from coarse locations without the need of accurate bounding boxes.
We add a semantic branch to our detector to learn these soft semantics and enhance feature representations for long-tailed object detection.
Our method achieves state-of-the-art performance without requiring complex training and testing procedures.
arXiv Detail & Related papers (2023-10-18T17:59:41Z)
- Multi-Scale Memory Comparison for Zero-/Few-Shot Anomaly Detection [35.76765622970398]
Anomaly detection has gained considerable attention due to its broad range of applications, particularly in industrial defect detection.
To address the challenges of data collection, researchers have introduced zero-/few-shot anomaly detection techniques.
We propose a straightforward yet powerful multi-scale memory comparison framework for zero-/few-shot anomaly detection.
arXiv Detail & Related papers (2023-08-09T08:28:25Z)
- Unsupervised Manga Character Re-identification via Face-body and Spatial-temporal Associated Clustering [21.696847342192072]
The artistic expression and stylistic limitations of manga pose many challenges to the re-identification problem.
Inspired by the idea that some content-related features may help clustering, we propose a Face-body and Spatial-temporal Associated Clustering method.
In the face-body combination module, a face-body graph is constructed to solve problems such as exaggeration and deformation in artistic creation.
In the spatial-temporal relationship correction module, we analyze the appearance features of characters and design a temporal-spatial-related triplet loss to fine-tune the clustering.
arXiv Detail & Related papers (2022-04-10T07:28:41Z)
- Learning to Detect Every Thing in an Open World [139.78830329914135]
We propose a simple yet surprisingly powerful data augmentation and training scheme we call Learning to Detect Every Thing (LDET).
To avoid suppressing hidden objects (background objects that are visible but unlabeled), we paste annotated objects on a background image sampled from a small region of the original image.
LDET leads to significant improvements on many datasets in the open world instance segmentation task.
arXiv Detail & Related papers (2021-12-03T03:56:06Z)
- ZSD-YOLO: Zero-Shot YOLO Detection using Vision-Language Knowledge Distillation [5.424015823818208]
A dataset such as COCO is extensively annotated across many images but covers only a sparse set of categories, and annotating all object classes across a diverse domain is expensive and challenging.
We develop a Vision-Language distillation method that aligns both image and text embeddings from a zero-shot pre-trained model such as CLIP to a modified semantic prediction head from a one-stage detector like YOLOv5 (a generic sketch of this kind of alignment objective appears after this list).
During inference, our model can be adapted to detect any number of object classes without additional training.
arXiv Detail & Related papers (2021-09-24T16:46:36Z)
- Few-shot Weakly-Supervised Object Detection via Directional Statistics [55.97230224399744]
We propose a probabilistic multiple instance learning approach for few-shot Common Object Localization (COL) and few-shot Weakly Supervised Object Detection (WSOD).
Our model simultaneously learns the distribution of the novel objects and localizes them via expectation-maximization steps.
Our experiments show that the proposed method, despite being simple, outperforms strong baselines in few-shot COL and WSOD, as well as large-scale WSOD tasks.
arXiv Detail & Related papers (2021-03-25T22:34:16Z)
- Data Augmentation for Object Detection via Differentiable Neural Rendering [71.00447761415388]
It is challenging to train a robust object detector when annotated data is scarce.
Existing approaches to tackle this problem include semi-supervised learning that interpolates labeled data from unlabeled data.
We introduce an offline data augmentation method for object detection, which semantically interpolates the training data with novel views.
arXiv Detail & Related papers (2021-03-04T06:31:06Z)
- A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach improves the object detection (and instance segmentation) accuracy of rare objects by a relative 50% (and 33%).
arXiv Detail & Related papers (2021-02-17T17:27:21Z)
- Any-Shot Object Detection [81.88153407655334]
'Any-shot detection' is the setting in which totally unseen and few-shot categories can co-occur during inference.
We propose a unified any-shot detection model that can concurrently learn to detect both zero-shot and few-shot object classes.
Our framework can also be used solely for zero-shot or few-shot detection tasks.
arXiv Detail & Related papers (2020-03-16T03:43:15Z)
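The ZSD-YOLO entry above describes aligning a detector's semantic prediction head with image and text embeddings from a zero-shot pre-trained model such as CLIP. As a generic illustration of what such an alignment objective can look like (this is not the paper's actual loss; the function name, its arguments, and the temperature value are assumptions), a sketch follows:

```python
import torch
import torch.nn.functional as F

def vision_language_alignment_loss(region_embeds: torch.Tensor,
                                   clip_text_embeds: torch.Tensor,
                                   target_classes: torch.Tensor,
                                   temperature: float = 0.07) -> torch.Tensor:
    """Generic sketch: pull detector region embeddings toward the frozen CLIP
    text embedding of their ground-truth class via a contrastive-style loss.

    region_embeds:    (N, D) embeddings from the detector's semantic head.
    clip_text_embeds: (K, D) frozen text embeddings, one per class prompt.
    target_classes:   (N,)   ground-truth class index for each region.
    """
    region_embeds = F.normalize(region_embeds, dim=-1)
    clip_text_embeds = F.normalize(clip_text_embeds, dim=-1)
    logits = region_embeds @ clip_text_embeds.t() / temperature
    return F.cross_entropy(logits, target_classes)

# Example with placeholder tensors: 4 regions, 8 class prompts, 512-dim embeddings.
loss = vision_language_alignment_loss(torch.randn(4, 512),
                                      torch.randn(8, 512),
                                      torch.randint(0, 8, (4,)))
```

Because only the text embeddings change when new class prompts are added, a detector trained this way can be adapted to new categories without further training, as the entry notes.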