Multi-Perspective Data Augmentation for Few-shot Object Detection
- URL: http://arxiv.org/abs/2502.18195v1
- Date: Tue, 25 Feb 2025 13:34:52 GMT
- Title: Multi-Perspective Data Augmentation for Few-shot Object Detection
- Authors: Anh-Khoa Nguyen Vu, Quoc-Truong Truong, Vinh-Tiep Nguyen, Thanh Duc Ngo, Thanh-Toan Do, Tam V. Nguyen
- Abstract summary: We propose a Multi-Perspective Data Augmentation (MPAD) framework. In terms of foreground-foreground relationships, we propose in-context learning for object synthesis (ICOS) with bounding box adjustments. For foreground-background relationships, we introduce a Background Proposal method (BAP) to sample typical and hard backgrounds.
- Score: 17.34318821332361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent few-shot object detection (FSOD) methods have focused on augmenting synthetic samples for novel classes, showing promising results thanks to the rise of diffusion models. However, the diversity of such datasets is often limited in representativeness because they lack awareness of typical and hard samples, especially in the context of foreground and background relationships. To tackle this issue, we propose a Multi-Perspective Data Augmentation (MPAD) framework. In terms of foreground-foreground relationships, we propose in-context learning for object synthesis (ICOS) with bounding box adjustments to enhance the detail and spatial information of synthetic samples. According to the large margin principle, support samples play a vital role in defining class boundaries. Therefore, we design a Harmonic Prompt Aggregation Scheduler (HPAS) to mix prompt embeddings at each time step of the generation process in diffusion models, producing hard novel samples. For foreground-background relationships, we introduce a Background Proposal method (BAP) to sample typical and hard backgrounds. Extensive experiments on multiple FSOD benchmarks demonstrate the effectiveness of our approach. Our framework significantly outperforms traditional methods, achieving an average increase of $17.5\%$ in nAP50 over the baseline on PASCAL VOC. Code is available at https://github.com/nvakhoa/MPAD.
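The abstract says HPAS mixes prompt embeddings at each time step of the diffusion generation process. The sketch below only illustrates that general idea as a per-step convex combination of two embedding vectors. It is not the authors' scheduler: the linear weight schedule, the function names (`mixing_weight`, `mix_embeddings`, `per_step_embeddings`), and the toy two-dimensional embeddings are all assumptions made for illustration. The actual method is in the linked repository.

```python
from typing import List


def mixing_weight(step: int, num_steps: int) -> float:
    """Hypothetical linear schedule: 0.0 at the first step, 1.0 at the last.

    This is an assumption; the abstract does not specify the HPAS schedule.
    """
    if num_steps < 2:
        return 1.0
    return step / (num_steps - 1)


def mix_embeddings(novel: List[float], other: List[float], weight: float) -> List[float]:
    """Convex combination: weight * novel + (1 - weight) * other."""
    if len(novel) != len(other):
        raise ValueError("embeddings must have the same length")
    return [weight * a + (1.0 - weight) * b for a, b in zip(novel, other)]


def per_step_embeddings(novel: List[float], other: List[float], num_steps: int) -> List[List[float]]:
    """Return one mixed prompt embedding for each generation step."""
    return [
        mix_embeddings(novel, other, mixing_weight(t, num_steps))
        for t in range(num_steps)
    ]


# Toy example: the conditioning moves from `other` to `novel` over three steps.
schedule = per_step_embeddings([1.0, 0.0], [0.0, 1.0], num_steps=3)
```

In a real diffusion pipeline, each mixed vector would be passed as the text conditioning for the corresponding denoising step. Here the vectors are plain Python lists so the sketch stays self-contained.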
Related papers
- Effortless Active Labeling for Long-Term Test-Time Adaptation [18.02130603595324]
Long-term test-time adaptation is a challenging task due to error accumulation.
Recent approaches tackle this issue by actively labeling a small proportion of samples in each batch.
In this paper, we investigate how to achieve effortless active labeling so that a maximum of one sample is selected for annotation in each batch.
arXiv Detail & Related papers (2025-03-18T07:49:27Z)
- Tackling Few-Shot Segmentation in Remote Sensing via Inpainting Diffusion Model [0.3749861135832073]
In the few-shot segmentation task, models are typically trained on base classes with abundant annotations and later adapted to novel classes with limited examples.
We propose a simple approach that leverages diffusion models to generate diverse variations of novel-class objects.
By framing the problem as an image inpainting task, we synthesize plausible instances of novel classes under various environments.
arXiv Detail & Related papers (2025-03-05T02:08:51Z)
- Diverse Rare Sample Generation with Pretrained GANs [24.227852798611025]
This study proposes a novel approach for generating diverse rare samples from high-resolution image datasets with pretrained GANs. Our method employs gradient-based optimization of latent vectors within a multi-objective framework and utilizes normalizing flows for density estimation on the feature space. This enables the generation of diverse rare images, with controllable parameters for rarity, diversity, and similarity to a reference image.
arXiv Detail & Related papers (2024-12-27T09:10:30Z)
- Learning from Different Samples: A Source-free Framework for Semi-supervised Domain Adaptation [20.172605920901777]
This paper focuses on designing a framework to use different strategies for comprehensively mining different target samples.
We propose a novel source-free framework (SOUF) to achieve semi-supervised fine-tuning of the source pre-trained model on the target domain.
arXiv Detail & Related papers (2024-11-11T02:09:32Z)
- Improved off-policy training of diffusion samplers [93.66433483772055]
We study the problem of training diffusion models to sample from a distribution with an unnormalized density or energy function. We benchmark several diffusion-structured inference methods, including simulation-based variational approaches and off-policy methods. Our results shed light on the relative advantages of existing algorithms while bringing into question some claims from past work.
arXiv Detail & Related papers (2024-02-07T18:51:49Z)
- Dual-View Data Hallucination with Semantic Relation Guidance for Few-Shot Image Recognition [49.26065739704278]
We propose a framework that exploits semantic relations to guide dual-view data hallucination for few-shot image recognition.
An instance-view data hallucination module hallucinates each sample of a novel class to generate new data.
A prototype-view data hallucination module exploits a semantic-aware measure to estimate the prototype of a novel class.
arXiv Detail & Related papers (2024-01-13T12:32:29Z)
- Deep Boosting Multi-Modal Ensemble Face Recognition with Sample-Level Weighting [11.39204323420108]
Deep convolutional neural networks have achieved remarkable success in face recognition.
The current training benchmarks exhibit an imbalanced quality distribution.
This poses issues for generalization on hard samples since they are underrepresented during training.
Inspired by the well-known AdaBoost, we propose a sample-level weighting approach to incorporate the importance of different samples into the FR loss.
arXiv Detail & Related papers (2023-08-18T01:44:54Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
- Contrastive Prototype Learning with Augmented Embeddings for Few-Shot Learning [58.2091760793799]
We propose a novel contrastive prototype learning with augmented embeddings (CPLAE) model.
With a class prototype as an anchor, CPL aims to pull the query samples of the same class closer and those of different classes further away.
Extensive experiments on several benchmarks demonstrate that our proposed CPLAE achieves new state-of-the-art results.
arXiv Detail & Related papers (2021-01-23T13:22:44Z)
- Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
arXiv Detail & Related papers (2020-07-18T09:48:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.