Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity
- URL: http://arxiv.org/abs/2602.18525v1
- Date: Fri, 20 Feb 2026 03:02:36 GMT
- Title: Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity
- Authors: Vasile Marian, Yong-Bin Kang, Alexander Buddery,
- Abstract summary: We present a controlled evaluation of synthetic augmentation for YOLOv11 across three single-class detection regimes.<n>We benchmark six GAN-, diffusion-, and hybrid-based generators over augmentation ratios from 10% to 150% of the real training split.<n>For each dataset-generator-augmentation configuration, we compute pre-training dataset metrics under a matched-size bootstrap protocol.
- Score: 43.338311770275745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synthetic images are increasingly used to augment object-detection training sets, but reliably evaluating a synthetic dataset before training remains difficult: standard global generative metrics (e.g., FID) often do not predict downstream detection mAP. We present a controlled evaluation of synthetic augmentation for YOLOv11 across three single-class detection regimes -- Traffic Signs (sparse/near-saturated), Cityscapes Pedestrian (dense/occlusion-heavy), and COCO PottedPlant (multi-instance/high-variability). We benchmark six GAN-, diffusion-, and hybrid-based generators over augmentation ratios from 10% to 150% of the real training split, and train YOLOv11 both from scratch and with COCO-pretrained initialization, evaluating on held-out real test splits (mAP@0.50:0.95). For each dataset-generator-augmentation configuration, we compute pre-training dataset metrics under a matched-size bootstrap protocol, including (i) global feature-space metrics in both Inception-v3 and DINOv2 embeddings and (ii) object-centric distribution distances over bounding-box statistics. Synthetic augmentation yields substantial gains in the more challenging regimes (up to +7.6% and +30.6% relative mAP in Pedestrian and PottedPlant, respectively) but is marginal in Traffic Signs and under pretrained fine-tuning. To separate metric signal from augmentation quantity, we report both raw and augmentation-controlled (residualized) correlations with multiple-testing correction, showing that metric-performance alignment is strongly regime-dependent and that many apparent raw associations weaken after controlling for augmentation level.
Related papers
- Enhancing Diversity and Feasibility: Joint Population Synthesis from Multi-source Data Using Generative Models [4.73459038844245]
This study proposes a novel method to simultaneously integrate and synthesize multi-source datasets using a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty.<n>Results show that the proposed joint approach outperforms the sequential baseline, with recall increasing by 7% and precision by 15%.<n>Since synthetic populations serve as a key input for agent-based models (ABM), this multi-source generative approach has the potential to significantly enhance the accuracy and reliability of ABM.
arXiv Detail & Related papers (2026-02-17T00:02:30Z) - Silent Inconsistency in Data-Parallel Full Fine-Tuning: Diagnosing Worker-Level Optimization Misalignment [27.352639822596146]
Cross-worker divergence in losses and gradients can remain invisible under conventional monitoring signals.<n>We propose a model-agnostic diagnostic framework that quantifies worker-level consistency using training signals readily available in standard pipelines.
arXiv Detail & Related papers (2026-02-16T04:42:30Z) - Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning [71.30276778807068]
We propose a unified framework that strategically coordinates sample pruning and token pruning.<n>Q-Tuning achieves a +38% average improvement over the full-data SFT baseline using only 12.5% of the original training data.
arXiv Detail & Related papers (2025-09-28T13:27:38Z) - MCLPD:Multi-view Contrastive Learning for EEG-based PD Detection Across Datasets [18.392841877276354]
This paper proposes a semi-supervised learning framework named MCLPD.<n>It integrates multi-view contrastive pre-training with lightweight supervised fine-tuning to enhance cross-dataset PD detection performance.
arXiv Detail & Related papers (2025-08-12T08:19:27Z) - Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning [50.809769498312434]
We propose a novel dataset pruning method termed as Temporal Dual-Depth Scoring (TDDS)
Our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
arXiv Detail & Related papers (2023-11-22T03:45:30Z) - MELEP: A Novel Predictive Measure of Transferability in Multi-Label ECG Diagnosis [1.3654846342364306]
We introduce MELEP, a measure designed to estimate the effectiveness of knowledge transfer from a pre-trained model to a downstream ECG diagnosis task.
Our experiments show that MELEP can predict the performance of pre-trained convolutional and recurrent deep neural networks, on small and imbalanced ECG data.
arXiv Detail & Related papers (2023-10-27T14:57:10Z) - Joint Metrics Matter: A Better Standard for Trajectory Forecasting [67.1375677218281]
Multi-modal trajectory forecasting methods evaluate using single-agent metrics (marginal metrics)
Only focusing on marginal metrics can lead to unnatural predictions, such as colliding trajectories or diverging trajectories for people who are clearly walking together as a group.
We present the first comprehensive evaluation of state-of-the-art trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate.
arXiv Detail & Related papers (2023-05-10T16:27:55Z) - Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions.<n>Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z) - A Meta-Learning Approach to Predicting Performance and Data Requirements [163.4412093478316]
We propose an approach to estimate the number of samples required for a model to reach a target performance.
We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset.
We introduce a novel piecewise power law (PPL) that handles the two data differently.
arXiv Detail & Related papers (2023-03-02T21:48:22Z) - Confidence-Guided Data Augmentation for Deep Semi-Supervised Training [0.9968241071319184]
We propose a new data augmentation technique for semi-supervised learning settings that emphasizes learning from the most challenging regions of the feature space.
We perform experiments on two benchmark RGB datasets: CIFAR-100 and STL-10, and show that the proposed scheme improves classification performance in terms of accuracy and robustness.
arXiv Detail & Related papers (2022-09-16T21:23:19Z) - Semi-supervised Contrastive Learning with Similarity Co-calibration [72.38187308270135]
We propose a novel training strategy, termed as Semi-supervised Contrastive Learning (SsCL)
SsCL combines the well-known contrastive loss in self-supervised learning with the cross entropy loss in semi-supervised learning.
We show that SsCL produces more discriminative representation and is beneficial to few shot learning.
arXiv Detail & Related papers (2021-05-16T09:13:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.