Enhancing Small-Scale Dataset Expansion with Triplet-Connection-based Sample Re-Weighting
- URL: http://arxiv.org/abs/2508.07723v1
- Date: Mon, 11 Aug 2025 07:50:47 GMT
- Title: Enhancing Small-Scale Dataset Expansion with Triplet-Connection-based Sample Re-Weighting
- Authors: Ting Xiang, Changjian Chen, Zhuo Tang, Qifeng Zhang, Fei Lyu, Li Yang, Jiapeng Zhang, Kenli Li,
- Abstract summary: Due to the uncontrollable generation process and the ambiguity of natural language, noisy images may be generated.<n>Re-weighting is an effective way to address this issue by assigning low weights to such noisy images.<n>We develop TriReWeight, a triplet-connection-based sample re-weighting method to enhance generative data augmentation.
- Score: 33.69942307190522
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of computer vision models in certain real-world applications, such as medical diagnosis, is often limited by the scarcity of available images. Expanding datasets using pre-trained generative models is an effective solution. However, due to the uncontrollable generation process and the ambiguity of natural language, noisy images may be generated. Re-weighting is an effective way to address this issue by assigning low weights to such noisy images. We first theoretically analyze three types of supervision for the generated images. Based on the theoretical analysis, we develop TriReWeight, a triplet-connection-based sample re-weighting method to enhance generative data augmentation. Theoretically, TriReWeight can be integrated with any generative data augmentation methods and never downgrade their performance. Moreover, its generalization approaches the optimal in the order $O(\sqrt{d\ln (n)/n})$. Our experiments validate the correctness of the theoretical analysis and demonstrate that our method outperforms the existing SOTA methods by $7.9\%$ on average over six natural image datasets and by $3.4\%$ on average over three medical datasets. We also experimentally validate that our method can enhance the performance of different generative data augmentation methods.
Related papers
- Beyond Data Scarcity Optimizing R3GAN for Medical Image Generation from Small Datasets [0.044780965967547055]
This work investigates how generative adversarial networks (GANs) can be optimized for small datasets to generate realistic and diagnostically meaningful images.<n>Based on systematic experiments with R3GAN, we established effective training strategies and designed an optimized configuration for 256x256-resolution datasets.<n>The generated samples were used to balance an imbalanced embryo dataset, leading to substantial improvement in classification performance.
arXiv Detail & Related papers (2025-10-29T13:03:36Z) - Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [86.69947123512836]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks.<n>We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation.<n>We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z) - Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation.<n>We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation.
arXiv Detail & Related papers (2024-12-30T01:59:34Z) - A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We present a novel framework for training 3D image-conditioned diffusion models using only 2D supervision.<n>Most existing 3D generative models rely on full 3D supervision, which is impractical due to the scarcity of large-scale 3D datasets.
arXiv Detail & Related papers (2024-12-01T00:29:57Z) - Local Lesion Generation is Effective for Capsule Endoscopy Image Data Augmentation in a Limited Data Setting [0.0]
We propose and evaluate two local lesion generation approaches to address the challenge of augmenting small medical image datasets.<n>The first approach employs the Poisson Image Editing algorithm, a classical image processing technique, to create realistic image composites.<n>The second approach introduces a novel generative method, leveraging a fine-tuned Image Inpainting GAN to synthesize realistic lesions.
arXiv Detail & Related papers (2024-11-05T13:44:25Z) - DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z) - The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease
detection [51.697248252191265]
This work summarizes and strictly observes best practices regarding data handling, experimental design, and model evaluation.
We focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of challenging problem in healthcare.
Within this framework, we train predictive 15 models, considering three different data augmentation strategies and five distinct 3D CNN architectures.
arXiv Detail & Related papers (2023-09-13T10:40:41Z) - Data-iterative Optimization Score Model for Stable Ultra-Sparse-View CT
Reconstruction [2.2336243882030025]
We propose an iterative optimization data scoring model (DOSM) for sparse-view CT reconstruction.
DOSM integrates data consistency into its data consistency element, effectively balancing measurement data and generative model constraints.
We leverage conventional techniques to optimize DOSM updates.
arXiv Detail & Related papers (2023-08-28T09:23:18Z) - Reference-Free Isotropic 3D EM Reconstruction using Diffusion Models [8.590026259176806]
We propose a diffusion-model-based framework that overcomes the limitations of requiring reference data or prior knowledge about the degradation process.
Our approach utilizes 2D diffusion models to consistently reconstruct 3D volumes and is well-suited for highly downsampled data.
arXiv Detail & Related papers (2023-08-03T07:57:02Z) - Training on Thin Air: Improve Image Classification with Generated Data [28.96941414724037]
Diffusion Inversion is a simple yet effective method to generate diverse, high-quality training data for image classification.
Our approach captures the original data distribution and ensures data coverage by inverting images to the latent space of Stable Diffusion.
We identify three key components that allow our generated images to successfully supplant the original dataset.
arXiv Detail & Related papers (2023-05-24T16:33:02Z) - IRGen: Generative Modeling for Image Retrieval [82.62022344988993]
In this paper, we present a novel methodology, reframing image retrieval as a variant of generative modeling.
We develop our model, dubbed IRGen, to address the technical challenge of converting an image into a concise sequence of semantic units.
Our model achieves state-of-the-art performance on three widely-used image retrieval benchmarks and two million-scale datasets.
arXiv Detail & Related papers (2023-03-17T17:07:36Z) - Harmonizing Pathological and Normal Pixels for Pseudo-healthy Synthesis [68.5287824124996]
We present a new type of discriminator, the segmentor, to accurately locate the lesions and improve the visual quality of pseudo-healthy images.
We apply the generated images into medical image enhancement and utilize the enhanced results to cope with the low contrast problem.
Comprehensive experiments on the T2 modality of BraTS demonstrate that the proposed method substantially outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2022-03-29T08:41:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.