Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets
- URL: http://arxiv.org/abs/2412.16839v2
- Date: Tue, 24 Dec 2024 01:53:54 GMT
- Title: Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets
- Authors: Changjian Chen, Fei Lv, Yalong Guan, Pengcheng Wang, Shengjie Yu, Yifan Zhang, Zhuo Tang,
- Abstract summary: The performance of computer vision models in certain real-world applications is limited by the small number of available images.
We propose a human-guided image generation method for more controllable dataset expansion.
- Score: 10.93687452351281
- Abstract: The performance of computer vision models in certain real-world applications (e.g., rare wildlife observation) is limited by the small number of available images. Expanding datasets using pre-trained generative models is an effective way to address this limitation. However, since the automatic generation process is uncontrollable, the generated images are usually limited in diversity, and some of them are undesired. In this paper, we propose a human-guided image generation method for more controllable dataset expansion. We develop a multi-modal projection method with theoretical guarantees to facilitate the exploration of both the original and generated images. Based on the exploration, users refine the prompts and re-generate images for better performance. Since directly refining the prompts is challenging for novice users, we develop a sample-level prompt refinement method to make it easier. With this method, users only need to provide sample-level feedback (e.g., which samples are undesired) to obtain better prompts. The effectiveness of our method is demonstrated through the quantitative evaluation of the multi-modal projection method, improved model performance in the case study for both classification and object detection tasks, and positive feedback from the experts.
Related papers
- A Large-scale AI-generated Image Inpainting Benchmark [11.216906046169683]
We propose a methodology for creating high-quality inpainting datasets and apply it to create DiQuID.
DiQuID comprises over 95,000 inpainted images generated from 78,000 original images sourced from MS-COCO, RAISE, and OpenImages.
We provide comprehensive benchmarking results using state-of-the-art forgery detection methods, demonstrating the dataset's effectiveness in evaluating and improving detection algorithms.
arXiv Detail & Related papers (2025-02-10T15:56:28Z)
- Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models [51.067146460271466]
Evaluation of visual generative models can be time-consuming and computationally expensive.
We propose the Evaluation Agent framework, which employs human-like strategies for efficient, dynamic, multi-round evaluations.
It offers four key advantages: 1) efficiency, 2) promptable evaluation tailored to diverse user needs, 3) explainability beyond single numerical scores, and 4) scalability across various models and tools.
arXiv Detail & Related papers (2024-12-10T18:52:39Z)
- Active Generation for Image Classification [45.93535669217115]
We propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model.
With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z)
- Detecting Generated Images by Real Images Only [64.12501227493765]
Existing generated image detection methods detect visual artifacts in generated images or learn discriminative features from both real and generated images by massive training.
This paper approaches the generated image detection problem from a new perspective: Start from real images.
By finding the commonality of real images and mapping them to a dense subspace in feature space, the goal is that generated images, regardless of their generative model, are then projected outside the subspace.
arXiv Detail & Related papers (2023-11-02T03:09:37Z)
- Learning from Multi-Perception Features for Real-Word Image Super-resolution [87.71135803794519]
We propose a novel SR method called MPF-Net that leverages multiple perceptual features of input images.
Our method incorporates a Multi-Perception Feature Extraction (MPFE) module to extract diverse perceptual information.
We also introduce a contrastive regularization term (CR) that improves the model's learning capability.
arXiv Detail & Related papers (2023-05-26T07:35:49Z)
- Deep Image Fingerprint: Towards Low Budget Synthetic Image Detection and Model Lineage Analysis [8.777277201807351]
We develop a new detection method for images that are indistinguishable from real ones.
Our method can detect images from a known generative model and enable us to establish relationships between fine-tuned generative models.
Our approach achieves comparable performance to state-of-the-art pre-trained detection methods on images generated by Stable Diffusion and Midjourney.
arXiv Detail & Related papers (2023-03-19T20:31:38Z)
- Improving Image Clustering through Sample Ranking and Its Application to remote--sensing images [14.531733039462058]
We propose a novel method by first ranking samples within each cluster based on the confidence in their belonging to the current cluster.
For ranking the samples, we developed a method for computing the likelihood of samples belonging to the current clusters based on whether they are situated in densely populated neighborhoods.
We show that our method can be effectively applied to remote-sensing images.
arXiv Detail & Related papers (2022-09-26T12:10:02Z)
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
- Improved Techniques for Training Single-Image GANs [44.251222212306764]
Generative models can be learned from a single image, as opposed to from a large dataset.
We propose some best practices to train a model capable of generating realistic images from only a single sample.
Our model is up to six times faster to train, has fewer parameters, and can better capture the global structure of images.
arXiv Detail & Related papers (2020-03-25T17:33:25Z)
- Informative Sample Mining Network for Multi-Domain Image-to-Image Translation [101.01649070998532]
We show that improving the sample selection strategy is an effective solution for image-to-image translation tasks.
We propose a novel multi-stage sample training scheme to reduce sample hardness while preserving sample informativeness.
arXiv Detail & Related papers (2020-01-05T05:48:02Z)