You Only Submit One Image to Find the Most Suitable Generative Model
- URL: http://arxiv.org/abs/2412.12232v1
- Date: Mon, 16 Dec 2024 14:46:57 GMT
- Title: You Only Submit One Image to Find the Most Suitable Generative Model
- Authors: Zhi Zhou, Lan-Zhe Guo, Peng-Xiao Song, Yu-Feng Li
- Abstract summary: We propose a novel setting called Generative Model Identification (GMI).
GMI aims to enable users to efficiently identify the generative model(s) most appropriate to their requirements.
- Score: 48.67303250592189
- Abstract: Deep generative models have achieved promising results in image generation, and various generative model hubs, e.g., Hugging Face and Civitai, have been developed that enable model developers to upload models and users to download them. However, these model hubs lack advanced model management and identification mechanisms, leaving users to search for models through text matching, download-count sorting, etc., which makes it difficult to efficiently find the model that best meets user requirements. In this paper, we propose a novel setting called Generative Model Identification (GMI), which aims to enable users to efficiently identify the most appropriate generative model(s) for their requirements from a large number of candidates. To the best of our knowledge, this problem has not been studied before. We introduce a comprehensive solution consisting of three pivotal modules: a weighted Reduced Kernel Mean Embedding (RKME) framework for capturing the generated image distribution and the relationship between images and prompts, a pre-trained vision-language model aimed at addressing dimensionality challenges, and an image interrogator designed to tackle cross-modality issues. Extensive empirical results demonstrate that the proposed solution is both efficient and effective. For example, users need to submit only a single example image to describe their requirements, and the model platform can achieve an average top-4 identification accuracy of more than 80%.
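The pipeline can be pictured with a short sketch. The following is a minimal, illustrative Python mock-up, not the authors' code: it assumes each candidate model is summarized offline by embeddings of its generated images (e.g., CLIP features standing in for the vision-language model), and it replaces the paper's weighted RKME construction with a simplified reduced set built from k-means centers and least-squares weights. All names, sizes, and the random data are hypothetical.

```python
# Illustrative RKME-style model identification (not the authors' code).
import numpy as np
from sklearn.cluster import KMeans

def rbf_kernel(A, B, gamma=0.5):
    """RBF kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def reduced_kme(X, m=8, gamma=0.5):
    """Compress embeddings X (n x d) into a reduced set (Z, beta) whose
    weighted kernel mean approximates the empirical kernel mean of X."""
    Z = KMeans(n_clusters=m, n_init=10, random_state=0).fit(X).cluster_centers_
    K_zz = rbf_kernel(Z, Z, gamma) + 1e-6 * np.eye(m)  # regularized Gram matrix
    K_zx = rbf_kernel(Z, X, gamma)
    beta = np.linalg.solve(K_zz, K_zx.mean(axis=1))    # least-squares weights
    return Z, beta

def score(query, Z, beta, gamma=0.5):
    """RKHS inner product between phi(query) and the model's reduced KME."""
    return float((rbf_kernel(query[None, :], Z, gamma) @ beta)[0])

# Offline: the platform stores one compact specification per candidate model.
rng = np.random.default_rng(0)
specs = {f"model_{i}": reduced_kme(rng.normal(0.3 * i, 1.0, (200, 16)))
         for i in range(5)}

# Online: a single example image (here, a fake embedding) ranks all models.
query = rng.normal(0.9, 1.0, 16)
ranked = sorted(specs, key=lambda name: -score(query, *specs[name]))
print("top-4 candidates:", ranked[:4])
```

At query time the user's single image is embedded once and compared against every model's compact specification, which is what keeps identification cheap regardless of how many images each model hub entry was built from.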
Related papers
- CGI: Identifying Conditional Generative Models with Example Images [14.453885742032481]
Generative models have achieved remarkable performance recently, and thus model hubs have emerged.
It is not easy for users to review model descriptions and example images and choose the model that best meets their needs.
We propose Conditional Generative Model Identification (CGI), which aims to identify the most suitable model using user-provided example images.
arXiv Detail & Related papers (2025-01-23T09:31:06Z)
- EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks.
The model takes both images and instructions as inputs and predicts the edited image tokens in a vanilla next-token paradigm.
We evaluate its effectiveness across diverse tasks on established benchmarks, showing competitive performance to various state-of-the-art task-specific methods.
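As a toy illustration of such a next-token objective (a sketch under assumptions, not EditAR's actual implementation): source-image tokens and instruction tokens form a conditioning prefix, and the cross-entropy loss is applied only at edited-image positions. The single transformer layer and all sizes below are placeholders.

```python
# Toy next-token objective for instruction-guided editing (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

V, D = 1000, 64                           # hypothetical vocab / hidden sizes
embed = nn.Embedding(V, D)
layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
head = nn.Linear(D, V)

src_img = torch.randint(0, V, (1, 16))    # tokenized input image
instr = torch.randint(0, V, (1, 8))       # tokenized edit instruction
edit_img = torch.randint(0, V, (1, 16))   # tokenized target (edited) image

seq = torch.cat([src_img, instr, edit_img], dim=1)   # one flat token sequence
mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
h = layer(embed(seq), src_mask=mask)                 # causal self-attention
logits = head(h)[:, :-1]                             # position t predicts t+1

targets = seq[:, 1:].clone()
prefix = src_img.size(1) + instr.size(1)             # conditioning length
targets[:, : prefix - 1] = -100                      # loss only on edited image
loss = F.cross_entropy(logits.reshape(-1, V), targets.reshape(-1),
                       ignore_index=-100)
print(loss.item())
```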
arXiv Detail & Related papers (2025-01-08T18:59:35Z)
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation [52.509092010267665]
We introduce LlamaGen, a new family of image generation models that applies the original "next-token prediction" paradigm of large language models to the visual generation domain.
It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals, can achieve state-of-the-art image generation performance if scaled properly.
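A minimal sampling loop in the same vanilla next-token spirit might look as follows; the stand-in "language model" returns random logits purely to exercise the loop, and in a real system the sampled tokens would be decoded back to pixels by an image tokenizer.

```python
# Toy ancestral sampling over image tokens (illustrative; not LlamaGen code).
import torch

V = 1000                                  # hypothetical visual codebook size

@torch.no_grad()
def sample_image_tokens(lm, prefix, n_tokens=16, temperature=1.0):
    """Plain next-token sampling: append one sampled token per step."""
    seq = prefix
    for _ in range(n_tokens):
        logits = lm(seq)[:, -1] / temperature        # next-token logits
        nxt = torch.multinomial(logits.softmax(-1), 1)
        seq = torch.cat([seq, nxt], dim=1)
    return seq[:, prefix.size(1):]                   # image tokens only

# Stand-in "language model": random logits, just to exercise the loop.
lm = lambda seq: torch.randn(seq.size(0), seq.size(1), V)
cond = torch.randint(0, V, (1, 4))                   # class/text condition tokens
print(sample_image_tokens(lm, cond).shape)           # -> torch.Size([1, 16])
```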
arXiv Detail & Related papers (2024-06-10T17:59:52Z)
- Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution [23.974575820244944]
In this work, we study the origin attribution of generated images in a practical setting.
The goal is to check whether a given image was generated by the source model.
We propose OCC-CLIP, a CLIP-based framework for few-shot one-class classification.
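A much-simplified stand-in for the idea (OCC-CLIP itself adapts CLIP rather than just reading features off it): fit a one-class model on features of a few images from the source model, then classify a query by whether it falls inside the learned region. The features below are random placeholders for CLIP embeddings.

```python
# Simplified one-class origin attribution on frozen features (placeholders).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
few_shot = rng.normal(0.0, 1.0, (10, 512))   # features of source-model images
query_in = rng.normal(0.0, 1.0, (1, 512))    # query resembling the source
query_out = rng.normal(4.0, 1.0, (1, 512))   # query resembling something else

occ = OneClassSVM(kernel="rbf", nu=0.1).fit(few_shot)
print(occ.predict(query_in), occ.predict(query_out))  # +1 marks inliers, -1 outliers
```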
arXiv Detail & Related papers (2024-04-03T12:54:16Z)
- Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image Classification and Generation [0.0]
We leverage the Bradley-Terry preference model to develop a fast adaptation method that efficiently fine-tunes the original model.
Extensive evidence of the capabilities of this framework is provided through experiments in different domains related to multimodal text and image understanding.
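The Bradley-Terry model itself fits in a few lines: the probability that item a is preferred over item b is a sigmoid of their score difference, and its negative log-likelihood gives a training loss for a preference or reward head. A minimal sketch with made-up scores:

```python
# Bradley-Terry preference loss (illustrative scores, not the paper's code).
import torch
import torch.nn.functional as F

def bt_loss(score_pref, score_rej):
    """NLL of 'preferred beats rejected': -log sigmoid(s_pref - s_rej)."""
    return -F.logsigmoid(score_pref - score_rej).mean()

score_pref = torch.tensor([2.0, 0.5])   # scores of preferred items
score_rej = torch.tensor([1.0, 1.5])    # scores of rejected items
print(bt_loss(score_pref, score_rej))   # lower when preferred items score higher
```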
arXiv Detail & Related papers (2023-07-15T07:53:12Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It uses a retrieval-then-optimization procedure to synthesize pseudo text features.
The approach benefits a wide range of settings, including few-shot, semi-supervised, and fully supervised learning.
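One simplified reading of "retrieval-then-optimization" (a sketch under assumptions, not the paper's exact procedure): initialize a pseudo text feature from retrieved neighbors of the image feature, then refine it by gradient steps toward the image feature in a shared embedding space. All tensors below are placeholders.

```python
# Simplified retrieval-then-optimization for a pseudo text feature.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
bank = F.normalize(torch.randn(1000, 512), dim=-1)   # candidate text features
img = F.normalize(torch.randn(512), dim=-1)          # image feature (e.g., CLIP)

# Retrieval: average the k nearest text features as the starting point.
k = 8
top = (bank @ img).topk(k).indices
pseudo = bank[top].mean(0).clone().requires_grad_(True)

# Optimization: a few gradient steps toward the image feature.
opt = torch.optim.Adam([pseudo], lr=0.1)
for _ in range(50):
    opt.zero_grad()
    loss = -F.cosine_similarity(pseudo, img, dim=0)
    loss.backward()
    opt.step()
print(F.cosine_similarity(pseudo.detach(), img, dim=0))  # approaches 1.0
```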
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- Composing Ensembles of Pre-trained Models via Iterative Consensus [95.10641301155232]
We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
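The closed loop is easy to caricature in a few lines: a "generator" proposes candidates near the current best guess, an ensemble of "scorers" averages its judgments, and the consensus winner seeds the next round. The scorers below are toy functions standing in for pre-trained models.

```python
# Toy closed-loop iterative consensus between a generator and scorers.
import numpy as np

rng = np.random.default_rng(0)
scorers = [lambda x: -(x - 3.0) ** 2,          # each scorer prefers a region
           lambda x: -(x - 5.0) ** 2,
           lambda x: -abs(x - 4.0)]

x = 0.0                                        # initial guess
for step in range(20):
    candidates = x + rng.normal(0, 1.0, 32)    # generator proposals near x
    consensus = np.mean([[s(c) for c in candidates] for s in scorers], axis=0)
    x = candidates[consensus.argmax()]         # keep the ensemble's favorite
print(round(x, 2))                             # settles near 4, the consensus
```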
arXiv Detail & Related papers (2022-10-20T18:46:31Z)
- Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)
- Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling [25.525371500391568]
In certain applications, such as image retrieval platforms and photo album management apps, a collection of models often must be executed to obtain sufficient labels.
We propose an Adaptive Model Scheduling framework consisting of 1) a deep reinforcement learning-based approach that predicts the value of non-trivial models by mining semantic relationships among diverse models, and 2) two algorithms that adaptively schedule the model execution order under a deadline or deadline-memory constraint, respectively.
Our design can save around 53% of execution time without losing any valuable labels.
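As a deliberately simple stand-in for the scheduling idea (the paper's value predictor is learned with reinforcement learning; the greedy rule and numbers below are made up for illustration): run models in order of predicted value per unit cost until the deadline is exhausted.

```python
# Greedy value-per-cost scheduling under a deadline (illustrative stand-in).
def schedule(models, deadline):
    """models: list of (name, predicted_value, cost_seconds)."""
    plan, spent = [], 0.0
    for name, value, cost in sorted(models, key=lambda m: -m[1] / m[2]):
        if spent + cost <= deadline:
            plan.append(name)
            spent += cost
    return plan, spent

models = [("detector", 8.0, 2.0), ("classifier", 5.0, 1.0),
          ("captioner", 9.0, 4.0), ("tagger", 2.0, 0.5)]
print(schedule(models, deadline=5.0))  # -> (['classifier', 'detector', 'tagger'], 3.5)
```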
arXiv Detail & Related papers (2020-02-08T03:54:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.