EasyPhoto: Your Smart AI Photo Generator
- URL: http://arxiv.org/abs/2310.04672v1
- Date: Sat, 7 Oct 2023 03:16:56 GMT
- Title: EasyPhoto: Your Smart AI Photo Generator
- Authors: Ziheng Wu, Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Xing Shi, Jun Huang
- Abstract summary: We propose a novel WebUI plugin called EasyPhoto, which enables the generation of AI portraits.
By training a digital doppelganger of a specific user ID using 5 to 20 relevant images, the finetuned model allows for the generation of AI photos using arbitrary templates.
- Score: 11.926387357705712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stable Diffusion web UI (SD-WebUI) is a comprehensive project that provides a
browser interface based on the Gradio library for Stable Diffusion models. In this
paper, we propose a novel WebUI plugin called EasyPhoto, which enables the
generation of AI portraits. By training a digital doppelganger of a specific
user ID using 5 to 20 relevant images, the finetuned model (according to the
trained LoRA model) allows for the generation of AI photos using arbitrary
templates. Our current implementation supports the modification of multiple
persons and different photo styles. Furthermore, we allow users to generate
fantastic template images with the strong SDXL model, enhancing EasyPhoto's
capabilities to deliver more diverse and satisfactory results. The source code
for EasyPhoto is available at: https://github.com/aigc-apps/sd-webui-EasyPhoto.
We also support a webui-free version by using diffusers:
https://github.com/aigc-apps/EasyPhoto. We are continuously enhancing our
efforts to expand the EasyPhoto pipeline, making it suitable for any
identification (not limited to just the face), and we enthusiastically welcome
any intriguing ideas or suggestions.
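The abstract mentions a trained LoRA model but gives no implementation detail. As a rough illustration of what a LoRA-style adapter generally does (this is not EasyPhoto's code, and every name below is hypothetical), the sketch keeps a base weight matrix W frozen and adds a scaled low-rank product, W + (alpha / rank) * B A, so that only the small factors B and A need training.

```python
# Illustrative sketch only: a generic LoRA-style low-rank weight update.
# This is NOT taken from EasyPhoto; all names and values are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_weight(w, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * B @ A, the effective adapted weight."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Frozen 3x3 base weight (identity here) with rank-1 trainable factors.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
lora_b = [[1.0], [0.0], [2.0]]   # shape: 3 x rank
lora_a = [[0.5, 0.0, -0.5]]      # shape: rank x 3

adapted = lora_weight(w, lora_b, lora_a, alpha=2.0, rank=1)

# Trainable values in the adapter versus in the full matrix.
adapter_params = len(lora_b) * len(lora_b[0]) + len(lora_a) * len(lora_a[0])
full_params = len(w) * len(w[0])
```

In this toy case the adapter holds 6 values against 9 in the full matrix; the saving grows quickly with layer size, which is one reason a small set of images can be enough to fit such an adapter.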
Related papers
- Step1X-Edit: A Practical Framework for General Image Editing [64.07202539610576]
We release a state-of-the-art image editing model, called Step1X-Edit.
It provides performance comparable to closed-source models such as GPT-4o and Gemini2 Flash.
For evaluation, we develop the GEdit-Bench, a novel benchmark rooted in real-world user instructions.
Date: 2025-04-24T17:25:12Z
- EditAR: Unified Conditional Generation with Autoregressive Models [58.093860528672735]
We propose EditAR, a single unified autoregressive framework for a variety of conditional image generation tasks.
The model takes both images and instructions as inputs, and predicts the edited image tokens in a vanilla next-token paradigm.
We evaluate its effectiveness across diverse tasks on established benchmarks, showing competitive performance to various state-of-the-art task-specific methods.
Date: 2025-01-08T18:59:35Z
- Prompt Sliders for Fine-Grained Control, Editing and Erasing of Concepts in Diffusion Models [53.385754347812835]
Concept Sliders introduced a method for fine-grained image control and editing by learning concepts (attributes/objects).
This approach adds parameters and increases inference time due to the loading and unloading of Low-Rank Adapters (LoRAs) used for learning concepts.
We propose a straightforward textual inversion method to learn concepts through text embeddings, which are generalizable across models that share the same text encoder.
Date: 2024-09-25T01:02:30Z
- On AI-Inspired UI-Design [5.969881132928718]
We discuss three major complementary approaches to using Artificial Intelligence (AI) to support app designers in creating better, more diverse, and more creative UIs for mobile apps.
First, designers can prompt a Large Language Model (LLM) like GPT to directly generate and adjust one or multiple UIs.
Second, a Vision-Language Model (VLM) enables designers to effectively search a large screenshot dataset, e.g. from apps published in app stores.
Third, a Diffusion Model (DM) can be specifically designed to generate app UIs as inspirational images.
Date: 2024-06-19T15:28:21Z
- Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present a novel paradigm Diffusion-ReID to efficiently augment and generate diverse images based on known identities.
Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities.
Date: 2024-06-10T06:26:03Z
- InstantID: Zero-shot Identity-Preserving Generation in Seconds [21.04236321562671]
We introduce InstantID, a powerful diffusion model-based solution for ID embedding.
Our plug-and-play module adeptly handles image personalization in various styles using just a single facial image.
Our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL.
Date: 2024-01-15T07:50:18Z
- Latent Space Editing in Transformer-Based Flow Matching [53.75073756305241]
Flow Matching with a transformer backbone offers the potential for scalable and high-quality generative modeling.
We introduce an editing space, $u$-space, that can be manipulated in a controllable, accumulative, and composable manner.
Lastly, we put forth a straightforward yet powerful method for achieving fine-grained and nuanced editing using text prompts.
Date: 2023-12-17T21:49:59Z
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding [102.07914175196817]
PhotoMaker is an efficient personalized text-to-image generation method.
It encodes an arbitrary number of input ID images into a stacked ID embedding to preserve ID information.
Date: 2023-12-07T17:32:29Z
- IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models [11.105763635691641]
An alternative to a text prompt is an image prompt; as the saying goes, "an image is worth a thousand words."
We present IP-Adapter, an effective and lightweight adapter to achieve image prompt capability for the pretrained text-to-image diffusion models.
Date: 2023-08-13T08:34:51Z
- SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds [88.06788636008051]
Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers.
These models are large, with complex network architectures and tens of denoising iterations, making them computationally expensive and slow to run.
We present a generic approach that unlocks running text-to-image diffusion models on mobile devices in less than 2 seconds.
Date: 2023-06-01T17:59:25Z
- GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation [143.81719619351335]
Text-to-image (T2I) models based on diffusion processes have achieved remarkable success in controllable image generation using user-provided captions.
The tight coupling between the current text encoder and image decoder in T2I models makes it challenging to replace or upgrade.
We propose GlueGen, which applies a newly proposed GlueNet model to align features from single-modal or multi-modal encoders with the latent space of an existing T2I model.
Date: 2023-03-17T15:37:07Z
- SEGA: Instructing Text-to-Image Models using Semantic Guidance [33.080261792998826]
We show how to interact with the diffusion process to flexibly steer it along semantic directions.
SEGA generalizes to any generative architecture using classifier-free guidance.
It allows for subtle and extensive edits, changes in composition and style, as well as optimizing the overall artistic conception.
Date: 2023-01-28T16:43:07Z
This list is automatically generated from the titles and abstracts of the papers in this site.