ArtiFade: Learning to Generate High-quality Subject from Blemished Images
- URL: http://arxiv.org/abs/2409.03745v1
- Date: Thu, 5 Sep 2024 17:57:59 GMT
- Title: ArtiFade: Learning to Generate High-quality Subject from Blemished Images
- Authors: Shuya Yang, Shaozhe Hao, Yukang Cao, Kwan-Yee K. Wong
- Abstract summary: ArtiFade exploits fine-tuning of a pre-trained text-to-image model, aiming to remove artifacts.
ArtiFade also ensures the preservation of the original generative capabilities inherent within the diffusion model.
- Score: 10.112125529627157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Subject-driven text-to-image generation has witnessed remarkable advancements in its ability to learn and capture characteristics of a subject using only a limited number of images. However, existing methods commonly rely on high-quality images for training and may struggle to generate reasonable images when the input images are blemished by artifacts. This is primarily attributed to the inadequate capability of current techniques in distinguishing subject-related features from disruptive artifacts. In this paper, we introduce ArtiFade to tackle this issue and successfully generate high-quality artifact-free images from blemished datasets. Specifically, ArtiFade exploits fine-tuning of a pre-trained text-to-image model, aiming to remove artifacts. The elimination of artifacts is achieved by utilizing a specialized dataset that encompasses both unblemished images and their corresponding blemished counterparts during fine-tuning. ArtiFade also ensures the preservation of the original generative capabilities inherent within the diffusion model, thereby enhancing the overall performance of subject-driven methods in generating high-quality and artifact-free images. We further devise evaluation benchmarks tailored for this task. Through extensive qualitative and quantitative experiments, we demonstrate the generalizability of ArtiFade in effective artifact removal under both in-distribution and out-of-distribution scenarios.
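Below is a minimal, illustrative sketch of the training recipe the abstract describes: pair each unblemished image with a blemished counterpart and fine-tune a denoiser with a standard noise-prediction loss. The toy denoiser, the `add_artifacts` corruption, and the concatenation conditioning are all assumptions made for illustration, not ArtiFade's published design.

```python
import torch
import torch.nn.functional as F

# All components here are toy stand-ins; the abstract does not specify
# ArtiFade's architecture or conditioning, so this is only a plausible shape.
denoiser = torch.nn.Conv2d(6, 3, 3, padding=1)   # sees noisy + blemished, predicts noise
opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-5)

def add_artifacts(x):
    """Toy blemishing: random occluding patches (an assumption, not the paper's)."""
    mask = (torch.rand_like(x[:, :1]) > 0.9).float()
    return x * (1 - mask) + torch.rand_like(x) * mask

for step in range(100):
    clean = torch.rand(4, 3, 64, 64)             # unblemished training images
    blemished = add_artifacts(clean)             # corresponding blemished copies
    t = torch.rand(4, 1, 1, 1)                   # continuous timestep in [0, 1]
    noise = torch.randn_like(clean)
    noisy = (1 - t) * clean + t * noise          # simple linear noising schedule
    pred = denoiser(torch.cat([noisy, blemished], dim=1))
    loss = F.mse_loss(pred, noise)               # standard noise-prediction objective
    opt.zero_grad(); loss.backward(); opt.step()
```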
Related papers
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy for bolstering image classification performance is to augment the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
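A plausible reading of inter-class image translation, sketched with the diffusers img2img pipeline; the model id, `strength` value, and soft-label scheme are illustrative assumptions, not Diff-Mix's published settings.

```python
import random
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Illustrative backbone choice; any img2img-capable diffusion model would do.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

classes = ["golden retriever", "tabby cat"]

def diff_mix(image: Image.Image, src: int, strength: float = 0.6):
    """Translate an image of class `src` partway toward a different class."""
    tgt = random.choice([c for c in range(len(classes)) if c != src])
    mixed = pipe(prompt=f"a photo of a {classes[tgt]}",
                 image=image, strength=strength).images[0]
    # Soft label interpolated by translation strength (an assumption).
    label = {src: 1 - strength, tgt: strength}
    return mixed, label
```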
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration [51.205673783866146]
We present ENTED, a new framework for blind face restoration that aims to restore high-quality and realistic portrait images.
We utilize a texture extraction and distribution framework to transfer high-quality texture features between the degraded input and reference image.
The StyleGAN-like architecture in our framework requires high-quality latent codes to generate realistic images.
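One way to picture "extract texture from a reference and distribute it onto the degraded input" is attention-style feature matching, as in this speculative sketch; ENTED's actual modules are more elaborate, and the feature maps here are random stand-ins.

```python
import torch

# Minimal sketch: borrow reference texture for each degraded-feature location
# via scaled dot-product matching. Not ENTED's real extraction/distribution modules.
def transfer_texture(degraded_feat: torch.Tensor, reference_feat: torch.Tensor) -> torch.Tensor:
    """Both inputs: (B, C, H, W) feature maps from some encoder."""
    B, C, H, W = degraded_feat.shape
    q = degraded_feat.flatten(2).transpose(1, 2)      # (B, HW, C) queries
    k = reference_feat.flatten(2).transpose(1, 2)     # (B, HW, C) keys/values
    attn = torch.softmax(q @ k.transpose(1, 2) / C ** 0.5, dim=-1)
    out = attn @ k                                    # texture pulled from the reference
    return out.transpose(1, 2).reshape(B, C, H, W)

deg = torch.rand(1, 64, 32, 32)
ref = torch.rand(1, 64, 32, 32)
restored_feat = transfer_texture(deg, ref)            # would feed a StyleGAN-like decoder
```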
arXiv Detail & Related papers (2024-01-13T04:54:59Z)
- Learning Subject-Aware Cropping by Outpainting Professional Photos [69.0772948657867]
We propose a weakly-supervised approach to learn what makes a high-quality subject-aware crop from professional stock images.
Our insight is to combine a library of stock images with a modern, pre-trained text-to-image diffusion model.
We are able to automatically generate a large dataset of cropped-uncropped training pairs to train a cropping model.
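A hedged sketch of how a cropped-uncropped training pair might be manufactured by outpainting: pad the stock photo, mask the padding, and let an inpainting model fill it in. The model id and pad size are assumptions, and real code must handle the pipeline's internal resizing to its working resolution.

```python
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInpaintPipeline

# Illustrative model choice; any inpainting-capable diffusion model would do.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def make_pair(stock: Image.Image, pad: int = 128):
    w, h = stock.size
    canvas = ImageOps.expand(stock, border=pad, fill=0)   # padded "uncropped" canvas
    mask = Image.new("L", canvas.size, 255)               # white = regions to repaint
    mask.paste(0, (pad, pad, pad + w, pad + h))           # black = keep the original photo
    uncropped = pipe(prompt="", image=canvas, mask_image=mask).images[0]
    crop_box = (pad, pad, pad + w, pad + h)               # supervision: the good crop
    return stock, uncropped, crop_box                     # (cropped, uncropped, target)
```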
arXiv Detail & Related papers (2023-12-19T11:57:54Z)
- Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision [5.517240672957627]
We propose a novel knowledge-aware artifact image synthesis approach that brings lost historical objects accurately into their visual forms.
Compared to existing approaches, our proposed model produces higher-quality artifact images that align better with the implicit details and historical knowledge contained within written documents.
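The prompting idea can be pictured as a two-stage pipeline, sketched below with stubbed-in `query_llm` and `generate_image` backends; both names are hypothetical stand-ins, and the paper's actual models and prompt templates are not given here.

```python
# Sketch of LLM-enhanced prompting: a written historical record is expanded
# into a visually grounded description before text-to-image synthesis.
def query_llm(prompt: str) -> str:
    # Stand-in: in practice, call an LLM API here.
    return ("a bronze ritual vessel with taotie motifs, Shang dynasty, "
            "green patina, studio museum photograph")

def generate_image(prompt: str):
    # Stand-in: in practice, call a text-to-image model here.
    return f"<image generated from: {prompt}>"

record = "A bronze ritual vessel of the Shang dynasty, described as bearing taotie motifs."
enriched = query_llm(
    "Rewrite this historical record as a detailed visual description, "
    f"keeping material, era, and ornamentation explicit: {record}"
)
artifact_image = generate_image(enriched)
```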
arXiv Detail & Related papers (2023-12-13T11:03:07Z)
- Perceptual Artifacts Localization for Image Synthesis Tasks [59.638307505334076]
We introduce a novel dataset comprising 10,168 generated images, each annotated with per-pixel perceptual artifact labels.
A segmentation model, trained on our proposed dataset, effectively localizes artifacts across a range of tasks.
We propose an innovative zoom-in inpainting pipeline that seamlessly rectifies perceptual artifacts in the generated images.
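A minimal sketch of the localize-then-fix loop: predict a per-pixel artifact mask, zoom into the flagged region with some margin, inpaint only there, and paste the result back. `segment_artifacts` and `inpaint` are stubs for the trained models.

```python
import numpy as np
from PIL import Image

def segment_artifacts(img: Image.Image) -> np.ndarray:
    return np.zeros((img.height, img.width), dtype=bool)   # stub artifact mask

def inpaint(patch: Image.Image, mask: Image.Image) -> Image.Image:
    return patch                                            # stub inpainter

def zoom_in_fix(img: Image.Image, margin: int = 32) -> Image.Image:
    mask = segment_artifacts(img)
    if not mask.any():
        return img                                          # nothing flagged
    ys, xs = np.where(mask)
    box = (max(int(xs.min()) - margin, 0), max(int(ys.min()) - margin, 0),
           min(int(xs.max()) + margin, img.width), min(int(ys.max()) + margin, img.height))
    patch = img.crop(box)                                   # zoomed-in view of the artifact
    patch_mask = Image.fromarray(
        mask[box[1]:box[3], box[0]:box[2]].astype(np.uint8) * 255)
    img.paste(inpaint(patch, patch_mask), box[:2])          # paste the fix back in place
    return img
```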
arXiv Detail & Related papers (2023-10-09T10:22:08Z)
- MagiCapture: High-Resolution Multi-Concept Portrait Customization [34.131515004434846]
MagiCapture is a personalization method for integrating subject and style concepts to generate high-resolution portrait images.
We present a novel Attention Refocusing loss coupled with auxiliary priors, both of which facilitate robust learning within this weakly supervised learning setting.
Our pipeline also includes additional post-processing steps to ensure the creation of highly realistic outputs.
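A generic reading of an attention-refocusing objective, concentrating the subject token's cross-attention mass inside a subject mask; this is an assumption about the loss's general shape, not MagiCapture's exact formulation.

```python
import torch

def refocus_loss(attn_map: torch.Tensor, subject_mask: torch.Tensor) -> torch.Tensor:
    """attn_map: (B, H, W) cross-attention for the subject token;
    subject_mask: (B, H, W) binary mask of the subject region."""
    # Normalize so each map's attention sums to 1 over all pixels.
    attn = attn_map / attn_map.flatten(1).sum(dim=1).clamp(min=1e-8)[:, None, None]
    inside = (attn * subject_mask).flatten(1).sum(dim=1)   # mass landing on the subject
    return (1.0 - inside).mean()   # minimized when all attention is on the subject

attn = torch.rand(2, 16, 16)
mask = (torch.rand(2, 16, 16) > 0.5).float()
loss = refocus_loss(attn, mask)
```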
arXiv Detail & Related papers (2023-09-13T11:37:04Z)
- Learning to Evaluate the Artness of AI-generated Images [64.48229009396186]
ArtScore is a metric designed to evaluate the degree to which an image resembles authentic artworks by artists.
We employ pre-trained models for photo and artwork generation, resulting in a series of mixed models.
This dataset is then employed to train a neural network that learns to estimate quantized artness levels of arbitrary images.
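The scoring stage might look like the toy regressor below, which learns quantized artness levels from stand-in images; the construction of the dataset from interpolated photo/art generators is stubbed out with random tensors and labels.

```python
import torch
import torch.nn as nn

LEVELS = 5   # illustrative number of quantized artness levels
scorer = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, LEVELS),
)
opt = torch.optim.Adam(scorer.parameters(), lr=1e-4)
ce = nn.CrossEntropyLoss()

for _ in range(100):
    imgs = torch.rand(8, 3, 64, 64)           # stand-in mixed-model images
    levels = torch.randint(0, LEVELS, (8,))   # stand-in quantized artness labels
    loss = ce(scorer(imgs), levels)
    opt.zero_grad(); loss.backward(); opt.step()

# Expected level under the predicted distribution serves as a scalar artness score.
probs = scorer(torch.rand(1, 3, 64, 64)).softmax(-1)
artness = (probs * torch.arange(LEVELS)).sum()
```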
arXiv Detail & Related papers (2023-05-08T17:58:27Z)
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models [63.545146807810305]
Text-to-image diffusion models can generate high-quality pictures from textual input prompts.
These models have been trained using text data collected from content-based labelling protocols.
We characterise the sentimentality, objectiveness and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
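A rough proxy for such a characterization, using TextBlob's polarity and subjectivity plus a concreteness-lexicon lookup for degree of abstraction; the two-entry lexicon here is a hypothetical stand-in, and the paper's actual measures may differ.

```python
from textblob import TextBlob

# Stand-in for a full concreteness lexicon (e.g., 1-5 rating norms).
CONCRETENESS = {"dog": 4.9, "freedom": 1.9}

def characterize(caption: str) -> dict:
    blob = TextBlob(caption)
    known = [CONCRETENESS[w.lower()] for w in blob.words if w.lower() in CONCRETENESS]
    return {
        "sentimentality": blob.sentiment.polarity,           # in [-1, 1]
        "objectiveness": 1.0 - blob.sentiment.subjectivity,  # in [0, 1]
        # Higher = more abstract; None if no word is in the lexicon.
        "abstraction": 5.0 - sum(known) / len(known) if known else None,
    }

print(characterize("a dog running toward freedom"))
```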
arXiv Detail & Related papers (2022-10-19T14:20:05Z)
- Learning a Single Model with a Wide Range of Quality Factors for JPEG Image Artifacts Removal [24.25688335628976]
Lossy compression brings artifacts into the compressed image and degrades the visual quality.
In this paper, we propose a highly robust compression artifacts removal network.
Our proposed network is a single model approach that can be trained for handling a wide range of quality factors.
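The single-model idea is easy to sketch: compress each training batch at a randomly sampled JPEG quality factor so one network sees the whole range. The toy restoration net below stands in for the paper's actual architecture.

```python
import io
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision.transforms.functional import to_tensor, to_pil_image

net = torch.nn.Conv2d(3, 3, 3, padding=1)       # toy stand-in restoration net
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

def jpeg(x: torch.Tensor, q: int) -> torch.Tensor:
    """Round-trip a (3, H, W) tensor in [0, 1] through JPEG at quality q."""
    buf = io.BytesIO()
    to_pil_image(x).save(buf, format="JPEG", quality=q)
    buf.seek(0)
    return to_tensor(Image.open(buf))

for _ in range(100):
    clean = torch.rand(3, 64, 64)
    q = int(torch.randint(5, 96, (1,)))          # wide quality-factor range per sample
    degraded = jpeg(clean, q)
    loss = F.mse_loss(net(degraded[None]), clean[None])
    opt.zero_grad(); loss.backward(); opt.step()
```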
arXiv Detail & Related papers (2020-09-15T08:16:58Z)