Related papers: Palette: Image-to-Image Diffusion Models

Palette: Image-to-Image Diffusion Models

URL: http://arxiv.org/abs/2111.05826v1
Date: Wed, 10 Nov 2021 17:49:29 GMT
Title: Palette: Image-to-Image Diffusion Models
Authors: Chitwan Saharia, William Chan, Huiwen Chang, Chris A. Lee, Jonathan Ho, Tim Salimans, David J. Fleet, Mohammad Norouzi
Abstract summary: We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models. On four challenging image-to-image translation tasks, Palette outperforms strong GAN and regression baselines. We report several sample quality scores including FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against reference images.
Score: 50.268441533631176
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models. On four challenging image-to-image translation tasks (colorization, inpainting, uncropping, and JPEG decompression), Palette outperforms strong GAN and regression baselines, and establishes a new state of the art. This is accomplished without task-specific hyper-parameter tuning, architecture customization, or any auxiliary loss, demonstrating a desirable degree of generality and flexibility. We uncover the impact of using $L_2$ vs. $L_1$ loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention through empirical architecture studies. Importantly, we advocate a unified evaluation protocol based on ImageNet, and report several sample quality scores including FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against reference images for various baselines. We expect this standardized evaluation protocol to play a critical role in advancing image-to-image translation research. Finally, we show that a single generalist Palette model trained on 3 tasks (colorization, inpainting, JPEG decompression) performs as well or better than task-specific specialist counterparts.

Related papers

BPCLIP: A Bottom-up Image Quality Assessment from Distortion to Semantics Based on CLIP [18.25854559825818]
We propose a bottom-up image quality assessment approach based on the Contrastive Language-Image Pre-training (CLIP)<n>Specifically, we utilize an encoder to extract multiscale features from the input image and introduce a bottom-up multiscale cross attention module.<n>By incorporating 40 image quality adjectives across six distinct dimensions, we enable the pre-trained CLIP text encoder to generate representations of the intrinsic quality of the image.
arXiv Detail & Related papers (2025-06-22T09:56:57Z)
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM. BrushNet's superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z)
Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness [4.339574774938128]
We present a novel framework for generating adversarial benchmarks to evaluate the robustness of image classification models. Our framework allows users to customize the types of distortions to be optimally applied to images, which helps address the specific distortions relevant to their deployment.
arXiv Detail & Related papers (2023-10-28T07:40:42Z)
Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval [11.696941841000985]
Two-stage methods following retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient to real-world applications. We propose a mechanism which attentively selects prominent local descriptors and infuse fine-grained semantic relations into the global representation. Our method achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris.
arXiv Detail & Related papers (2023-08-08T03:06:10Z)
Wavelet-based Unsupervised Label-to-Image Translation [9.339522647331334]
We propose a new Unsupervised paradigm for SIS (USIS) that makes use of a self-supervised segmentation loss and whole image wavelet based discrimination. We test our methodology on 3 challenging datasets and demonstrate its ability to bridge the performance gap between paired and unpaired models.
arXiv Detail & Related papers (2023-05-16T17:48:44Z)
Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language. We pioneer a systematic study on deepfake detection generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model for generating novel, high-quality and diverse images. FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales. In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-07-18T07:11:28Z)
High-Quality Pluralistic Image Completion via Code Shared VQGAN [51.7805154545948]
We present a novel framework for pluralistic image completion that can achieve both high quality and diversity at much faster inference speed. Our framework is able to learn semantically-rich discrete codes efficiently and robustly, resulting in much better image reconstruction quality.
arXiv Detail & Related papers (2022-04-05T01:47:35Z)
Unsupervised Layered Image Decomposition into Object Prototypes [39.20333694585477]
We present an unsupervised learning framework for decomposing images into layers of automatically discovered object models. We first validate our approach by providing results on par with the state of the art on standard multi-object synthetic benchmarks. We then demonstrate the applicability of our model to real images in tasks that include clustering (SVHN, GTSRB), cosegmentation (Weizmann Horse) and object discovery from unfiltered social network images.
arXiv Detail & Related papers (2021-04-29T18:02:01Z)
A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image. The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
arXiv Detail & Related papers (2021-04-29T17:49:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.