Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
- URL: http://arxiv.org/abs/2311.17919v2
- Date: Tue, 2 Apr 2024 21:34:29 GMT
- Title: Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
- Authors: Daniel Geng, Inbum Park, Andrew Owens
- Abstract summary: Multi-view optical illusions are images that change appearance upon a transformation, such as a flip or rotation.
We propose a zero-shot method for obtaining these illusions from off-the-shelf text-to-image diffusion models.
We provide both qualitative and quantitative results demonstrating the effectiveness and flexibility of our method.
- Score: 15.977340635967018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of synthesizing multi-view optical illusions: images that change appearance upon a transformation, such as a flip or rotation. We propose a simple, zero-shot method for obtaining these illusions from off-the-shelf text-to-image diffusion models. During the reverse diffusion process, we estimate the noise from different views of a noisy image, and then combine these noise estimates together and denoise the image. A theoretical analysis suggests that this method works precisely for views that can be written as orthogonal transformations, of which permutations are a subset. This leads to the idea of a visual anagram--an image that changes appearance under some rearrangement of pixels. This includes rotations and flips, but also more exotic pixel permutations such as a jigsaw rearrangement. Our approach also naturally extends to illusions with more than two views. We provide both qualitative and quantitative results demonstrating the effectiveness and flexibility of our method. Please see our project webpage for additional visualizations and results: https://dangeng.github.io/visual_anagrams/
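The denoising step described in the abstract amounts to averaging per-view noise estimates in a common frame and then taking an ordinary diffusion update. Below is a minimal PyTorch sketch of that idea, assuming a generic text-conditioned noise predictor `eps_model` and cumulative noise-schedule values `alpha_t`/`alpha_prev` given as tensors; the names are illustrative and this is not the authors' released code.

```python
import torch

def combined_noise_estimate(x_t, t, eps_model, views, inv_views, prompts):
    """Estimate the noise in each view of x_t and average the estimates in the
    base frame. Because the views are pixel permutations (orthogonal
    transformations), mapping an estimate back to the base frame is exact."""
    estimates = []
    for view, inv_view, prompt in zip(views, inv_views, prompts):
        eps_v = eps_model(view(x_t), t, prompt)   # noise predicted in this view
        estimates.append(inv_view(eps_v))         # map it back to the base frame
    return torch.stack(estimates).mean(dim=0)

def ddim_step(x_t, eps, alpha_t, alpha_prev):
    """One deterministic DDIM update using the combined noise estimate."""
    x0_pred = (x_t - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()
    return alpha_prev.sqrt() * x0_pred + (1 - alpha_prev).sqrt() * eps

# Example pair of views for a two-view illusion: identity and a 180-degree
# rotation. Both are self-inverse, so the inverse views are the views themselves.
identity = lambda x: x
rot180 = lambda x: torch.rot90(x, k=2, dims=(-2, -1))
views = inv_views = [identity, rot180]
```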
Related papers
- SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow [94.90853153808987]
We propose a unified diffusion-based framework (SemFlow) for semantic segmentation and semantic image synthesis.
As the training objective is symmetric, samples belonging to the two distributions, images and semantic masks, can be transferred reversibly with no extra effort.
Experiments show that our SemFlow achieves competitive results on semantic segmentation and semantic image synthesis tasks.
arXiv Detail & Related papers (2024-05-30T17:34:40Z)
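The reversible transfer SemFlow relies on can be illustrated with a plain Euler solver over a learned velocity field. The sketch below is an assumption-level illustration of rectified-flow sampling, not SemFlow's actual code; `velocity_net` is a placeholder for the trained model.

```python
import torch

@torch.no_grad()
def euler_transfer(x, velocity_net, steps=50, reverse=False):
    """Integrate dx/dt = v(x, t) with fixed Euler steps. Running forwards maps
    samples from one distribution (e.g. images) toward the other (e.g. semantic
    masks); running backwards undoes the transfer, which is what a symmetric
    rectified-flow objective makes possible."""
    dt = 1.0 / steps
    if reverse:
        ts, dt = torch.linspace(1.0, dt, steps), -dt
    else:
        ts = torch.linspace(0.0, 1.0 - dt, steps)
    for t in ts:
        x = x + velocity_net(x, t.expand(x.shape[0])) * dt
    return x
```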
- Factorized Diffusion: Perceptual Illusions by Noise Decomposition [15.977340635967018]
We present a zero-shot method to control each individual component of an image decomposition through diffusion model sampling.
For certain decompositions, our method recovers prior approaches to compositional generation and spatial control.
We show that we can extend our approach to generate hybrid images from real images.
arXiv Detail & Related papers (2024-04-17T17:59:59Z)
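As an illustration of the noise decomposition named in the title, the sketch below splits two prompts' noise estimates into low- and high-frequency components and recombines them, the kind of factorization used for hybrid images. The blur parameters and the `eps_model` interface are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def gaussian_blur(x, sigma=2.0, ksize=9):
    """Simple depthwise Gaussian blur used as a low-pass filter."""
    coords = torch.arange(ksize) - ksize // 2
    g = torch.exp(-(coords.float() ** 2) / (2 * sigma ** 2))
    g = (g / g.sum()).to(x)
    kernel = (g[:, None] * g[None, :]).expand(x.shape[1], 1, ksize, ksize).contiguous()
    return F.conv2d(x, kernel, padding=ksize // 2, groups=x.shape[1])

def factorized_noise_estimate(x_t, t, eps_model, prompt_low, prompt_high):
    """Take the low-frequency component of one prompt's noise estimate and the
    high-frequency component of another's, so each prompt controls one
    component of the final image."""
    eps_a = eps_model(x_t, t, prompt_low)
    eps_b = eps_model(x_t, t, prompt_high)
    low = gaussian_blur(eps_a)
    high = eps_b - gaussian_blur(eps_b)
    return low + high
```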
- Diffusion Illusions: Hiding Images in Plain Sight [37.87050866208039]
Diffusion Illusions is the first comprehensive pipeline designed to automatically generate a wide range of illusions.
We study three types of illusions, each of which arranges the prime images in a different way.
We conduct comprehensive experiments on these illusions and verify the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-12-06T18:59:18Z)
- Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training [46.75803514327477]
Exemplar-based sketch-to-photo synthesis allows users to generate photo-realistic images based on sketches.
However, generating photo-realistic images with color and texture from sketch images remains challenging for diffusion models.
We propose a two-stage method named "Inversion-by-Inversion" for exemplar-based sketch-to-photo synthesis.
arXiv Detail & Related papers (2023-08-15T09:27:57Z)
- Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, outperforming other diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z)
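A rough sketch of the optimization structure this summary describes: learnable mixing weights interpolate between the source and target prompt embeddings and are tuned for style matching and content preservation. Everything here (the `generate`, `style_loss`, and `content_loss` callables and the per-step parameterization) is a hypothetical stand-in rather than the paper's implementation.

```python
import torch

def mix_embeddings(emb_src, emb_tgt, lam):
    """Mix two prompt embeddings of shape (tokens, dim) with one learnable
    weight per denoising step, giving a (steps, tokens, dim) schedule."""
    lam = lam.sigmoid().view(-1, 1, 1)
    return (1 - lam) * emb_src + lam * emb_tgt

def optimize_mixing_weights(emb_src, emb_tgt, generate, style_loss, content_loss,
                            num_steps=50, iters=100, lr=0.05, w_content=0.5):
    """style_loss should reward matching the target attribute; content_loss
    should penalise drift from the source image. Both are stand-ins here."""
    lam = torch.zeros(num_steps, requires_grad=True)
    opt = torch.optim.Adam([lam], lr=lr)
    for _ in range(iters):
        image = generate(mix_embeddings(emb_src, emb_tgt, lam))
        loss = style_loss(image) + w_content * content_loss(image)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return lam.detach().sigmoid()
```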
- Solving Visual Analogies Using Neural Algorithmic Reasoning [22.384921045720752]
We search for a sequence of elementary neural network transformations that manipulate distributed representations derived from a symbolic space.
We evaluate the extent to which our 'neural reasoning' approach generalizes to images with unseen shapes and positions.
arXiv Detail & Related papers (2021-11-19T18:48:16Z)
- Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
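The ensembling step itself is straightforward: classify the original image together with several GAN-synthesized views of it and average the predictions. In the sketch below, `generate_views` stands in for a StyleGAN2-based projection-and-perturbation step; its interface is an assumption, not the paper's API.

```python
import torch

@torch.no_grad()
def ensemble_predict(classifier, image, generate_views, num_views=8):
    """Average softmax predictions over generated "views" of a real image.
    Each view mimics a real-world variation such as a small change in
    colour or pose."""
    views = [image] + [generate_views(image) for _ in range(num_views)]
    probs = torch.stack([classifier(v).softmax(dim=-1) for v in views])
    return probs.mean(dim=0)   # ensembled class probabilities
```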
- Cross-View Image Synthesis with Deformable Convolution and Attention Mechanism [29.528402825356398]
We propose to use Generative Adversarial Networks (GANs) based on a deformable convolution and attention mechanism to solve the problem of cross-view image synthesis.
It is difficult to understand and transform scene appearance and semantic information from another view, so we use deformable convolutions in the U-Net to improve the network's ability to extract features of objects at different scales.
arXiv Detail & Related papers (2020-07-20T03:08:36Z)
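A minimal example of the deformable-convolution building block the summary refers to, using torchvision's DeformConv2d with a small convolution that predicts per-location sampling offsets; the surrounding U-Net and the attention mechanism are omitted, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """A block whose sampling locations are predicted per pixel, letting the
    layer adapt its receptive field to objects of different scales."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # 2 offsets (dx, dy) per kernel position
        self.offset = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.deform(x, self.offset(x)))
```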
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [78.5281048849446]
We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes.
Our algorithm represents a scene using a fully-connected (non-convolutional) deep network.
Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses.
arXiv Detail & Related papers (2020-03-19T17:57:23Z)
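The differentiability claim comes from the volume rendering quadrature: a ray's color is a weighted sum of per-sample colors, so gradients flow back to the network from a simple photometric loss. Below is a standard sketch of that compositing step (ray sampling and the MLP itself are omitted).

```python
import torch

def volume_render(rgb, sigma, z_vals):
    """Composite per-sample colours along each ray.

    rgb: (rays, samples, 3) colours, sigma: (rays, samples) densities,
    z_vals: (rays, samples) sample depths. Every operation is differentiable,
    so the representation can be optimised directly from posed images."""
    deltas = z_vals[..., 1:] - z_vals[..., :-1]
    deltas = torch.cat([deltas, torch.full_like(deltas[..., :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * deltas)                 # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)       # accumulated transmittance
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = alpha * trans                                   # contribution of each sample
    return (weights[..., None] * rgb).sum(dim=-2)             # (rays, 3) ray colours
```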
- Watching the World Go By: Representation Learning from Unlabeled Videos [78.22211989028585]
Recent single image unsupervised representation learning techniques show remarkable success on a variety of tasks.
In this paper, we argue that videos offer this natural augmentation for free.
We propose Video Noise Contrastive Estimation, a method for using unlabeled video to learn strong, transferable single image representations.
arXiv Detail & Related papers (2020-03-18T00:07:21Z)
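The core of a noise-contrastive objective over video frames can be written as an InfoNCE loss in a few lines: two frames from the same video form a positive pair, and the other videos in the batch act as negatives. This is a generic sketch of that objective, not the paper's exact loss or sampling scheme.

```python
import torch
import torch.nn.functional as F

def video_nce_loss(anchor_feats, positive_feats, temperature=0.07):
    """InfoNCE loss over (batch, dim) embeddings of two frames per video.
    Each anchor frame must be matched to the frame from its own video."""
    a = F.normalize(anchor_feats, dim=-1)
    p = F.normalize(positive_feats, dim=-1)
    logits = a @ p.t() / temperature                      # (batch, batch) similarities
    targets = torch.arange(a.shape[0], device=a.device)   # diagonal = positives
    return F.cross_entropy(logits, targets)
```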
- Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring.
We present a differentiable reblur model for self-supervised motion deblurring.
Our experiments demonstrate that self-supervised single-image deblurring is indeed feasible.
arXiv Detail & Related papers (2020-02-10T20:15:21Z)
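The differentiable reblur idea can be illustrated with a fixed linear motion kernel: blur the network's sharp prediction and penalize the difference from the blurry input, so no sharp ground truth is needed. The kernel length, orientation handling, and loss below are simplifying assumptions; the actual method additionally handles motion estimation.

```python
import torch
import torch.nn.functional as F

def linear_blur(sharp, length=9, horizontal=True):
    """Differentiable reblur: average the sharp prediction along a straight
    line, approximating uniform linear camera motion."""
    c = sharp.shape[1]
    kernel = torch.zeros(c, 1, length, length, device=sharp.device)
    if horizontal:
        kernel[:, :, length // 2, :] = 1.0 / length
    else:
        kernel[:, :, :, length // 2] = 1.0 / length
    return F.conv2d(sharp, kernel, padding=length // 2, groups=c)

def self_supervised_deblur_loss(deblur_net, blurry):
    """Reblur the predicted sharp image and compare it with the blurry input."""
    sharp_pred = deblur_net(blurry)
    return F.l1_loss(linear_blur(sharp_pred), blurry)
```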