CoPE: Conditional image generation using Polynomial Expansions
- URL: http://arxiv.org/abs/2104.05077v1
- Date: Sun, 11 Apr 2021 19:02:37 GMT
- Title: CoPE: Conditional image generation using Polynomial Expansions
- Authors: Grigorios G Chrysos, Yannis Panagakis
- Abstract summary: We introduce a general framework, called CoPE, that enables a polynomial expansion of two input variables and captures their auto- and cross-correlations.
CoPE is evaluated on five tasks (class-conditional generation, inverse problems, edges-to-image translation, image-to-image translation, attribute-guided generation) involving eight datasets.
The thorough evaluation suggests that CoPE can be useful for tackling diverse conditional generation tasks.
- Score: 50.67390290190874
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative modeling has evolved into a notable field of machine
learning. Deep polynomial neural networks (PNNs) have demonstrated impressive
results in unsupervised image generation, where the task is to map an input
vector (i.e., noise) to a synthesized image. However, the success of PNNs has
not been replicated in conditional generation tasks, such as super-resolution.
Existing PNNs focus on single-variable polynomial expansions, which do not fare
well with two-variable inputs, i.e., the noise variable and the conditional
variable. In this work, we introduce a general framework, called CoPE, that
enables a polynomial expansion of two input variables and captures their auto-
and cross-correlations. We show how CoPE can be trivially extended to accept an
arbitrary number of input variables. CoPE is evaluated on five tasks
(class-conditional generation, inverse problems, edges-to-image translation,
image-to-image translation, attribute-guided generation) involving eight
datasets. The thorough evaluation suggests that CoPE can be useful for tackling
diverse conditional generation tasks.
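For illustration, a minimal sketch of a factorized second-order, two-variable expansion (an illustration of the auto-/cross-correlation structure, not the paper's actual recursive formulation; all layer names and sizes are assumptions):

```python
import torch
import torch.nn as nn

class TwoVarPolynomial(nn.Module):
    """Second-order expansion of noise z and condition c: first-order terms
    plus factorized auto-correlations (z-z, c-c) and a cross-correlation (z-c)."""
    def __init__(self, z_dim, c_dim, hidden):
        super().__init__()
        self.Uz = nn.Linear(z_dim, hidden)    # first-order term in z
        self.Uc = nn.Linear(c_dim, hidden)    # first-order term in c
        self.Az1 = nn.Linear(z_dim, hidden)   # factors for the z-z term
        self.Az2 = nn.Linear(z_dim, hidden)
        self.Ac1 = nn.Linear(c_dim, hidden)   # factors for the c-c term
        self.Ac2 = nn.Linear(c_dim, hidden)
        self.Bz = nn.Linear(z_dim, hidden)    # factors for the z-c term
        self.Bc = nn.Linear(c_dim, hidden)

    def forward(self, z, c):
        first = self.Uz(z) + self.Uc(c)
        auto = self.Az1(z) * self.Az2(z) + self.Ac1(c) * self.Ac2(c)
        cross = self.Bz(z) * self.Bc(c)       # couples noise and condition
        return first + auto + cross

# Usage: map a batch of (noise, condition) pairs to features.
net = TwoVarPolynomial(z_dim=64, c_dim=10, hidden=128)
out = net(torch.randn(4, 64), torch.randn(4, 10))   # shape (4, 128)
```

The element-wise products of linear projections are a standard low-rank factorization of the full quadratic terms, which keeps the parameter count linear in the input dimensions.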
Related papers
- One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding.
It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps.
OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z)
- Learning Structured Output Representations from Attributes using Deep Conditional Generative Models [0.0]
This paper recreates the Conditional Variational Auto-encoder architecture and trains it on images conditioned on attributes.
We attempt to generate new faces with distinct attributes such as hair color and glasses, as well as samples of different bird species.
arXiv Detail & Related papers (2023-04-30T17:25:31Z)
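For orientation, a minimal sketch of how attribute conditioning typically enters a CVAE (a generic textbook construction, not necessarily the exact architecture of the paper above): the attribute vector is concatenated with the image for encoding and with the latent code for decoding.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    def __init__(self, x_dim, attr_dim, z_dim, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + attr_dim, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, z_dim)
        self.to_logvar = nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + attr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, x_dim), nn.Sigmoid())

    def forward(self, x, attr):
        h = self.enc(torch.cat([x, attr], dim=1))        # encode image + attributes
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        x_hat = self.dec(torch.cat([z, attr], dim=1))    # decode latent + attributes
        return x_hat, mu, logvar

# Sampling: fix an attribute vector (e.g. "blond hair, glasses" as binary flags),
# draw z ~ N(0, I), and decode.
```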
- Polynomial Implicit Neural Representations For Large Diverse Datasets [0.0]
Implicit neural representations (INR) have gained significant popularity for signal and image representation.
Most INR architectures rely on sinusoidal positional encoding, which accounts for high-frequency information in data.
Our approach instead represents an image as a polynomial function of its pixel coordinates, eliminating the need for positional encodings.
The proposed Poly-INR model performs comparably to state-of-the-art generative models without any convolution, normalization, or self-attention.
arXiv Detail & Related papers (2023-03-20T20:09:46Z)
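A rough sketch of how a polynomial implicit representation can avoid positional encodings (an illustrative guess at the construction, not the paper's generator): features are repeatedly gated element-wise by an affine map of the pixel coordinates, so depth k realizes a degree-k polynomial in the coordinates.

```python
import torch
import torch.nn as nn

class PolyINR(nn.Module):
    def __init__(self, hidden=128, depth=4, out_ch=3):
        super().__init__()
        self.affines = nn.ModuleList(nn.Linear(2, hidden) for _ in range(depth))
        self.mixes = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(depth))
        self.head = nn.Linear(hidden, out_ch)
        self.hidden = hidden

    def forward(self, coords):                  # coords: (N, 2), e.g. in [-1, 1]
        h = coords.new_ones(coords.size(0), self.hidden)
        for affine, mix in zip(self.affines, self.mixes):
            h = mix(h * affine(coords))         # each step raises the degree by one
        return self.head(h)                     # (N, out_ch) pixel values

# Query all pixels of a 32x32 grid:
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 32),
                        torch.linspace(-1, 1, 32), indexing='ij')
rgb = PolyINR()(torch.stack([xs, ys], dim=-1).view(-1, 2))   # (1024, 3)
```

With purely linear mixing layers and no activations, the output is an exact polynomial of the coordinates, which is the sense in which no sinusoidal encoding is needed.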
- Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation [11.690799827071606]
We propose Convolutional Swin-Unet (CS-Unet) transformer blocks and optimise their settings with respect to patch embedding, projection, the feed-forward network, upsampling, and skip connections.
CS-Unet can be trained from scratch and inherits the strengths of convolutions in each feature-processing phase.
Experiments show that CS-Unet without pre-training surpasses state-of-the-art counterparts by large margins on two medical image datasets (CT and MRI) while using fewer parameters.
arXiv Detail & Related papers (2022-10-14T19:18:52Z)
- Few-Shot Domain Adaptation with Polymorphic Transformers [50.128636842853155]
Deep neural networks (DNNs) trained on one set of medical images often experience severe performance drop on unseen test images.
Few-shot domain adaptation, i.e., adapting a trained model with a handful of annotations, is highly practical and useful in this case.
We propose a Polymorphic Transformer (Polyformer) which can be incorporated into any DNN backbone for few-shot domain adaptation.
arXiv Detail & Related papers (2021-07-10T10:08:57Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
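A simplified single-head sketch of this channel-wise attention (XCiT additionally uses block-diagonal heads and a learnable temperature, both omitted here):

```python
import torch
import torch.nn.functional as F

def xca(q, k, v, tau=1.0):
    """q, k, v: (N, d) token features for one head; returns (N, d)."""
    q = F.normalize(q, dim=0)                  # unit-norm columns (over tokens)
    k = F.normalize(k, dim=0)
    attn = F.softmax(k.t() @ q / tau, dim=-1)  # (d, d) channel-channel map
    return v @ attn                            # (N, d); no N x N map is formed

out = xca(torch.randn(196, 64), torch.randn(196, 64), torch.randn(196, 64))
```

Because the attention map is d x d rather than N x N, both time and memory scale linearly with the number of tokens.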
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
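The mechanism can be sketched with an unfold-based convolution in which every output location applies its own mask over the receptive field (a hand-rolled illustration, not the authors' implementation):

```python
import torch
import torch.nn.functional as F

def locally_masked_conv2d(x, weight, mask):
    """x: (B, Cin, H, W); weight: (Cout, Cin, k, k);
    mask: (B, 1, k*k, H*W) binary, one receptive-field mask per location."""
    B, Cin, H, W = x.shape
    Cout, _, k, _ = weight.shape
    patches = F.unfold(x, k, padding=k // 2)              # (B, Cin*k*k, H*W)
    patches = patches.view(B, Cin, k * k, H * W) * mask   # mask each location
    out = torch.einsum('bckl,ock->bol', patches, weight.view(Cout, Cin, k * k))
    return out.view(B, Cout, H, W)

x = torch.randn(1, 3, 8, 8)
w = torch.randn(16, 3, 3, 3)
m = torch.randint(0, 2, (1, 1, 9, 64)).float()   # one 3x3 mask per location
y = locally_masked_conv2d(x, w, m)               # (1, 16, 8, 8)
```

Choosing the masks to hide "future" pixels under some ordering is what lets the same convolution serve different autoregressive generation orders.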
- Deep Polynomial Neural Networks [77.70761658507507]
$\Pi$-Nets are a new class of function approximators based on polynomial expansions.
$\Pi$-Nets produce state-of-the-art results in three challenging tasks, i.e., image generation, face verification, and 3D mesh representation learning.
arXiv Detail & Related papers (2020-06-20T16:23:32Z)
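For contrast with the two-variable CoPE expansion above, a single-variable polynomial network in this spirit can be sketched as follows (a minimal illustration; the models in the paper are richer):

```python
import torch
import torch.nn as nn

class PiNet(nn.Module):
    """Each step multiplies the running feature by a fresh linear map of the
    input, raising the polynomial degree by one, with a skip connection."""
    def __init__(self, in_dim, hidden, out_dim, degree=3):
        super().__init__()
        self.inputs = nn.ModuleList(nn.Linear(in_dim, hidden) for _ in range(degree))
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, z):
        x = self.inputs[0](z)                  # degree-1 term
        for layer in self.inputs[1:]:
            x = layer(z) * x + x               # Hadamard product raises the degree
        return self.out(x)

net = PiNet(in_dim=64, hidden=128, out_dim=10, degree=3)
y = net(torch.randn(4, 64))   # degree-3 polynomial of the input, shape (4, 10)
```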