Locally Masked Convolution for Autoregressive Models
- URL: http://arxiv.org/abs/2006.12486v3
- Date: Sat, 27 Jun 2020 04:53:14 GMT
- Title: Locally Masked Convolution for Autoregressive Models
- Authors: Ajay Jain and Pieter Abbeel and Deepak Pathak
- Abstract summary: LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
- Score: 107.4635841204146
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: High-dimensional generative models have many applications including image
compression, multimedia generation, anomaly detection and data completion.
State-of-the-art estimators for natural images are autoregressive, decomposing
the joint distribution over pixels into a product of conditionals parameterized
by a deep neural network, e.g. a convolutional neural network such as the
PixelCNN. However, PixelCNNs only model a single decomposition of the joint,
and only a single generation order is efficient. For tasks such as image
completion, these models are unable to use much of the observed context. To
generate data in arbitrary orders, we introduce LMConv: a simple modification
to the standard 2D convolution that allows arbitrary masks to be applied to the
weights at each location in the image. Using LMConv, we learn an ensemble of
distribution estimators that share parameters but differ in generation order,
achieving improved performance on whole-image density estimation (2.89 bpd on
unconditional CIFAR10), as well as globally coherent image completions. Our
code is available at https://ajayjain.github.io/lmconv.
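Conceptually, a locally masked convolution can be sketched with an im2col-style computation: extract every kH x kW patch, zero out each output location's own set of masked neighbours, then apply the shared weights with a single matrix multiply. The snippet below is a minimal PyTorch sketch of that idea, not the authors' released implementation (see the linked code for that); the function name, the (B, kH*kW, H*W) mask layout, and the raster-scan example mask are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def locally_masked_conv2d(x, weight, mask, bias=None):
    # Minimal sketch of a locally masked 2D convolution:
    #   x:      (B, C_in, H, W) input image / features
    #   weight: (C_out, C_in, kH, kW) weights shared across all locations
    #   mask:   (B, kH*kW, H*W) binary mask selecting, for every output
    #           location, which patch entries may be "seen"
    B, C_in, H, W = x.shape
    C_out, _, kH, kW = weight.shape

    # im2col: gather every kH x kW patch -> (B, C_in*kH*kW, H*W)
    patches = F.unfold(x, kernel_size=(kH, kW), padding=(kH // 2, kW // 2))

    # Apply a *different* mask at each spatial location (broadcast over C_in)
    patches = patches.view(B, C_in, kH * kW, H * W) * mask.unsqueeze(1)

    # Shared weights, one batched matrix multiply over all locations
    w = weight.view(C_out, C_in * kH * kW)
    out = w @ patches.view(B, C_in * kH * kW, H * W)   # (B, C_out, H*W)
    if bias is not None:
        out = out + bias.view(1, C_out, 1)
    return out.view(B, C_out, H, W)

# Example: a 3x3 kernel with the same raster-scan mask at every location
# (roughly a PixelCNN-style mask); per-location masks enable other orders.
B, C_in, C_out, H, W, k = 1, 3, 8, 8, 8, 3
x = torch.randn(B, C_in, H, W)
weight = torch.randn(C_out, C_in, k, k)
order = torch.zeros(k * k)
order[: k * k // 2] = 1.0                       # neighbours above / left of centre
mask = order.view(1, k * k, 1).expand(B, k * k, H * W)
print(locally_masked_conv2d(x, weight, mask).shape)  # torch.Size([1, 8, 8, 8])
```

Because the masks are inputs rather than fixed weights, the same shared parameters can be evaluated under many different generation orders, which is what makes the order-ensemble described in the abstract possible.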
Related papers
- Image-GS: Content-Adaptive Image Representation via 2D Gaussians [55.15950594752051]
We propose Image-GS, a content-adaptive image representation.
Using anisotropic 2D Gaussians as the basis, Image-GS shows high memory efficiency, supports fast random access, and offers a natural level of detail stack.
The general efficiency and fidelity of Image-GS are validated against several recent neural image representations and industry-standard texture compressors.
We hope this research offers insights for developing new applications that require adaptive quality and resource control, such as machine perception, asset streaming, and content generation.
arXiv Detail & Related papers (2024-07-02T00:45:21Z)
- Mixing Histopathology Prototypes into Robust Slide-Level Representations for Cancer Subtyping [19.577541771516124]
Whole-slide image analysis in computational pathology often relies on processing tessellated gigapixel images with only slide-level labels available.
Applying multiple instance learning-based methods or transformer models is computationally expensive, as all instances of each image have to be processed simultaneously.
The Mixer is an under-explored alternative to common vision transformers, especially for large-scale datasets.
arXiv Detail & Related papers (2023-10-19T14:15:20Z)
- Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling [23.164631160130092]
We extend the success of BERT-style pre-training, i.e. masked image modeling, to convolutional networks (convnets).
We treat unmasked pixels as sparse voxels of 3D point clouds and use sparse convolution to encode them.
This is the first use of sparse convolution for 2D masked modeling.
arXiv Detail & Related papers (2023-01-09T18:59:50Z)
- Traditional Classification Neural Networks are Good Generators: They are Competitive with DDPMs and GANs [104.72108627191041]
We show that conventional neural network classifiers can generate high-quality images comparable to state-of-the-art generative models.
We propose a mask-based reconstruction module that makes the gradients semantics-aware, enabling the synthesis of plausible images.
We show that our method is also applicable to text-to-image generation by leveraging image-text foundation models.
arXiv Detail & Related papers (2022-11-27T11:25:35Z)
- FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model for generating novel, high-quality and diverse images.
FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales.
In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-07-18T07:11:28Z)
- Class Balanced PixelNet for Neurological Image Segmentation [20.56747443955369]
We propose an automatic brain tumor segmentation approach (PixelNet) using a pixel-level convolutional neural network (CNN).
The proposed model has achieved promising results in brain tumor and ischemic stroke segmentation datasets.
arXiv Detail & Related papers (2022-04-23T10:57:54Z)
- PixelPyramids: Exact Inference Models from Lossless Image Pyramids [58.949070311990916]
PixelPyramids is a block-autoregressive approach with scale-specific representations to encode the joint distribution of image pixels.
It yields state-of-the-art results for density estimation on various image datasets, especially for high-resolution data.
For CelebA-HQ 1024 x 1024, the density estimates improve to 44% of the baseline, while sampling speeds remain superior even to easily parallelizable flow-based models.
arXiv Detail & Related papers (2021-10-17T10:47:29Z)
- Bayesian Image Reconstruction using Deep Generative Models [7.012708932320081]
In this work, we leverage state-of-the-art (SOTA) generative models for building powerful image priors.
Our method, called Bayesian Reconstruction through Generative Models (BRGM), uses a single pre-trained generator model to solve different image restoration tasks.
arXiv Detail & Related papers (2020-12-08T17:11:26Z)
- Adversarial Generation of Continuous Images [31.92891885615843]
In this paper, we propose two novel architectural techniques for building INR-based image decoders.
We use them to build a state-of-the-art continuous image GAN.
Our proposed INR-GAN architecture improves the performance of continuous image generators severalfold.
arXiv Detail & Related papers (2020-11-24T11:06:40Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.