Spatially-Adaptive Pixelwise Networks for Fast Image Translation
- URL: http://arxiv.org/abs/2012.02992v1
- Date: Sat, 5 Dec 2020 10:02:03 GMT
- Title: Spatially-Adaptive Pixelwise Networks for Fast Image Translation
- Authors: Tamar Rott Shaham, Michael Gharbi, Richard Zhang, Eli Shechtman, Tomer Michaeli
- Abstract summary: We introduce a new generator architecture, aimed at fast and efficient high-resolution image-to-image translation.
We use pixel-wise networks; that is, each pixel is processed independently of others.
Our model is up to 18x faster than state-of-the-art baselines.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce a new generator architecture, aimed at fast and efficient
high-resolution image-to-image translation. We design the generator to be an
extremely lightweight function of the full-resolution image. In fact, we use
pixel-wise networks; that is, each pixel is processed independently of others,
through a composition of simple affine transformations and nonlinearities. We
take three important steps to equip such a seemingly simple function with
adequate expressivity. First, the parameters of the pixel-wise networks are
spatially varying so they can represent a broader function class than simple
1x1 convolutions. Second, these parameters are predicted by a fast
convolutional network that processes an aggressively low-resolution
representation of the input. Third, we augment the input image with a
sinusoidal encoding of spatial coordinates, which provides an effective
inductive bias for generating realistic novel high-frequency image content. As
a result, our model is up to 18x faster than state-of-the-art baselines. We
achieve this speedup while generating comparable visual quality across
different image resolutions and translation domains.
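The three steps above can be sketched compactly in code. The following PyTorch snippet is a minimal, hypothetical illustration, not the authors' released implementation: the layer widths, the 8x downsampling factor, the two-layer per-pixel MLP, and names such as `PixelwiseHyperGenerator` and `sinusoidal_encoding` are assumptions made for exposition. A fast convnet over an aggressively downsampled copy of the input predicts a spatially-varying parameter map; each full-resolution pixel, augmented with a sinusoidal encoding of its coordinates, is then transformed by its own tiny MLP.
```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def sinusoidal_encoding(h, w, n_freqs=6, device="cpu"):
    """Sinusoidal features of normalized (x, y) coords, shape (1, 4*n_freqs, h, w)."""
    ys = torch.linspace(-1.0, 1.0, h, device=device)
    xs = torch.linspace(-1.0, 1.0, w, device=device)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    feats = []
    for k in range(n_freqs):
        freq = (2.0 ** k) * math.pi
        for grid in (xx, yy):
            feats += [torch.sin(freq * grid), torch.cos(freq * grid)]
    return torch.stack(feats, dim=0).unsqueeze(0)

class PixelwiseHyperGenerator(nn.Module):
    """Hypothetical sketch: a low-res convnet predicts the weights of a tiny
    two-layer MLP that is applied independently at each full-res pixel."""
    def __init__(self, in_ch=3, hidden=32, out_ch=3, n_freqs=6, lr_factor=8):
        super().__init__()
        self.lr_factor, self.hidden, self.out_ch, self.n_freqs = lr_factor, hidden, out_ch, n_freqs
        self.in_dim = in_ch + 4 * n_freqs  # pixel values + coordinate encoding
        # Total parameter count of the per-pixel MLP (two affine layers).
        self.n_params = (self.in_dim * hidden + hidden) + (hidden * out_ch + out_ch)
        # Fast convnet over the aggressively downsampled input -> parameter map.
        self.hypernet = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, self.n_params, 1),
        )

    def forward(self, img):
        b, _, h, w = img.shape
        # 1) Predict spatially-varying MLP parameters at low resolution.
        lr = F.interpolate(img, scale_factor=1.0 / self.lr_factor,
                           mode="bilinear", align_corners=False)
        params = F.interpolate(self.hypernet(lr), size=(h, w), mode="nearest")
        # 2) Per-pixel input: color channels + sinusoidal coordinate encoding.
        enc = sinusoidal_encoding(h, w, self.n_freqs, img.device).expand(b, -1, -1, -1)
        x = torch.cat([img, enc], 1).permute(0, 2, 3, 1).reshape(-1, self.in_dim, 1)
        p = params.permute(0, 2, 3, 1).reshape(-1, self.n_params)
        # 3) Slice each pixel's parameter vector into weights and biases.
        i = self.in_dim * self.hidden
        w1 = p[:, :i].view(-1, self.hidden, self.in_dim)
        b1 = p[:, i:i + self.hidden].unsqueeze(-1)
        j = i + self.hidden
        w2 = p[:, j:j + self.hidden * self.out_ch].view(-1, self.out_ch, self.hidden)
        b2 = p[:, j + self.hidden * self.out_ch:].unsqueeze(-1)
        # 4) Run the pixel-wise MLP: affine -> nonlinearity -> affine.
        y = torch.tanh(torch.bmm(w2, torch.relu(torch.bmm(w1, x) + b1)) + b2)
        return y.view(b, h, w, self.out_ch).permute(0, 3, 1, 2)

# Example: translate a 128x128 image; the output shares the input's resolution.
out = PixelwiseHyperGenerator()(torch.randn(1, 3, 128, 128))  # (1, 3, 128, 128)
```
Because all heavy convolutions run at low resolution and the per-pixel MLPs reduce to batched matrix multiplications, the per-pixel work at full resolution stays lightweight, which is the source of the reported speedup.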
Related papers
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy conditions the pre-training of a 2D text-to-image diffusion module on an explicit camera orientation feature.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of our proposed approach with consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- Efficient Encoding of Graphics Primitives with Simplex-based Structures
We propose a simplex-based approach for encoding graphics primitives.
In the 2D image fitting task, the proposed method fits an image in 9.4% less time than the baseline method.
arXiv Detail & Related papers (2023-11-26T21:53:22Z)
- Distance Weighted Trans Network for Image Completion
We propose a new architecture that relies on a Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- T-former: An Efficient Transformer for Image Inpainting
A class of attention-based network architectures, called transformers, has shown strong performance on natural language processing tasks.
In this paper, we use a Taylor expansion to design a novel attention mechanism whose complexity is linear in the image resolution, and based on this attention we design a network called $T$-former for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while keeping the number of parameters and the computational complexity relatively low.
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
- Single Image Super-Resolution via a Dual Interactive Implicit Neural Network
We introduce a novel implicit neural network for the task of single image super-resolution at arbitrary scale factors.
We demonstrate the efficacy and flexibility of our approach against the state of the art on publicly available benchmark datasets.
arXiv Detail & Related papers (2022-10-23T02:05:19Z)
- DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation
We propose a dynamic sparse attention based Transformer model to achieve fine-level matching with favorable efficiency.
The heart of our approach is a novel dynamic-attention unit that adapts to the variation in the optimal number of tokens each position should attend to.
Experiments on three applications, pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details.
arXiv Detail & Related papers (2022-07-13T11:12:03Z)
- Parallel Discrete Convolutions on Adaptive Particle Representations of Images
We present data structures and algorithms for native implementations of discrete convolution operators over Adaptive Particle Representations (APR).
The APR is a content-adaptive image representation that locally adapts the sampling resolution to the image signal.
We show that APR convolution naturally leads to scale-adaptive algorithms that efficiently parallelize on multi-core CPU and GPU architectures.
arXiv Detail & Related papers (2021-12-07T09:40:05Z)
- High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
We focus on speeding up high-resolution photorealistic image-to-image translation (I2IT) tasks based on closed-form Laplacian pyramid decomposition and reconstruction.
We propose a Laplacian Pyramid Translation Network (LPTN) to simultaneously perform these two tasks.
Our model avoids most of the heavy computation consumed by processing high-resolution feature maps and faithfully preserves the image details.
arXiv Detail & Related papers (2021-05-19T15:05:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.