Pseudocylindrical Convolutions for Learned Omnidirectional Image
Compression
- URL: http://arxiv.org/abs/2112.13227v1
- Date: Sat, 25 Dec 2021 12:18:32 GMT
- Title: Pseudocylindrical Convolutions for Learned Omnidirectional Image
Compression
- Authors: Mu Li, Kede Ma, Jinxing Li, and David Zhang
- Abstract summary: We make one of the first attempts to learn deep neural networks for omnidirectional image compression.
Under reasonable constraints on the parametric representation, the pseudocylindrical convolution can be efficiently implemented by standard convolution.
Experimental results show that our method consistently achieves better rate-distortion performance than competing methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although equirectangular projection (ERP) is a convenient form to store
omnidirectional images (also known as 360-degree images), it is neither
equal-area nor conformal, thus not friendly to subsequent visual communication.
In the context of image compression, ERP will over-sample and deform things and
stuff near the poles, making it difficult for perceptually optimal bit
allocation. In conventional 360-degree image compression, techniques such as
region-wise packing and tiled representation are introduced to alleviate the
over-sampling problem, achieving limited success. In this paper, we make one of
the first attempts to learn deep neural networks for omnidirectional image
compression. We first describe parametric pseudocylindrical representation as a
generalization of common pseudocylindrical map projections. A computationally
tractable greedy method is presented to determine the (sub)-optimal
configuration of the pseudocylindrical representation in terms of a novel proxy
objective for rate-distortion performance. We then propose pseudocylindrical
convolutions for 360-degree image compression. Under reasonable constraints on
the parametric representation, the pseudocylindrical convolution can be
efficiently implemented by standard convolution with the so-called
pseudocylindrical padding. To demonstrate the feasibility of our idea, we
implement an end-to-end 360-degree image compression system, consisting of the
learned pseudocylindrical representation, an analysis transform, a non-uniform
quantizer, a synthesis transform, and an entropy model. Experimental results on
$19,790$ omnidirectional images show that our method achieves consistently
better rate-distortion performance than the competing methods. Moreover, the
visual quality by our method is significantly improved for all images at all
bitrates.
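The over-sampling argument and the idea of running a standard convolution on padded rows can be made concrete with a small sketch. The code below is not the authors' implementation. It assumes a sinusoidal-style layout in which each row keeps a number of samples proportional to the cosine of its latitude, and it assumes that the padding wraps each row circularly left and right while resampling the rows above and below to the current row's width. The helper names `sinusoidal_row_widths`, `resample_row`, and `padded_neighbourhood` are hypothetical, and the paper's learned representation and its exact padding rule may differ.

```python
import numpy as np


def sinusoidal_row_widths(height, width):
    """Hypothetical helper: samples per row for a sinusoidal-style layout."""
    # Latitude at the centre of each ERP row, strictly inside (-pi/2, pi/2).
    lat = (np.arange(height) + 0.5) / height * np.pi - np.pi / 2
    # A latitude circle's circumference scales with cos(latitude).
    return np.maximum(1, np.round(width * np.cos(lat))).astype(int)


def resample_row(row, new_width):
    """Linearly resample a horizontally periodic row to new_width samples."""
    old_width = row.shape[0]
    # Target pixel centres expressed in the source row's index coordinates.
    x = (np.arange(new_width) + 0.5) * old_width / new_width - 0.5
    return np.interp(x, np.arange(old_width), row, period=old_width)


def padded_neighbourhood(rows, i, pad=1):
    """Rows i-1, i, i+1 at row i's width, wrapped left/right by `pad`."""
    w = rows[i].shape[0]
    out = []
    for j in (i - 1, i, i + 1):
        # Simplification (assumption): reuse row i beyond the poles.
        src = rows[j] if 0 <= j < len(rows) else rows[i]
        out.append(np.pad(resample_row(src, w), pad, mode="wrap"))
    return np.stack(out)


erp = np.random.default_rng(0).random((4, 8))  # toy single-channel ERP image
widths = sinusoidal_row_widths(*erp.shape)
rows = [resample_row(erp[i], w) for i, w in enumerate(widths)]
patch = padded_neighbourhood(rows, 1)
print(widths.tolist(), int(widths.sum()), erp.size, patch.shape)
```

Once every row's neighbourhood has been brought to a common width in this way, an ordinary 3x3 convolution can be applied to the padded patch. Comparing `widths.sum()` with `erp.size` gives a rough sense of how many samples ERP spends on the polar rows under this assumed layout.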
Related papers
- SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception [61.7243424157871]
We introduce a transformer-based architecture that, by incorporating a novel "Spherical Local Self-Attention" and other spherically-oriented modules, successfully operates in the spherical domain and outperforms the state-of-the-art in 360-degree perception benchmarks for depth estimation and semantic segmentation.
arXiv Detail & Related papers (2024-12-09T20:23:10Z)
- Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model [11.959608742884408]
BiSIC is a symmetric stereo image compression architecture.
We propose a 3D convolution based backbone to capture local features and incorporate bidirectional attention blocks to exploit global features.
Our proposed BiSIC outperforms conventional image/video compression standards.
arXiv Detail & Related papers (2024-07-15T11:36:22Z)
- PSC: Posterior Sampling-Based Compression [34.50287066865267]
Posterior Sampling-based Compression (PSC) is a zero-shot compression method that leverages a pre-trained diffusion model as its sole neural network component.
PSC constructs a transform that is adaptive to the image.
We demonstrate that PSC's performance is comparable to established training-based methods in terms of rate, distortion, and perceptual quality.
arXiv Detail & Related papers (2024-07-13T14:24:22Z)
- Hybrid Model-based / Data-driven Graph Transform for Image Coding [54.31406300524195]
We present a hybrid model-based / data-driven approach to encode an intra-prediction residual block.
The first $K$ eigenvectors of a transform matrix are derived from a statistical model, e.g., the asymmetric discrete sine transform (ADST) for stability.
Using WebP as a baseline image codec, experimental results show that our hybrid graph transform achieved better energy compaction than the default discrete cosine transform (DCT) and better stability than the KLT.
arXiv Detail & Related papers (2022-03-02T15:36:44Z)
- Rectifying homographies for stereo vision: analytical solution for minimal distortion [0.0]
Rectification is used to simplify the subsequent stereo correspondence problem.
This work proposes a closed-form solution for the rectifying homographies that minimise perspective distortion.
arXiv Detail & Related papers (2022-02-28T22:35:47Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- OSLO: On-the-Sphere Learning for Omnidirectional images and its application to 360-degree image compression [59.58879331876508]
We study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images.
Our proposed on-the-sphere solution leads to a better compression gain that can save 13.7% of the bit rate compared to similar learned models applied to equirectangular images.
arXiv Detail & Related papers (2021-07-19T22:14:30Z)
- Substitutional Neural Image Compression [48.20906717052056]
Substitutional Neural Image Compression (SNIC) is a general approach for enhancing any neural image compression model.
It boosts compression performance toward a flexible distortion metric and enables bit-rate control using a single model instance.
arXiv Detail & Related papers (2021-05-16T20:53:31Z)