Related papers: Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model

Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model

URL: http://arxiv.org/abs/2407.10632v2
Date: Sat, 26 Oct 2024 06:03:35 GMT
Title: Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
Authors: Zhening Liu, Xinjie Zhang, Jiawei Shao, Zehong Lin, Jun Zhang,
Abstract summary: BiSIC is a symmetric stereo image compression architecture. We propose a 3D convolution based backbone to capture local features and incorporate bidirectional attention blocks to exploit global features. Our proposed BiSIC outperforms conventional image/video compression standards.
Score: 11.959608742884408
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the rapid advancement of stereo vision technologies, stereo image compression has emerged as a crucial field that continues to draw significant attention. Previous approaches have primarily employed a unidirectional paradigm, where the compression of one view is dependent on the other, resulting in imbalanced compression. To address this issue, we introduce a symmetric bidirectional stereo image compression architecture, named BiSIC. Specifically, we propose a 3D convolution based codec backbone to capture local features and incorporate bidirectional attention blocks to exploit global features. Moreover, we design a novel cross-dimensional entropy model that integrates various conditioning factors, including the spatial context, channel context, and stereo dependency, to effectively estimate the distribution of latent representations for entropy coding. Extensive experiments demonstrate that our proposed BiSIC outperforms conventional image/video compression standards, as well as state-of-the-art learning-based methods, in terms of both PSNR and MS-SSIM.

Related papers

CSMCIR: CoT-Enhanced Symmetric Alignment with Memory Bank for Composed Image Retrieval [54.15776146365823]
Composed Image Retrieval (CIR) enables users to search for target images using both a reference image and manipulation text.<n>We propose CSMCIR, a unified representation framework that achieves efficient query-target alignment through three synergistic components.
arXiv Detail & Related papers (2026-01-07T09:21:38Z)
SHaDe: Compact and Consistent Dynamic 3D Reconstruction via Tri-Plane Deformation and Latent Diffusion [0.0]
We present a novel framework for dynamic 3D scene reconstruction that integrates three key components.<n>An explicit tri-plane deformation field, a view-conditioned canonical field with spherical harmonics (SH) attention, and a temporally-aware latent diffusion prior.<n>Our method encodes 4D scenes using three 2D feature planes that evolve over time, enabling efficient compact representation.
arXiv Detail & Related papers (2025-05-22T11:25:38Z)
Neural Stereo Video Compression with Hybrid Disparity Compensation [21.811261355652356]
We propose a hybrid disparity compensation (HDC) strategy that leverages explicit pixel displacement as a robust prior feature to simplify optimization. We introduce a novel end-to-end optimized neural stereo video compression framework, which integrates HDC-based modules into key coding operations.
arXiv Detail & Related papers (2025-04-29T03:04:09Z)
Hierarchical Semantic Compression for Consistent Image Semantic Restoration [62.97519327310638]
We propose a novel hierarchical semantic compression (HSC) framework that purely operates within intrinsic semantic spaces from generative models. Experimental results demonstrate that the proposed HSC framework achieves the state-of-the-art performance on subjective quality and consistency for human vision.
arXiv Detail & Related papers (2025-02-24T03:20:44Z)
FD-LSCIC: Frequency Decomposition-based Learned Screen Content Image Compression [67.34466255300339]
This paper addresses three key challenges in SC image compression: learning compact latent features, adapting quantization step sizes, and the lack of large SC datasets. We introduce an adaptive quantization module that learns scaled uniform noise for each frequency component, enabling flexible control over quantization granularity. We construct a large SC image compression dataset (SDU-SCICD10K), which includes over 10,000 images spanning basic SC images, computer-rendered images, and mixed NS and SC images from both PC and mobile platforms.
arXiv Detail & Related papers (2025-02-21T03:15:16Z)
Stereo Image Coding for Machines with Joint Visual Feature Compression [69.28382442498408]
The stereo image coding for machines (SICM) is formulated and explored in this paper. A machine vision-oriented stereo feature compression network (MVSFC-Net) is proposed for SICM. The proposed MVSFC-Net obtains superior compression efficiency as well as 3D visual task performance.
arXiv Detail & Related papers (2025-02-20T01:46:17Z)
Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos. Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs. A new paradigm is urgently needed for a more conscious'' process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
Content-aware Masked Image Modeling Transformer for Stereo Image Compression [15.819672238043786]
We propose a stereo image compression framework, named CAMSIC. CAMSIC transforms each image to latent representation and employs a powerful decoder-free Transformer entropy model. Experiments show that our framework achieves state-of-the-art rate-distortion performance on two stereo image datasets.
arXiv Detail & Related papers (2024-03-13T13:12:57Z)
Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding. The proposed adaptive aggregation generates kernel offsets to capture valid information in the content-conditioned range to help transform. Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
arXiv Detail & Related papers (2023-08-17T01:34:51Z)
ECSIC: Epipolar Cross Attention for Stereo Image Compression [5.024813922014978]
ECSIC achieves state-of-the-art performance in stereo image compression on the two popular stereo image datasets Cityscapes and InStereo2k.
arXiv Detail & Related papers (2023-07-18T11:46:31Z)
Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance. More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-Neighborhood heterogeneous deformable (HetDeform) kernel offsets. Experimental results indicate that HDCVC achieves superior performance than the recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z)
CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the advantages of leveraging detailed spatial information from CNN and the global context provided by transformer for enhanced representation learning. The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery. The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
OSLO: On-the-Sphere Learning for Omnidirectional images and its application to 360-degree image compression [59.58879331876508]
We study the learning of representation models for omnidirectional images and propose to use the properties of HEALPix uniform sampling of the sphere to redefine the mathematical tools used in deep learning models for omnidirectional images. Our proposed on-the-sphere solution leads to a better compression gain that can save 13.7% of the bit rate compared to similar learned models applied to equirectangular images.
arXiv Detail & Related papers (2021-07-19T22:14:30Z)
Causal Contextual Prediction for Learned Image Compression [36.08393281509613]
We propose the concept of separate entropy coding to leverage a serial decoding process for causal contextual entropy prediction in the latent space. A causal context model is proposed that separates the latents across channels and makes use of cross-channel relationships to generate highly informative contexts. We also propose a causal global prediction model, which is able to find global reference points for accurate predictions of unknown points.
arXiv Detail & Related papers (2020-11-19T08:15:10Z)
Improving Inference for Neural Image Compression [31.999462074510305]
State-of-the-art methods build on hierarchical variational autoencoders to predict a compressible latent representation of each data point. We identify three approximation gaps which limit performance in the conventional approach. We propose remedies for each of these three limitations based on ideas related to iterative inference.
arXiv Detail & Related papers (2020-06-07T19:26:37Z)
Learning Context-Based Non-local Entropy Modeling for Image Compression [140.64888994506313]
In this paper, we propose a non-local operation for context modeling by employing the global similarity within the context. The entropy model is further adopted as the rate loss in a joint rate-distortion optimization. Considering that the width of the transforms is essential in training low distortion models, we finally produce a U-Net block in the transforms to increase the width with manageable memory consumption and time complexity.
arXiv Detail & Related papers (2020-05-10T13:28:18Z)
Generalized Octave Convolutions for Learned Multi-Frequency Image Compression [20.504561050200365]
We propose the first learned multi-frequency image compression and entropy coding approach. It is based on the recently developed octave convolutions to factorize the latents into high and low frequency (resolution) components. We show that the proposed generalized octave convolution can improve the performance of other auto-encoder-based computer vision tasks.
arXiv Detail & Related papers (2020-02-24T01:35:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.