ECSIC: Epipolar Cross Attention for Stereo Image Compression
- URL: http://arxiv.org/abs/2307.10284v2
- Date: Fri, 8 Dec 2023 12:40:20 GMT
- Title: ECSIC: Epipolar Cross Attention for Stereo Image Compression
- Authors: Matthias Wödlinger, Jan Kotera, Manuel Keglevic, Jan Xu and Robert Sablatnig
- Abstract summary: ECSIC achieves state-of-the-art performance in stereo image compression on the two popular stereo image datasets Cityscapes and InStereo2k.
- Score: 5.024813922014978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present ECSIC, a novel learned method for stereo image
compression. Our proposed method compresses the left and right images in a
joint manner by exploiting the mutual information between the images of the
stereo image pair using a novel stereo cross attention (SCA) module and two
stereo context modules. The SCA module performs cross-attention restricted to
the corresponding epipolar lines of the two images and processes them in
parallel. The stereo context modules improve the entropy estimation of the
second encoded image by using the first image as a context. We conduct an
extensive ablation study demonstrating the effectiveness of the proposed
modules and a comprehensive quantitative and qualitative comparison with
existing methods. ECSIC achieves state-of-the-art performance in stereo image
compression on the two popular stereo image datasets Cityscapes and InStereo2k
while allowing for fast encoding and decoding.
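The idea of restricting cross-attention to epipolar lines can be illustrated with a minimal NumPy sketch. It is not the authors' SCA module. It assumes a rectified stereo pair, so that the epipolar line of a pixel in row h is simply row h of the other image, and it omits the learned query/key/value projections a real module would contain. The function name `epipolar_cross_attention` and the tensor shapes are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def epipolar_cross_attention(query_feat, context_feat):
    """Row-wise cross-attention between two feature maps of shape (H, W, C).

    Assumption (not from the source): the pair is rectified, so each pixel
    in row h of `query_feat` attends only to the W pixels in row h of
    `context_feat`. No learned projections are applied.
    """
    _, _, c = query_feat.shape
    # scores[h, i, j]: similarity of query pixel (h, i) and context pixel (h, j).
    scores = np.einsum("hic,hjc->hij", query_feat, context_feat) / np.sqrt(c)
    weights = softmax(scores, axis=-1)
    # Aggregate context features along the same image row only.
    attended = np.einsum("hij,hjc->hic", weights, context_feat)
    return attended, weights


rng = np.random.default_rng(0)
left = rng.standard_normal((4, 6, 8))   # hypothetical left-image features
right = rng.standard_normal((4, 6, 8))  # hypothetical right-image features

# The two directions are independent of each other, so they can run in parallel.
left_from_right, w_lr = epipolar_cross_attention(left, right)
right_from_left, w_rl = epipolar_cross_attention(right, left)
```

Because attention is computed per row, the score tensor has size H x W x W instead of the (H W) x (H W) required by full cross-attention over all pixel pairs, which is one plausible reason a row-restricted design stays fast.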
Related papers
- Stereo Image Coding for Machines with Joint Visual Feature Compression [69.28382442498408]
Stereo image coding for machines (SICM) is formulated and explored in this paper.
A machine vision-oriented stereo feature compression network (MVSFC-Net) is proposed for SICM.
The proposed MVSFC-Net obtains superior compression efficiency as well as 3D visual task performance.
arXiv Detail & Related papers (2025-02-20T01:46:17Z)
- SQ-GAN: Semantic Image Communications Using Masked Vector Quantization [55.02795214161371]
This work introduces Semantically Masked VQ-GAN (SQ-GAN), a novel approach to optimize image compression for semantic/task-oriented communications.
SQ-GAN employs off-the-shelf semantic segmentation and a new semantic-conditioned adaptive mask module (SAMM) to selectively encode semantically significant features of the images.
arXiv Detail & Related papers (2025-02-13T17:35:57Z)
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
- Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model [11.959608742884408]
BiSIC is a symmetric stereo image compression architecture.
We propose a 3D-convolution-based backbone to capture local features and incorporate bidirectional attention blocks to exploit global features.
Our proposed BiSIC outperforms conventional image/video compression standards.
arXiv Detail & Related papers (2024-07-15T11:36:22Z)
- CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression [15.819672238043786]
We propose a stereo image compression framework, named CAMSIC.
CAMSIC transforms each image to a latent representation and employs a powerful decoder-free Transformer entropy model.
Experiments show that our framework achieves state-of-the-art rate-distortion performance.
arXiv Detail & Related papers (2024-03-13T13:12:57Z)
- StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models [2.9260206957981167]
We introduce StereoDiffusion, a method that is training-free, remarkably straightforward to use, and seamlessly integrates into the original Stable Diffusion model.
Our method modifies the latent variable to provide an end-to-end, lightweight capability for fast generation of stereo image pairs.
Our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
arXiv Detail & Related papers (2024-03-08T00:30:25Z)
- Neural Distributed Image Compression with Cross-Attention Feature Alignment [1.2234742322758418]
We consider a pair of stereo images, which have overlapping fields of view, captured by a synchronized and calibrated pair of cameras.
We assume that one image of the pair is to be compressed and transmitted, while the other image is available only at the decoder.
In the proposed architecture, the encoder maps the input image to a latent space using a DNN, quantizes the latent representation, and compresses it losslessly using entropy coding.
arXiv Detail & Related papers (2022-07-18T10:15:04Z)
- Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet).
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z)
- COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [59.15034487974549]
We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval.
Our COTS achieves the highest performance among all two-stream methods and comparable performance while being 10,800X faster at inference.
Importantly, our COTS is also applicable to text-to-video retrieval, yielding new state-of-the-art results on the widely used MSR-VTT dataset.
arXiv Detail & Related papers (2022-04-15T12:34:47Z)
- Parallax Attention for Unsupervised Stereo Correspondence Learning [46.035892564279564]
Stereo image pairs encode 3D scene cues into stereo correspondences between the left and right images.
Recent CNN-based methods commonly use cost volume techniques to capture stereo correspondence over large disparities.
We propose a generic parallax-attention mechanism (PAM) to capture stereo correspondence regardless of disparity variations.
arXiv Detail & Related papers (2020-09-16T01:30:13Z)