ECSIC: Epipolar Cross Attention for Stereo Image Compression
- URL: http://arxiv.org/abs/2307.10284v2
- Date: Fri, 8 Dec 2023 12:40:20 GMT
- Title: ECSIC: Epipolar Cross Attention for Stereo Image Compression
- Authors: Matthias W\"odlinger, Jan Kotera, Manuel Keglevic, Jan Xu and Robert
Sablatnig
- Abstract summary: ECSIC achieves state-of-the-art performance in stereo image compression on the two popular stereo image datasets Cityscapes and InStereo2k.
- Score: 5.024813922014978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present ECSIC, a novel learned method for stereo image
compression. Our proposed method compresses the left and right images in a
joint manner by exploiting the mutual information between the images of the
stereo image pair using a novel stereo cross attention (SCA) module and two
stereo context modules. The SCA module performs cross-attention restricted to
the corresponding epipolar lines of the two images and processes them in
parallel. The stereo context modules improve the entropy estimation of the
second encoded image by using the first image as a context. We conduct an
extensive ablation study demonstrating the effectiveness of the proposed
modules and a comprehensive quantitative and qualitative comparison with
existing methods. ECSIC achieves state-of-the-art performance in stereo image
compression on the two popular stereo image datasets Cityscapes and InStereo2k
while allowing for fast encoding and decoding.
Related papers
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z) - Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model [11.959608742884408]
BiSIC is a symmetric stereo image compression architecture.
We propose a 3D convolution based backbone to capture local features and incorporate bidirectional attention blocks to exploit global features.
Our proposed BiSIC outperforms conventional image/video compression standards.
arXiv Detail & Related papers (2024-07-15T11:36:22Z) - Content-aware Masked Image Modeling Transformer for Stereo Image Compression [15.819672238043786]
We propose a stereo image compression framework, named CAMSIC.
CAMSIC transforms each image to latent representation and employs a powerful decoder-free Transformer entropy model.
Experiments show that our framework achieves state-of-the-art rate-distortion performance on two stereo image datasets.
arXiv Detail & Related papers (2024-03-13T13:12:57Z) - StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models [2.9260206957981167]
We introduce StereoDiffusion, a method that is trainning free, remarkably straightforward to use, and seamlessly integrates into the original Stable Diffusion model.
Our method modifies the latent variable to provide an end-to-end, lightweight capability for fast generation of stereo image pairs.
Our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
arXiv Detail & Related papers (2024-03-08T00:30:25Z) - Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z) - Active-Passive SimStereo -- Benchmarking the Cross-Generalization
Capabilities of Deep Learning-based Stereo Methods [26.662129158141763]
Self-similar or bland regions can make it difficult to match patches between two images.
Active stereo-based methods mitigate this problem by projecting a pseudo-random pattern on the scene.
If this pattern acts as a form of adversarial noise, it could negatively impact the performance of deep learning-based methods.
arXiv Detail & Related papers (2022-09-17T10:30:32Z) - Neural Distributed Image Compression with Cross-Attention Feature
Alignment [1.2234742322758418]
We consider a pair of stereo images, which have overlapping fields of view, captured by a synchronized and calibrated pair of cameras.
We assume that one image of the pair is to be compressed and transmitted, while the other image is available only at the decoder.
In the proposed architecture, the encoder maps the input image to a latent space using a DNN, quantizes the latent representation, and compresses it losslessly using entropy coding.
arXiv Detail & Related papers (2022-07-18T10:15:04Z) - Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image
Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose rank-enhanced low-dimensional convolution set (Re-ConvSet)
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z) - COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for
Cross-Modal Retrieval [59.15034487974549]
We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval.
Our COTS achieves the highest performance among all two-stream methods and comparable performance with 10,800X faster in inference.
Importantly, our COTS is also applicable to text-to-video retrieval, yielding new state-ofthe-art on the widely-used MSR-VTT dataset.
arXiv Detail & Related papers (2022-04-15T12:34:47Z) - Stereo Unstructured Magnification: Multiple Homography Image for View
Synthesis [72.09193030350396]
We study the problem of view synthesis with certain amount of rotations from a pair of images, what we called stereo unstructured magnification.
We propose a novel multiple homography image representation, comprising of a set of scene planes with fixed normals and distances.
We derive an angle-based cost to guide the blending of multi-normal images by exploiting per-normal geometry.
arXiv Detail & Related papers (2022-04-01T01:39:28Z) - Parallax Attention for Unsupervised Stereo Correspondence Learning [46.035892564279564]
Stereo image pairs encode 3D scene cues into stereo correspondences between the left and right images.
Recent CNN based methods commonly use cost volume techniques to capture stereo correspondence over large disparities.
We propose a generic parallax-attention mechanism (PAM) to capture stereo correspondence regardless of disparity variations.
arXiv Detail & Related papers (2020-09-16T01:30:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.