ECSIC: Epipolar Cross Attention for Stereo Image Compression
- URL: http://arxiv.org/abs/2307.10284v2
- Date: Fri, 8 Dec 2023 12:40:20 GMT
- Title: ECSIC: Epipolar Cross Attention for Stereo Image Compression
- Authors: Matthias Wödlinger, Jan Kotera, Manuel Keglevic, Jan Xu and Robert Sablatnig
- Abstract summary: ECSIC achieves state-of-the-art performance in stereo image compression on the two popular stereo image datasets Cityscapes and InStereo2k.
- Score: 5.024813922014978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   In this paper, we present ECSIC, a novel learned method for stereo image
compression. Our proposed method compresses the left and right images in a
joint manner by exploiting the mutual information between the images of the
stereo image pair using a novel stereo cross attention (SCA) module and two
stereo context modules. The SCA module performs cross-attention restricted to
the corresponding epipolar lines of the two images and processes them in
parallel. The stereo context modules improve the entropy estimation of the
second encoded image by using the first image as a context. We conduct an
extensive ablation study demonstrating the effectiveness of the proposed
modules and a comprehensive quantitative and qualitative comparison with
existing methods. ECSIC achieves state-of-the-art performance in stereo image
compression on the two popular stereo image datasets Cityscapes and InStereo2k
while allowing for fast encoding and decoding.
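
The row-restricted attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes rectified stereo pairs (so that epipolar lines coincide with image rows), omits the learned query/key/value projections and any normalization a real module would have, and the function name `epipolar_cross_attention` is hypothetical.

```python
import numpy as np

def epipolar_cross_attention(query_feat, context_feat):
    """Cross-attention in which each pixel attends only to the same row of the other view.

    Assumption: both inputs are rectified feature maps of shape (H, W, C),
    so the epipolar line of a pixel in row y is row y of the other image.
    """
    h, w, c = query_feat.shape
    # Scaled dot-product scores between all position pairs within a row: (H, W, W).
    scores = np.einsum("hic,hjc->hij", query_feat, context_feat) / np.sqrt(c)
    # Numerically stable softmax over the context positions of each row.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of context features along the row: (H, W, C).
    return np.einsum("hij,hjc->hic", weights, context_feat)

# Toy feature maps standing in for the left and right image features.
rng = np.random.default_rng(0)
left = rng.standard_normal((4, 6, 8))
right = rng.standard_normal((4, 6, 8))

# Both directions are independent of each other and could run in parallel.
left_from_right = epipolar_cross_attention(left, right)
right_from_left = epipolar_cross_attention(right, left)
print(left_from_right.shape)
```

Restricting attention to a single row reduces the score tensor from (H·W) × (H·W) entries for full cross-attention to H × W × W, which is consistent with the fast encoding and decoding the abstract mentions.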
 
      
Related papers
- StereoINR: Cross-View Geometry Consistent Stereo Super Resolution with Implicit Neural Representation [15.167871410210353]
 Stereo image super-resolution (SSR) aims to enhance high-resolution details by leveraging information from stereo image pairs.
Previous upsampling methods use convolution to independently process deep features of different views, lacking cross-view and non-local information perception.
We propose Stereo Implicit Neural Representation (StereoINR), which innovatively models stereo image pairs as continuous implicit representations.
This continuous representation breaks through the scale limitations, providing a unified solution for arbitrary-scale stereo super-resolution reconstruction of left-right views.
 arXiv  Detail & Related papers  (2025-05-07T08:30:45Z)
- FD-LSCIC: Frequency Decomposition-based Learned Screen Content Image Compression [67.34466255300339]
 This paper addresses three key challenges in SC image compression: learning compact latent features, adapting quantization step sizes, and the lack of large SC datasets.
We introduce an adaptive quantization module that learns scaled uniform noise for each frequency component, enabling flexible control over quantization granularity.
We construct a large SC image compression dataset (SDU-SCICD10K), which includes over 10,000 images spanning basic SC images, computer-rendered images, and mixed NS and SC images from both PC and mobile platforms.
 arXiv  Detail & Related papers  (2025-02-21T03:15:16Z)
- Stereo Image Coding for Machines with Joint Visual Feature Compression [69.28382442498408]
 The stereo image coding for machines (SICM) is formulated and explored in this paper.
A machine vision-oriented stereo feature compression network (MVSFC-Net) is proposed for SICM.
The proposed MVSFC-Net obtains superior compression efficiency as well as 3D visual task performance.
 arXiv  Detail & Related papers  (2025-02-20T01:46:17Z)
- A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
 We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior.
We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information.
We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
 arXiv  Detail & Related papers  (2024-11-04T08:50:16Z)
- Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model [11.959608742884408]
 BiSIC is a symmetric stereo image compression architecture.
We propose a 3D convolution based backbone to capture local features and incorporate bidirectional attention blocks to exploit global features.
Our proposed BiSIC outperforms conventional image/video compression standards.
 arXiv  Detail & Related papers  (2024-07-15T11:36:22Z)
- Content-aware Masked Image Modeling Transformer for Stereo Image Compression [15.819672238043786]
 We propose a stereo image compression framework, named CAMSIC.
 CAMSIC transforms each image to latent representation and employs a powerful decoder-free Transformer entropy model.
 Experiments show that our framework achieves state-of-the-art rate-distortion performance on two stereo image datasets.
 arXiv  Detail & Related papers  (2024-03-13T13:12:57Z)
- StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models [2.9260206957981167]
 We introduce StereoDiffusion, a method that is training-free, remarkably straightforward to use, and seamlessly integrates into the original Stable Diffusion model.
Our method modifies the latent variable to provide an end-to-end, lightweight capability for fast generation of stereo image pairs.
Our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
 arXiv  Detail & Related papers  (2024-03-08T00:30:25Z)
- Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
 We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
 arXiv  Detail & Related papers  (2023-11-06T18:33:24Z)
- Active-Passive SimStereo -- Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods [26.662129158141763]
 Self-similar or bland regions can make it difficult to match patches between two images.
Active stereo-based methods mitigate this problem by projecting a pseudo-random pattern on the scene.
If this pattern acts as a form of adversarial noise, it could negatively impact the performance of deep learning-based methods.
 arXiv  Detail & Related papers  (2022-09-17T10:30:32Z)
- Neural Distributed Image Compression with Cross-Attention Feature Alignment [1.2234742322758418]
 We consider a pair of stereo images, which have overlapping fields of view, captured by a synchronized and calibrated pair of cameras.
We assume that one image of the pair is to be compressed and transmitted, while the other image is available only at the decoder.
In the proposed architecture, the encoder maps the input image to a latent space using a DNN, quantizes the latent representation, and compresses it losslessly using entropy coding.
 arXiv  Detail & Related papers  (2022-07-18T10:15:04Z)
- Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
 This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet).
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
 arXiv  Detail & Related papers  (2022-07-09T13:35:12Z)
- COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [59.15034487974549]
 We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval.
Our COTS achieves the highest performance among all two-stream methods and comparable performance with 10,800x faster inference.
 Importantly, our COTS is also applicable to text-to-video retrieval, yielding a new state of the art on the widely used MSR-VTT dataset.
 arXiv  Detail & Related papers  (2022-04-15T12:34:47Z)
- Stereo Unstructured Magnification: Multiple Homography Image for View Synthesis [72.09193030350396]
 We study the problem of view synthesis with a certain amount of rotation from a pair of images, which we call stereo unstructured magnification.
We propose a novel multiple homography image representation, comprising a set of scene planes with fixed normals and distances.
We derive an angle-based cost to guide the blending of multi-normal images by exploiting per-normal geometry.
 arXiv  Detail & Related papers  (2022-04-01T01:39:28Z)
- Parallax Attention for Unsupervised Stereo Correspondence Learning [46.035892564279564]
 Stereo image pairs encode 3D scene cues into stereo correspondences between the left and right images.
Recent CNN based methods commonly use cost volume techniques to capture stereo correspondence over large disparities.
We propose a generic parallax-attention mechanism (PAM) to capture stereo correspondence regardless of disparity variations.
 arXiv  Detail & Related papers  (2020-09-16T01:30:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     