Related papers: ECSIC: Epipolar Cross Attention for Stereo Image Compression

Related papers

StereoINR: Cross-View Geometry Consistent Stereo Super Resolution with Implicit Neural Representation [15.167871410210353]
Stereo image super-resolution (SSR) aims to enhance high-resolution details by leveraging information from stereo image pairs. Previous upsampling methods use convolution to independently process deep features of different views, lacking cross-view and non-local information perception. We propose Stereo Implicit Neural Representation (StereoINR), which innovatively models stereo image pairs as continuous implicit representations. This continuous representation breaks through the scale limitations, providing a unified solution for arbitrary-scale stereo super-resolution reconstruction of left-right views.
arXiv Detail & Related papers (2025-05-07T08:30:45Z)
FD-LSCIC: Frequency Decomposition-based Learned Screen Content Image Compression [67.34466255300339]
This paper addresses three key challenges in SC image compression: learning compact latent features, adapting quantization step sizes, and the lack of large SC datasets. We introduce an adaptive quantization module that learns scaled uniform noise for each frequency component, enabling flexible control over quantization granularity. We construct a large SC image compression dataset (SDU-SCICD10K), which includes over 10,000 images spanning basic SC images, computer-rendered images, and mixed NS and SC images from both PC and mobile platforms.
arXiv Detail & Related papers (2025-02-21T03:15:16Z)
Stereo Image Coding for Machines with Joint Visual Feature Compression [69.28382442498408]
The stereo image coding for machines (SICM) is formulated and explored in this paper. A machine vision-oriented stereo feature compression network (MVSFC-Net) is proposed for SICM. The proposed MVSFC-Net obtains superior compression efficiency as well as 3D visual task performance.
arXiv Detail & Related papers (2025-02-20T01:46:17Z)
A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding [76.44979557843367]
We propose a novel multi-view stereo (MVS) framework that gets rid of the depth range prior. We introduce a Multi-view Disparity Attention (MDA) module to aggregate long-range context information. We explicitly estimate the quality of the current pixel corresponding to sampled points on the epipolar line of the source image.
arXiv Detail & Related papers (2024-11-04T08:50:16Z)
Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model [11.959608742884408]
BiSIC is a symmetric stereo image compression architecture. We propose a 3D convolution-based backbone to capture local features and incorporate bidirectional attention blocks to exploit global features. Our proposed BiSIC outperforms conventional image/video compression standards.
arXiv Detail & Related papers (2024-07-15T11:36:22Z)
Content-aware Masked Image Modeling Transformer for Stereo Image Compression [15.819672238043786]
We propose a stereo image compression framework, named CAMSIC. CAMSIC transforms each image to latent representation and employs a powerful decoder-free Transformer entropy model. Experiments show that our framework achieves state-of-the-art rate-distortion performance on two stereo image datasets.
arXiv Detail & Related papers (2024-03-13T13:12:57Z)
StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models [2.9260206957981167]
We introduce StereoDiffusion, a method that is training-free, remarkably straightforward to use, and seamlessly integrates into the original Stable Diffusion model. Our method modifies the latent variable to provide an end-to-end, lightweight capability for fast generation of stereo image pairs. Our proposed method maintains a high standard of image quality throughout the stereo generation process, achieving state-of-the-art scores in various quantitative evaluations.
arXiv Detail & Related papers (2024-03-08T00:30:25Z)
Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images. We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process. Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
Active-Passive SimStereo -- Benchmarking the Cross-Generalization Capabilities of Deep Learning-based Stereo Methods [26.662129158141763]
Self-similar or bland regions can make it difficult to match patches between two images. Active stereo-based methods mitigate this problem by projecting a pseudo-random pattern on the scene. If this pattern acts as a form of adversarial noise, it could negatively impact the performance of deep learning-based methods.
arXiv Detail & Related papers (2022-09-17T10:30:32Z)
Neural Distributed Image Compression with Cross-Attention Feature Alignment [1.2234742322758418]
We consider a pair of stereo images, which have overlapping fields of view, captured by a synchronized and calibrated pair of cameras. We assume that one image of the pair is to be compressed and transmitted, while the other image is available only at the decoder. In the proposed architecture, the encoder maps the input image to a latent space using a DNN, quantizes the latent representation, and compresses it losslessly using entropy coding.
arXiv Detail & Related papers (2022-07-18T10:15:04Z)
Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising. We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet). We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z)
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [59.15034487974549]
We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval. Our COTS achieves the highest performance among all two-stream methods and comparable performance with 10,800X faster inference. Importantly, our COTS is also applicable to text-to-video retrieval, yielding new state-of-the-art results on the widely-used MSR-VTT dataset.
arXiv Detail & Related papers (2022-04-15T12:34:47Z)
Stereo Unstructured Magnification: Multiple Homography Image for View Synthesis [72.09193030350396]
We study the problem of view synthesis with a certain amount of rotation from a pair of images, which we call stereo unstructured magnification. We propose a novel multiple homography image representation, comprising a set of scene planes with fixed normals and distances. We derive an angle-based cost to guide the blending of multi-normal images by exploiting per-normal geometry.
arXiv Detail & Related papers (2022-04-01T01:39:28Z)
Parallax Attention for Unsupervised Stereo Correspondence Learning [46.035892564279564]
Stereo image pairs encode 3D scene cues into stereo correspondences between the left and right images. Recent CNN-based methods commonly use cost volume techniques to capture stereo correspondence over large disparities. We propose a generic parallax-attention mechanism (PAM) to capture stereo correspondence regardless of disparity variations.
arXiv Detail & Related papers (2020-09-16T01:30:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.