Neural Distributed Image Compression with Cross-Attention Feature Alignment
- URL: http://arxiv.org/abs/2207.08489v1
- Date: Mon, 18 Jul 2022 10:15:04 GMT
- Title: Neural Distributed Image Compression with Cross-Attention Feature Alignment
- Authors: Nitish Mital, Ezgi Ozyilkan, Ali Garjani, Deniz Gunduz
- Abstract summary: We consider a pair of stereo images, which have overlapping fields of view, captured by a synchronized and calibrated pair of cameras.
We assume that one image of the pair is to be compressed and transmitted, while the other image is available only at the decoder.
In the proposed architecture, the encoder maps the input image to a latent space using a DNN, quantizes the latent representation, and compresses it losslessly using entropy coding.
- Score: 1.2234742322758418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel deep neural network (DNN) architecture for compressing an
image when a correlated image is available as side information only at the
decoder side, a special case of the well-known and heavily studied distributed
source coding (DSC) problem. In particular, we consider a pair of stereo
images, which have overlapping fields of view, captured by a synchronized and
calibrated pair of cameras; and therefore, are highly correlated. We assume
that one image of the pair is to be compressed and transmitted, while the other
image is available only at the decoder. In the proposed architecture, the
encoder maps the input image to a latent space using a DNN, quantizes the
latent representation, and compresses it losslessly using entropy coding. The
proposed decoder extracts useful information common between the images solely
from the available side information, as well as a latent representation of the
side information. Then, the latent representations of the two images, one
received from the encoder, the other extracted locally, along with the locally
generated common information, are fed to the respective decoders of the two
images. We employ a cross-attention module (CAM) to align the feature maps
obtained in the intermediate layers of the respective decoders of the two
images, thus allowing better utilization of the side information. We train and
demonstrate the effectiveness of the proposed algorithm on various realistic
setups, such as the KITTI and Cityscapes datasets of stereo image pairs. Our
results show that the proposed architecture exploits the decoder-only side
information more efficiently than previous works, which it outperforms. We also
show that the proposed method provides significant gains even with uncalibrated
and unsynchronized camera arrays.
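To make the cross-attention feature alignment concrete, the following is a minimal PyTorch sketch of a module that aligns an intermediate decoder feature map of the compressed image with the corresponding feature map derived from the decoder-only side information. The class name, single attention head, 1x1 convolutional projections, and residual fusion are illustrative assumptions, not the paper's exact CAM design.

```python
# Minimal sketch of cross-attention feature alignment, assuming hypothetical
# tensor shapes and a single attention head; the paper's CAM may differ in
# projection dimensions, number of heads, and fusion rule.
import torch
import torch.nn as nn


class CrossAttentionAlignment(nn.Module):
    """Aligns decoder features of the target image (queries) with features
    derived from the decoder-only side information (keys/values)."""

    def __init__(self, channels: int):
        super().__init__()
        self.q_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.k_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.v_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.out_proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x_feat: torch.Tensor, y_feat: torch.Tensor) -> torch.Tensor:
        # x_feat: intermediate decoder features of the compressed image (B, C, H, W)
        # y_feat: intermediate decoder features of the side information  (B, C, H, W)
        b, c, h, w = x_feat.shape
        q = self.q_proj(x_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.k_proj(y_feat).flatten(2)                   # (B, C, HW)
        v = self.v_proj(y_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)       # (B, HW, HW)
        aligned = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        # Residual fusion: add the attention-aligned side-information features.
        return x_feat + self.out_proj(aligned)


# Usage: fuse side-information features into the target decoder at one layer.
cam = CrossAttentionAlignment(channels=64)
x_feat = torch.randn(1, 64, 16, 16)
y_feat = torch.randn(1, 64, 16, 16)
fused = cam(x_feat, y_feat)   # (1, 64, 16, 16)
```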
Related papers
- Content-aware Masked Image Modeling Transformer for Stereo Image Compression [15.819672238043786]
We propose a stereo image compression framework, named CAMSIC.
CAMSIC transforms each image to latent representation and employs a powerful decoder-free Transformer entropy model.
Experiments show that our framework achieves state-of-the-art rate-distortion performance on two stereo image datasets.
arXiv Detail & Related papers (2024-03-13T13:12:57Z)
- Neuromorphic Synergy for Video Binarization [54.195375576583864]
Bimodal objects serve as a visual form to embed information that can be easily recognized by vision systems.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first de-blur and then binarize the images in real time.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
arXiv Detail & Related papers (2024-02-20T01:43:51Z)
- Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation [54.23510028456082]
We propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation.
The framework includes the triple-view encoder and the dual-frequency decoder.
arXiv Detail & Related papers (2023-09-22T01:02:21Z)
- A Multi-Stream Fusion Network for Image Splicing Localization [18.505512386111985]
We propose an encoder-decoder architecture that consists of multiple encoder streams.
Each stream is fed with either the tampered image or handcrafted signals and processes them separately to capture relevant information from each one independently.
The extracted features from the multiple streams are fused in the bottleneck of the architecture and propagated to the decoder network that generates the output localization map.
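As an illustration of this multi-stream design, here is a small PyTorch sketch in which two hypothetical encoder streams, one fed the RGB image and one fed a handcrafted signal, are fused at the bottleneck and decoded into a localization map. The stream count, layer sizes, and sigmoid output are assumptions for the sketch, not the authors' exact network.

```python
# Illustrative sketch of a multi-stream encoder with bottleneck fusion;
# hypothetical layer sizes, not the paper's architecture.
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # Downsampling conv block shared by all encoder streams.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
    )


class MultiStreamSplicingLocalizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_stream = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.signal_stream = nn.Sequential(conv_block(1, 32), conv_block(32, 64))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, rgb: torch.Tensor, signal: torch.Tensor) -> torch.Tensor:
        # Each stream processes its input independently; the features are then
        # concatenated at the bottleneck and decoded into a localization map.
        fused = torch.cat([self.rgb_stream(rgb), self.signal_stream(signal)], dim=1)
        return torch.sigmoid(self.decoder(fused))


localizer = MultiStreamSplicingLocalizer()
mask = localizer(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))  # (1, 1, 64, 64)
```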
arXiv Detail & Related papers (2022-12-02T12:17:53Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
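The summary above does not spell out the form of the redundancy penalty; one plausible choice, shown in the PyTorch sketch below purely as an illustration, is to penalize the squared off-diagonal entries of the batch correlation matrix of the bottleneck features.

```python
# Hedged sketch of one plausible redundancy penalty on the bottleneck:
# penalize squared off-diagonal entries of the feature correlation matrix
# computed over a batch. The paper's exact penalty may differ.
import torch


def bottleneck_redundancy_penalty(z: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # z: bottleneck activations of shape (batch, features)
    z = z - z.mean(dim=0, keepdim=True)             # center each feature
    z = z / (z.std(dim=0, keepdim=True) + eps)      # scale to unit variance
    corr = (z.T @ z) / z.shape[0]                   # (features, features)
    off_diag = corr - torch.diag(torch.diagonal(corr))
    return (off_diag ** 2).sum()


# Usage: add the penalty to the task loss with a small weight, e.g.
#   total_loss = reconstruction_loss + lambda_red * bottleneck_redundancy_penalty(z)
z = torch.randn(32, 16)                             # hypothetical bottleneck batch
penalty = bottleneck_redundancy_penalty(z)
```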
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- A New Image Codec Paradigm for Human and Machine Uses [53.48873918537017]
A new scalable image codec paradigm for both human and machine uses is proposed in this work.
The high-level instance segmentation map and the low-level signal features are extracted with neural networks.
An image predictor is designed and trained to achieve general-quality image reconstruction with the 16-bit gray-scale profile and signal features.
arXiv Detail & Related papers (2021-12-19T06:17:38Z)
- Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
- Deep Stereo Image Compression with Decoder Side Information using Wyner Common Information [1.5293427903448022]
We consider a pair of stereo images, which generally have high correlation with each other due to overlapping fields of view, and assume that one image of the pair is to be compressed and transmitted.
In the proposed architecture, the encoder maps the input image to a latent space, quantizes the latent representation, and compresses it using entropy coding.
The decoder is trained to extract Wyner's common information between the input image and the correlated image from the latter.
arXiv Detail & Related papers (2021-06-22T12:46:31Z)
- Two-stream Encoder-Decoder Network for Localizing Image Forgeries [4.982505311411925]
We propose a novel two-stream encoder-decoder network, which utilizes both the high-level and the low-level image features.
We have carried out experimental analysis on multiple standard forensics datasets to evaluate the performance of the proposed method.
arXiv Detail & Related papers (2020-09-27T15:49:17Z)
- Wireless Image Retrieval at the Edge [20.45405359815043]
We study the image retrieval problem at the wireless edge, where an edge device captures an image, which is then used to retrieve similar images from an edge server.
Our goal is to maximize the accuracy of the retrieval task under power and bandwidth constraints over the wireless link.
We propose two alternative schemes based on digital and analog communications, respectively.
arXiv Detail & Related papers (2020-07-21T16:15:40Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)