Neural Distributed Image Compression with Cross-Attention Feature Alignment
- URL: http://arxiv.org/abs/2207.08489v1
- Date: Mon, 18 Jul 2022 10:15:04 GMT
- Title: Neural Distributed Image Compression with Cross-Attention Feature Alignment
- Authors: Nitish Mital, Ezgi Ozyilkan, Ali Garjani, Deniz Gunduz
- Abstract summary: We consider a pair of stereo images, which have overlapping fields of view, captured by a synchronized and calibrated pair of cameras.
We assume that one image of the pair is to be compressed and transmitted, while the other image is available only at the decoder.
In the proposed architecture, the encoder maps the input image to a latent space using a DNN, quantizes the latent representation, and compresses it losslessly using entropy coding.
- Score: 1.2234742322758418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel deep neural network (DNN) architecture for compressing an
image when a correlated image is available as side information only at the
decoder side, a special case of the well-known and heavily studied distributed
source coding (DSC) problem. In particular, we consider a pair of stereo
images, which have overlapping fields of view, captured by a synchronized and
calibrated pair of cameras; and therefore, are highly correlated. We assume
that one image of the pair is to be compressed and transmitted, while the other
image is available only at the decoder. In the proposed architecture, the
encoder maps the input image to a latent space using a DNN, quantizes the
latent representation, and compresses it losslessly using entropy coding. The
proposed decoder extracts useful information common between the images solely
from the available side information, as well as a latent representation of the
side information. Then, the latent representations of the two images, one
received from the encoder, the other extracted locally, along with the locally
generated common information, are fed to the respective decoders of the two
images. We employ a cross-attention module (CAM) to align the feature maps
obtained in the intermediate layers of the respective decoders of the two
images, thus allowing better utilization of the side information. We train and
demonstrate the effectiveness of the proposed algorithm on various realistic
setups, such as the KITTI and Cityscapes datasets of stereo image pairs. Our
results show that the proposed architecture exploits the decoder-only side
information more efficiently than previous works, which it outperforms. We also
show that the proposed method provides significant gains even with uncalibrated
and unsynchronized camera arrays.
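To make the cross-attention feature alignment concrete, the following is a minimal PyTorch sketch of a module that aligns an intermediate decoder feature map of the compressed image with the corresponding feature map derived from the decoder-only side information. The class name, single attention head, 1x1 convolutional projections, and residual fusion are illustrative assumptions, not the paper's exact CAM design.

```python
# Minimal sketch of cross-attention feature alignment, assuming hypothetical
# tensor shapes and a single attention head; the paper's CAM may differ in
# projection dimensions, number of heads, and fusion rule.
import torch
import torch.nn as nn


class CrossAttentionAlignment(nn.Module):
    """Aligns decoder features of the target image (queries) with features
    derived from the decoder-only side information (keys/values)."""

    def __init__(self, channels: int):
        super().__init__()
        self.q_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.k_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.v_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.out_proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x_feat: torch.Tensor, y_feat: torch.Tensor) -> torch.Tensor:
        # x_feat: intermediate decoder features of the compressed image (B, C, H, W)
        # y_feat: intermediate decoder features of the side information  (B, C, H, W)
        b, c, h, w = x_feat.shape
        q = self.q_proj(x_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.k_proj(y_feat).flatten(2)                   # (B, C, HW)
        v = self.v_proj(y_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)       # (B, HW, HW)
        aligned = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        # Residual fusion: add the attention-aligned side-information features.
        return x_feat + self.out_proj(aligned)


# Usage: fuse side-information features into the target decoder at one layer.
cam = CrossAttentionAlignment(channels=64)
x_feat = torch.randn(1, 64, 16, 16)
y_feat = torch.randn(1, 64, 16, 16)
fused = cam(x_feat, y_feat)   # (1, 64, 16, 16)
```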
Related papers
- Content-aware Masked Image Modeling Transformer for Stereo Image Compression [15.819672238043786]
We propose a stereo image compression framework, named CAMSIC.
CAMSIC transforms each image to latent representation and employs a powerful decoder-free Transformer entropy model.
Experiments show that our framework achieves state-of-the-art rate-distortion performance on two stereo image datasets.
arXiv Detail & Related papers (2024-03-13T13:12:57Z)
- Neuromorphic Synergy for Video Binarization [54.195375576583864]
Bimodal objects serve as a visual form to embed information that can be easily recognized by vision systems.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first de-blur and then binarize the images in real time.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
arXiv Detail & Related papers (2024-02-20T01:43:51Z)
- Triple-View Knowledge Distillation for Semi-Supervised Semantic Segmentation [54.23510028456082]
We propose a Triple-view Knowledge Distillation framework, termed TriKD, for semi-supervised semantic segmentation.
The framework includes the triple-view encoder and the dual-frequency decoder.
arXiv Detail & Related papers (2023-09-22T01:02:21Z)
- A Multi-Stream Fusion Network for Image Splicing Localization [18.505512386111985]
We propose an encoder-decoder architecture that consists of multiple encoder streams.
Each stream is fed with either the tampered image or handcrafted signals and processes them separately to capture relevant information from each one independently.
The extracted features from the multiple streams are fused in the bottleneck of the architecture and propagated to the decoder network that generates the output localization map.
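As an illustration of this multi-stream design, here is a small PyTorch sketch in which two hypothetical encoder streams, one fed the RGB image and one fed a handcrafted signal, are fused at the bottleneck and decoded into a localization map. The stream count, layer sizes, and sigmoid output are assumptions for the sketch, not the authors' exact network.

```python
# Illustrative sketch of a multi-stream encoder with bottleneck fusion;
# hypothetical layer sizes, not the paper's architecture.
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # Downsampling conv block shared by all encoder streams.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
    )


class MultiStreamSplicingLocalizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_stream = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
        self.signal_stream = nn.Sequential(conv_block(1, 32), conv_block(32, 64))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, rgb: torch.Tensor, signal: torch.Tensor) -> torch.Tensor:
        # Each stream processes its input independently; the features are then
        # concatenated at the bottleneck and decoded into a localization map.
        fused = torch.cat([self.rgb_stream(rgb), self.signal_stream(signal)], dim=1)
        return torch.sigmoid(self.decoder(fused))


localizer = MultiStreamSplicingLocalizer()
mask = localizer(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))  # (1, 1, 64, 64)
```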
arXiv Detail & Related papers (2022-12-02T12:17:53Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
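The summary above does not spell out the form of the redundancy penalty; one plausible choice, shown in the PyTorch sketch below purely as an illustration, is to penalize the squared off-diagonal entries of the batch correlation matrix of the bottleneck features.

```python
# Hedged sketch of one plausible redundancy penalty on the bottleneck:
# penalize squared off-diagonal entries of the feature correlation matrix
# computed over a batch. The paper's exact penalty may differ.
import torch


def bottleneck_redundancy_penalty(z: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # z: bottleneck activations of shape (batch, features)
    z = z - z.mean(dim=0, keepdim=True)             # center each feature
    z = z / (z.std(dim=0, keepdim=True) + eps)      # scale to unit variance
    corr = (z.T @ z) / z.shape[0]                   # (features, features)
    off_diag = corr - torch.diag(torch.diagonal(corr))
    return (off_diag ** 2).sum()


# Usage: add the penalty to the task loss with a small weight, e.g.
#   total_loss = reconstruction_loss + lambda_red * bottleneck_redundancy_penalty(z)
z = torch.randn(32, 16)                             # hypothetical bottleneck batch
penalty = bottleneck_redundancy_penalty(z)
```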
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- A New Image Codec Paradigm for Human and Machine Uses [53.48873918537017]
A new scalable image codec paradigm for both human and machine uses is proposed in this work.
The high-level instance segmentation map and the low-level signal features are extracted with neural networks.
An image predictor is designed and trained to achieve general-quality image reconstruction with the 16-bit gray-scale profile and signal features.
arXiv Detail & Related papers (2021-12-19T06:17:38Z)
- Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z)
- Deep Stereo Image Compression with Decoder Side Information using Wyner Common Information [1.5293427903448022]
We consider a pair of stereo images, which generally have high correlation with each other due to overlapping fields of view, and assume that one image of the pair is to be compressed and transmitted.
In the proposed architecture, the encoder maps the input image to a latent space, quantizes the latent representation, and compresses it using entropy coding.
The decoder is trained to extract Wyner's common information between the input image and the correlated image from the latter.
arXiv Detail & Related papers (2021-06-22T12:46:31Z)
- Two-stream Encoder-Decoder Network for Localizing Image Forgeries [4.982505311411925]
We propose a novel two-stream encoder-decoder network, which utilizes both the high-level and the low-level image features.
We have carried out experimental analysis on multiple standard forensics datasets to evaluate the performance of the proposed method.
arXiv Detail & Related papers (2020-09-27T15:49:17Z)
- Wireless Image Retrieval at the Edge [20.45405359815043]
We study the image retrieval problem at the wireless edge, where an edge device captures an image, which is then used to retrieve similar images from an edge server.
Our goal is to maximize the accuracy of the retrieval task under power and bandwidth constraints over the wireless link.
We propose two alternative schemes based on digital and analog communications, respectively.
arXiv Detail & Related papers (2020-07-21T16:15:40Z)
- Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images.
We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)