Stereo Image Coding for Machines with Joint Visual Feature Compression
- URL: http://arxiv.org/abs/2502.14190v1
- Date: Thu, 20 Feb 2025 01:46:17 GMT
- Title: Stereo Image Coding for Machines with Joint Visual Feature Compression
- Authors: Dengchao Jin, Jianjun Lei, Bo Peng, Zhaoqing Pan, Nam Ling, Qingming Huang
- Abstract summary: The stereo image coding for machines (SICM) is formulated and explored in this paper.
A machine vision-oriented stereo feature compression network (MVSFC-Net) is proposed for SICM.
The proposed MVSFC-Net obtains superior compression efficiency as well as 3D visual task performance.
- Abstract: 2D image coding for machines (ICM) has achieved great success in coding efficiency, while less effort has been devoted to the stereo image field. To promote the efficiency of stereo image compression (SIC) and intelligent analysis, the stereo image coding for machines (SICM) is formulated and explored in this paper. More specifically, a machine vision-oriented stereo feature compression network (MVSFC-Net) is proposed for SICM, where the stereo visual features are effectively extracted, compressed, and transmitted for 3D visual tasks. To efficiently compress stereo visual features in MVSFC-Net, a stereo multi-scale feature compression (SMFC) module is designed to gradually transform sparse stereo multi-scale features into compact joint visual representations by removing spatial, inter-view, and cross-scale redundancies simultaneously. Experimental results show that the proposed MVSFC-Net obtains superior compression efficiency as well as 3D visual task performance when compared with the existing ICM anchors recommended by MPEG and the state-of-the-art SIC method.
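The abstract does not say how the SMFC module removes inter-view redundancy. As a toy illustration of what "inter-view redundancy" means (this is not the MVSFC-Net or SMFC method), the sketch below predicts the right view from the left view with a single global horizontal disparity and keeps only the residual. The function names, the global-disparity assumption, the circular shift, and the synthetic data are all hypothetical choices made for illustration; the arrays could equally stand in for stereo feature maps.

```python
import numpy as np


def predict_right_from_left(left: np.ndarray, disparity: int) -> np.ndarray:
    # Rectified stereo: a pixel at column x in the left view appears at
    # column x - disparity in the right view, so right[:, x] ~ left[:, x + disparity].
    # np.roll wraps around at the border, which is acceptable for this toy example.
    return np.roll(left, -disparity, axis=1)


def best_global_disparity(left: np.ndarray, right: np.ndarray, max_disparity: int) -> int:
    # Exhaustive search for the single disparity that minimises residual energy
    # (hypothetical helper; real codecs estimate disparity locally or learn it).
    errors = [
        np.mean((right - predict_right_from_left(left, d)) ** 2)
        for d in range(max_disparity + 1)
    ]
    return int(np.argmin(errors))


rng = np.random.default_rng(0)
left = rng.normal(size=(64, 128))  # synthetic left view (or feature map)
true_d = 5
right = predict_right_from_left(left, true_d) + 0.05 * rng.normal(size=left.shape)

d = best_global_disparity(left, right, max_disparity=16)
residual = right - predict_right_from_left(left, d)  # only this would need coding

print("estimated disparity:", d)
print("energy of right view:", np.mean(right ** 2))
print("energy of residual:  ", np.mean(residual ** 2))
```

Because the two views are largely shifted copies of each other, the residual carries far less energy than the right view itself, which is the redundancy a joint stereo codec tries to exploit.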
Related papers
- SQ-GAN: Semantic Image Communications Using Masked Vector Quantization
This work introduces Semantically Masked VQ-GAN (SQ-GAN), a novel approach to optimize image compression for semantic/task-oriented communications.
SQ-GAN employs off-the-shelf semantic segmentation and a new semantic-conditioned adaptive mask module (SAMM) to selectively encode semantically significant features of the images.
Date: 2025-02-13T17:35:57Z
- Lightweight Multiplane Images Network for Real-Time Stereoscopic Conversion from Planar Video
This paper proposes a real-time stereoscopic conversion network based on multiplane images (MPI).
It employs a lightweight depth-semantic branch to extract depth-aware features implicitly.
It can achieve comparable performance to some state-of-the-art (SOTA) models and support real-time inference at 2K resolution.
Date: 2024-12-04T08:04:14Z
- Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
BiSIC is a symmetric stereo image compression architecture.
We propose a 3D-convolution-based backbone to capture local features and incorporate bidirectional attention blocks to exploit global features.
Our proposed BiSIC outperforms conventional image/video compression standards.
Date: 2024-07-15T11:36:22Z
- Compression-Realized Deep Structural Network for Video Quality Enhancement
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
Date: 2024-05-10T09:18:17Z
- CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression
We propose a stereo image compression framework, named CAMSIC.
CAMSIC transforms each image to latent representation and employs a powerful decoder-free Transformer entropy model.
Experiments show that our framework achieves state-of-the-art rate-distortion performance.
Date: 2024-03-13T13:12:57Z
- FFCA-Net: Stereo Image Compression via Fast Cascade Alignment of Side Information
Multi-view compression technology, especially Stereo Image Compression (SIC), plays a crucial role in car-mounted cameras and 3D-related applications.
We propose a Feature-based Fast Cascade Alignment network (FFCA-Net) to fully leverage the side information at the decoder.
Our approach decodes 3 to 10 times faster than other methods.
Date: 2023-12-28T11:12:03Z
- ECSIC: Epipolar Cross Attention for Stereo Image Compression
ECSIC achieves state-of-the-art performance in stereo image compression on two popular stereo image datasets, Cityscapes and InStereo2k.
Date: 2023-07-18T11:46:31Z
- Exploring Effective Mask Sampling Modeling for Neural Image Compression
Most existing neural image compression methods rely on side information from hyperprior or context models to eliminate spatial redundancy.
Inspired by the mask sampling modeling in recent self-supervised learning methods for natural language processing and high-level vision, we propose a novel pretraining strategy for neural image compression.
Our method achieves competitive performance with lower computational complexity compared to state-of-the-art image compression methods.
Date: 2023-06-09T06:50:20Z
- Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet).
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
Date: 2022-07-09T13:35:12Z
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal
Video Coding for Machines (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion patterns.
By learning to extract a sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
Date: 2020-01-09T14:18:18Z