RD-Optimized Trit-Plane Coding of Deep Compressed Image Latent Tensors
- URL: http://arxiv.org/abs/2203.13467v1
- Date: Fri, 25 Mar 2022 06:33:16 GMT
- Title: RD-Optimized Trit-Plane Coding of Deep Compressed Image Latent Tensors
- Authors: Seungmin Jeon and Jae-Han Lee and Chang-Su Kim
- Abstract summary: DPICT is the first learning-based image codec supporting fine granular scalability.
In this paper, we describe how to implement two key components of DPICT efficiently: trit-plane slicing and RD-prioritized transmission.
- Score: 40.86513649546442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DPICT is the first learning-based image codec supporting fine granular
scalability. In this paper, we describe how to implement two key components of
DPICT efficiently: trit-plane slicing and RD-prioritized transmission. In
DPICT, we transform an image into a latent tensor, represent the tensor in
ternary digits (trits), and encode the trits in the decreasing order of
significance. For entropy encoding, we should compute the probability of each
trit, which demands high time complexity in both the encoder and the decoder.
To reduce the complexity, we develop a parallel computing scheme for the
probabilities and describe it in detail with pseudo-codes. Moreover, in this
paper, we compare the trit-plane slicing in DPICT with the alternative
bit-plane slicing. Experimental results show that the time complexity is
reduced significantly by the parallel computing and that the trit-plane slicing
provides better rate-distortion performances than the bit-plane slicing.
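To make the two components concrete, here is a minimal NumPy sketch of trit-plane slicing and of computing per-element trit probabilities in a single vectorized pass. The function names, the non-negative integer latents, the fixed number of planes, and the Gaussian prior are illustrative assumptions for this sketch, not the pseudo-codes given in the paper.

```python
# Minimal sketch (not the paper's pseudo-code): trit-plane slicing of a
# quantized latent tensor and vectorized trit-probability computation.
# Assumptions: latents are non-negative integers, the number of planes is
# fixed, and the entropy model is a per-element Gaussian prior.
import numpy as np
from scipy.stats import norm

def to_trit_planes(latent, num_planes):
    """Slice non-negative integer latents into trit-planes,
    most significant plane first (each plane holds trits in {0, 1, 2})."""
    x = latent.astype(np.int64)
    planes = []
    for k in reversed(range(num_planes)):
        divisor = 3 ** k
        planes.append(x // divisor)   # k-th ternary digit of every element
        x = x % divisor               # remainder carried to the lower planes
    return planes

def from_trit_planes(planes, num_planes):
    """Reconstruct integer latents from a (possibly partial) prefix of planes;
    missing lower planes are treated as zeros, giving a coarse reconstruction."""
    x = np.zeros_like(planes[0])
    for i, p in enumerate(planes):
        x = x + p * 3 ** (num_planes - 1 - i)
    return x

def trit_probabilities(lo, hi, mu, sigma):
    """For every element at once, split the current interval [lo, hi) into three
    equal-width sub-intervals and return the probability mass of each sub-interval
    under N(mu, sigma). All arguments are arrays of the latent's shape, so the
    probabilities for a whole trit-plane are obtained in one vectorized pass."""
    edges = [lo + (hi - lo) * t / 3.0 for t in range(4)]
    cdf = [norm.cdf(e, loc=mu, scale=sigma) for e in edges]
    total = cdf[3] - cdf[0]
    return [(cdf[t + 1] - cdf[t]) / np.maximum(total, 1e-12) for t in range(3)]

# Toy usage: a 2x2 latent with values in [0, 3^3).
latent = np.array([[5, 20], [0, 26]])
planes = to_trit_planes(latent, num_planes=3)
full = from_trit_planes(planes, num_planes=3)        # exact reconstruction
coarse = from_trit_planes(planes[:2], num_planes=3)  # first two planes only
assert np.array_equal(full, latent)
```

In the actual codec, decoding more trit-planes progressively refines the latent tensor, the per-plane probabilities feed an entropy coder, and the RD-prioritized transmission reorders the trits within each plane by their expected rate-distortion benefit.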
Related papers
- DiTFastAttn: Attention Compression for Diffusion Transformer Models [26.095923502799664]
Diffusion Transformers (DiT) excel at image and video generation but face computational challenges due to self-attention operators.
We propose DiTFastAttn, a post-training compression method to alleviate the computational bottleneck of DiT.
Our results show that for image generation, our method reduces up to 76% of the attention FLOPs and achieves up to 1.8x end-to-end speedup at high-resolution (2k x 2k) generation.
arXiv Detail & Related papers (2024-06-12T18:00:08Z) - Context-Based Trit-Plane Coding for Progressive Image Compression [31.396712329965005]
Trit-plane coding enables deep progressive image compression, but it cannot use autoregressive context models.
First, we develop the context-based rate reduction module to estimate trit probabilities of latent elements accurately.
Second, we develop the context-based distortion reduction module to refine partial latent tensors from the trit-planes.
Third, we propose a retraining scheme for the decoder to attain better rate-distortion tradeoffs.
arXiv Detail & Related papers (2023-03-10T05:46:25Z) - LIT-Former: Linking In-plane and Through-plane Transformers for
Simultaneous CT Image Denoising and Deblurring [22.605286969419485]
This paper studies 3D low-dose computed tomography (CT) imaging.
Although various deep learning methods have been developed in this context, they typically focus on 2D images and perform denoising for the low-dose setting and deblurring for super-resolution separately.
To date, little work has been done on simultaneous in-plane denoising and through-plane deblurring, which is important for obtaining high-quality 3D CT images with lower radiation and faster imaging speed.
Here, we propose to link in-plane and through-plane transformers for simultaneous in-plane denoising and through-plane deblurring.
arXiv Detail & Related papers (2023-02-21T12:43:42Z) - UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed.
The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features.
Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BraTS, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z) - Multi-scale Transformer Network with Edge-aware Pre-training for
Cross-Modality MR Image Synthesis [52.41439725865149]
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones.
Existing (supervised learning) methods often require a large amount of paired multi-modal data to train an effective synthesis model.
We propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis.
arXiv Detail & Related papers (2022-12-02T11:40:40Z) - NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z) - Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z) - DPICT: Deep Progressive Image Compression Using Trit-Planes [36.34865777731784]
We propose the deep progressive image compression using trit-planes (DPICT) algorithm.
We transform an image into a latent tensor using an analysis network.
We encode it into a compressed bitstream trit-plane by trit-plane in the decreasing order of significance.
arXiv Detail & Related papers (2021-12-12T22:09:33Z) - Sketching as a Tool for Understanding and Accelerating Self-attention
for Long Sequences [52.6022911513076]
Transformer-based models are not efficient in processing long sequences due to the quadratic space and time complexity of the self-attention modules.
Linformer and Informer reduce the quadratic complexity to linear (modulo logarithmic factors) via low-dimensional projection and row selection, respectively.
Based on the theoretical analysis, we propose Skeinformer to accelerate self-attention and further improve the accuracy of matrix approximation to self-attention.
arXiv Detail & Related papers (2021-12-10T06:58:05Z) - TEASER: Fast and Certifiable Point Cloud Registration [30.19476775410544]
The first fast and robust certifiable algorithm for the registration of 3D points in the presence of large amounts of outliers.
The second fast and robust certifiable algorithm, named TEASER++, uses graduated non-convexity to solve a large subproblem.
arXiv Detail & Related papers (2020-01-21T18:56:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.