Guaranteed Conditional Diffusion: 3D Block-based Models for Scientific Data Compression
- URL: http://arxiv.org/abs/2502.12951v1
- Date: Tue, 18 Feb 2025 15:33:09 GMT
- Title: Guaranteed Conditional Diffusion: 3D Block-based Models for Scientific Data Compression
- Authors: Jaemoon Lee, Xiao Li, Liangji Zhu, Sanjay Ranka, Anand Rangarajan,
- Abstract summary: This paper proposes a new compression paradigm -- Guaranteed Diffusion with Conditional Correction (GCDTC)
It consists of a conditional diffusion model, tensor correction, and error guarantee.
Our framework outperforms standard convolutional autoencoders and yields competitive compression quality with an existing scientific data compression algorithm.
- Score: 10.848192105624848
- License:
- Abstract: This paper proposes a new compression paradigm -- Guaranteed Conditional Diffusion with Tensor Correction (GCDTC) -- for lossy scientific data compression. The framework is based on recent conditional diffusion (CD) generative models, and it consists of a conditional diffusion model, tensor correction, and error guarantee. Our diffusion model is a mixture of 3D conditioning and 2D denoising U-Net. The approach leverages a 3D block-based compressing module to address spatiotemporal correlations in structured scientific data. Then, the reverse diffusion process for 2D spatial data is conditioned on the ``slices'' of content latent variables produced by the compressing module. After training, the denoising decoder reconstructs the data with zero noise and content latent variables, and thus it is entirely deterministic. The reconstructed outputs of the CD model are further post-processed by our tensor correction and error guarantee steps to control and ensure a maximum error distortion, which is an inevitable requirement in lossy scientific data compression. Our experiments involving two datasets generated by climate and chemical combustion simulations show that our framework outperforms standard convolutional autoencoders and yields competitive compression quality with an existing scientific data compression algorithm.
Related papers
- Foundation Model for Lossy Compression of Spatiotemporal Scientific Data [11.494915987840876]
We present a foundation model (FM) for lossy scientific data compression.
We combine a variational autoencoder (E) with a hyper-prior structure and a super-resolution (SR) module.
arXiv Detail & Related papers (2024-12-22T22:57:08Z) - Rectified Diffusion Guidance for Conditional Generation [62.00207951161297]
We revisit the theory behind CFG and rigorously confirm that the improper configuration of the combination coefficients (i.e., the widely used summing-to-one version) brings about expectation shift of the generative distribution.
We propose ReCFG with a relaxation on the guidance coefficients such that denoising with ReCFG strictly aligns with the diffusion theory.
That way the rectified coefficients can be readily pre-computed via traversing the observed data, leaving the sampling speed barely affected.
arXiv Detail & Related papers (2024-10-24T13:41:32Z) - MoDeGPT: Modular Decomposition for Large Language Model Compression [59.361006801465344]
This paper introduces textbfModular bfDecomposition (MoDeGPT), a novel structured compression framework.
MoDeGPT partitions the Transformer block into modules comprised of matrix pairs and reduces the hidden dimensions.
Our experiments show MoDeGPT, without backward propagation, matches or surpasses previous structured compression methods.
arXiv Detail & Related papers (2024-08-19T01:30:14Z) - Machine Learning Techniques for Data Reduction of CFD Applications [10.881548113461493]
We present an approach called guaranteed block autoencoder that leverages Correlations for reducing scientific results.
It uses a multidimensional block of tensors (CFD) for both input and output.
arXiv Detail & Related papers (2024-04-28T04:01:09Z) - Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder [49.01721042973929]
This paper presents a diffusion-based image compression method that employs a privileged end-to-end decoder model as correction.
Experiments demonstrate the superiority of our method in both distortion and perception compared with previous perceptual compression methods.
arXiv Detail & Related papers (2024-04-07T10:57:54Z) - Compression of Structured Data with Autoencoders: Provable Benefit of
Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z) - Activations and Gradients Compression for Model-Parallel Training [85.99744701008802]
We study how simultaneous compression of activations and gradients in model-parallel distributed training setup affects convergence.
We find that gradients require milder compression rates than activations.
Experiments also show that models trained with TopK perform well only when compression is also applied during inference.
arXiv Detail & Related papers (2024-01-15T15:54:54Z) - Lossy Image Compression with Conditional Diffusion Models [25.158390422252097]
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models.
In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model.
Our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics.
arXiv Detail & Related papers (2022-09-14T21:53:27Z) - A Physics-Informed Vector Quantized Autoencoder for Data Compression of
Turbulent Flow [28.992515947961593]
We apply a physics-informed Deep Learning technique based on vector quantization to generate a low-dimensional representation of data from turbulent flows.
The accuracy of the model is assessed using statistical, comparison-based similarity and physics-based metrics.
Our model can offer CR $=85$ with a mean square error (MSE) of $O(10-3)$, and predictions that faithfully reproduce the statistics of the flow, except at the very smallest scales.
arXiv Detail & Related papers (2022-01-10T19:55:50Z) - Exploring Autoencoder-based Error-bounded Compression for Scientific
Data [14.724393511470225]
We develop an error-bounded autoencoder-based framework in terms of the SZ model.
We optimize the compression quality for the main stages in our designed AE-based error-bounded compression framework.
arXiv Detail & Related papers (2021-05-25T07:53:32Z) - MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models [78.93424358827528]
We present a novel compression algorithm for reducing the storage streams of LiDAR sensor data.
Our method significantly reduces the joint geometry and intensity over prior state-of-the-art LiDAR compression methods.
arXiv Detail & Related papers (2020-11-15T17:41:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.