GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data
- URL: http://arxiv.org/abs/2404.13470v1
- Date: Sat, 20 Apr 2024 21:12:53 GMT
- Title: GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data
- Authors: Wenqi Jia, Sian Jin, Jinzhen Wang, Wei Niu, Dingwen Tao, Miao Yin,
- Abstract summary: We propose GWLZ, a novel group-wise learning-based lossy compression framework with multiple lightweight learnable enhancer models.
We show that GWLZ significantly enhances the decompressed data reconstruction quality with negligible impact on the compression efficiency.
- Score: 14.92764869276237
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid expansion of computational capabilities and the ever-growing scale of modern HPC systems present formidable challenges in managing exascale scientific data. Faced with such vast datasets, traditional lossless compression techniques prove insufficient in reducing data size to a manageable level while preserving all information intact. In response, researchers have turned to error-bounded lossy compression methods, which offer a balance between data size reduction and information retention. However, despite their utility, these compressors employing conventional techniques struggle with limited reconstruction quality. To address this issue, we draw inspiration from recent advancements in deep learning and propose GWLZ, a novel group-wise learning-based lossy compression framework with multiple lightweight learnable enhancer models. Leveraging a group of neural networks, GWLZ significantly enhances the decompressed data reconstruction quality with negligible impact on the compression efficiency. Experimental results on different fields from the Nyx dataset demonstrate remarkable improvements by GWLZ, achieving up to 20% quality enhancements with negligible overhead as low as 0.0003x.
Related papers
- Compressing high-resolution data through latent representation encoding for downscaling large-scale AI weather forecast model [10.634513279883913]
We propose a variational autoencoder framework tailored for compressing high-resolution datasets.
Our framework successfully reduced the storage size of 3 years of HRCLDAS data from 8.61 TB to just 204 GB, while preserving essential information.
arXiv Detail & Related papers (2024-10-10T05:38:03Z) - NeurLZ: On Enhancing Lossy Compression Performance based on Error-Controlled Neural Learning for Scientific Data [35.36879818366783]
Large-scale scientific simulations generate massive datasets that pose challenges for storage and I/O.
We propose NeurLZ, a novel cross-field learning-based and error-controlled compression framework for scientific data.
arXiv Detail & Related papers (2024-09-09T16:48:09Z) - Convolutional variational autoencoders for secure lossy image compression in remote sensing [47.75904906342974]
This study investigates image compression based on convolutional variational autoencoders (CVAE)
CVAEs have been demonstrated to outperform conventional compression methods such as JPEG2000 by a substantial margin on compression benchmark datasets.
arXiv Detail & Related papers (2024-04-03T15:17:29Z) - Understanding The Effectiveness of Lossy Compression in Machine Learning Training Sets [7.261516807130813]
Learning and Artificial Intelligence (ML/AI) techniques have become increasingly prevalent in high performance computing.
Data compression can be a solution to these problems, but an in-depth understanding of how lossy compression affects model quality is needed.
We show modern lossy compression methods can achieve a 50-100x compression ratio improvement for a 1% or less loss in quality.
arXiv Detail & Related papers (2024-03-23T23:14:37Z) - Semantic Ensemble Loss and Latent Refinement for High-Fidelity Neural Image Compression [58.618625678054826]
This study presents an enhanced neural compression method designed for optimal visual fidelity.
We have trained our model with a sophisticated semantic ensemble loss, integrating Charbonnier loss, perceptual loss, style loss, and a non-binary adversarial loss.
Our empirical findings demonstrate that this approach significantly improves the statistical fidelity of neural image compression.
arXiv Detail & Related papers (2024-01-25T08:11:27Z) - SRN-SZ: Deep Leaning-Based Scientific Error-bounded Lossy Compression
with Super-resolution Neural Networks [13.706955134941385]
We propose SRN-SZ, a deep learning-based scientific error-bounded lossy compressor.
SRN-SZ applies the most advanced super-resolution network HAT for its compression.
In experiments, SRN-SZ achieves up to 75% compression ratio improvements under the same error bound.
arXiv Detail & Related papers (2023-09-07T22:15:32Z) - Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - ScaleCom: Scalable Sparsified Gradient Compression for
Communication-Efficient Distributed Training [74.43625662170284]
Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained.
We propose a new compression technique that leverages similarity in the gradient distribution amongst learners to provide significantly improved scalability.
We experimentally demonstrate that ScaleCom has small overheads, directly reduces gradient traffic and provides high compression rates (65-400X) and excellent scalability (up to 64 learners and 8-12X larger batch sizes over standard training) without significant accuracy loss.
arXiv Detail & Related papers (2021-04-21T02:22:10Z) - An Efficient Statistical-based Gradient Compression Technique for
Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC)
Our evaluation shows SIDCo speeds up training by up to 41:7%, 7:6%, and 1:9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z) - Analyzing and Mitigating JPEG Compression Defects in Deep Learning [69.04777875711646]
We present a unified study of the effects of JPEG compression on a range of common tasks and datasets.
We show that there is a significant penalty on common performance metrics for high compression.
arXiv Detail & Related papers (2020-11-17T20:32:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.