Scalable Hybrid Learning Techniques for Scientific Data Compression
- URL: http://arxiv.org/abs/2212.10733v1
- Date: Wed, 21 Dec 2022 03:00:18 GMT
- Title: Scalable Hybrid Learning Techniques for Scientific Data Compression
- Authors: Tania Banerjee, Jong Choi, Jaemoon Lee, Qian Gong, Jieyang Chen, Scott
Klasky, Anand Rangarajan, Sanjay Ranka
- Abstract summary: Scientists require compression techniques that accurately preserve derived quantities of interest (QoIs).
This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression.
- Score: 6.803722400888276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data compression is becoming critical for storing scientific data because
many scientific applications need to store large amounts of data and post
process this data for scientific discovery. Unlike image and video compression
algorithms that limit errors to primary data, scientists require compression
techniques that accurately preserve derived quantities of interest (QoIs). This
paper presents a physics-informed compression technique implemented as an
end-to-end, scalable, GPU-based pipeline for data compression that addresses
this requirement. Our hybrid compression technique combines machine learning
techniques and standard compression methods. Specifically, we combine an
autoencoder, an error-bounded lossy compressor to provide guarantees on raw
data error, and a constraint satisfaction post-processing step to preserve the
QoIs within a minimal error (generally less than floating point error).
The effectiveness of the data compression pipeline is demonstrated by
compressing nuclear fusion simulation data generated by a large-scale fusion
code, XGC, which produces hundreds of terabytes of data in a single day. Our
approach works within the ADIOS framework and results in compression by a
factor of more than 150 while requiring only a few percent of the computational
resources necessary for generating the data, making the overall approach highly
effective for practical scenarios.
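The abstract describes three cooperating stages: a learned autoencoder for bulk reduction, an error-bounded compressor on the residual to guarantee raw-data accuracy, and a constraint-satisfaction step that restores the QoIs. Below is a minimal NumPy sketch of that three-stage shape; the truncated-SVD "autoencoder", the uniform residual quantizer, and the mean-preserving QoI correction are illustrative stand-ins, not the authors' XGC/ADIOS implementation.

```python
# Minimal sketch of the hybrid pipeline: learned reduction + error-bounded
# residual coding + QoI constraint satisfaction. All three components are
# illustrative stand-ins, not the paper's actual implementation.
import numpy as np

def quantize_residual(residual, bound):
    """Uniform quantizer guaranteeing |residual - dequantized| <= bound."""
    codes = np.round(residual / (2.0 * bound))   # small ints, entropy-codable
    return codes, codes * (2.0 * bound)

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 64))              # toy "simulation" snapshot

# Stage 1: learned reduction (rank-8 truncated SVD as autoencoder stand-in).
_, _, vt = np.linalg.svd(x, full_matrices=False)
basis = vt[:8].T                                 # 64 -> 8 latent dimensions
recon = (x @ basis) @ basis.T

# Stage 2: error-bounded coding of the residual bounds the raw-data error.
bound = 1e-2
codes, dequant = quantize_residual(x - recon, bound)
recon = recon + dequant                          # pointwise error <= bound

# Stage 3: constraint satisfaction restores a QoI (here, the global mean;
# the paper's post-processing targets physics-derived QoIs).
recon += x.mean() - recon.mean()

assert np.abs(x - recon).max() <= 2.0 * bound    # bound plus the tiny shift
assert abs(x.mean() - recon.mean()) < 1e-12      # QoI preserved to round-off
```

In a real deployment, the latent vectors and integer residual codes, which are what would be written out through ADIOS, would typically be entropy coded as well to reach compression factors of the reported magnitude.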
Related papers
- Lightweight Correlation-Aware Table Compression [58.50312417249682]
$\texttt{Virtual}$ is a framework that integrates seamlessly with existing open formats.
Experiments on data-gov datasets show that $\texttt{Virtual}$ reduces file sizes by up to 40% compared to Apache Parquet.
arXiv Detail & Related papers (2024-10-17T22:28:07Z)
- Enhancing Lossy Compression Through Cross-Field Information for Scientific Applications [11.025583805165455]
Lossy compression is one of the most effective methods for reducing the size of scientific data containing multiple data fields.
Previous approaches use local information from a single target field when predicting target data points, limiting their potential to achieve higher compression ratios.
We propose a novel hybrid prediction model that uses a CNN to extract cross-field information and combines it with the existing local-field information (see the sketch below).
arXiv Detail & Related papers (2024-09-26T21:06:53Z)
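A minimal PyTorch sketch of that idea: predict the target field from a two-channel input, its own local context plus a co-located auxiliary field, and keep only the residual for encoding. The network shape, field sizes, and channel pairing are illustrative assumptions, not the paper's model.

```python
# Cross-field prediction sketch: a small CNN sees the target field's local
# context AND a correlated companion field; a prediction-based compressor
# would then quantize/encode only the residual. Shapes are illustrative.
import torch
import torch.nn as nn

predictor = nn.Sequential(
    nn.Conv2d(2, 16, kernel_size=3, padding=1),   # 2 channels: target + aux
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),   # predicted target field
)

target = torch.randn(1, 1, 64, 64)   # field being compressed
aux = torch.randn(1, 1, 64, 64)      # correlated field from the same dataset
pred = predictor(torch.cat([target, aux], dim=1))
residual = target - pred             # smaller residuals -> higher ratios
```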
- Sparse $L^1$-Autoencoders for Scientific Data Compression [0.0]
We introduce effective data compression methods by developing autoencoders with high-dimensional, $L^1$-regularized latent spaces.
We show how these information-rich latent spaces can be used to mitigate blurring and other artifacts, yielding highly effective compression methods for scientific data (see the sketch below).
arXiv Detail & Related papers (2024-05-23T07:48:00Z)
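A minimal PyTorch sketch of an $L^1$-regularized, high-dimensional latent space in this spirit: a latent layer wider than the input whose activations the $L^1$ term drives toward sparsity, so the code stays cheap to store. Layer sizes and the penalty weight are illustrative assumptions.

```python
# Sparse-latent autoencoder sketch: the latent layer is wider than the input
# ("high-dimensional"), and an L1 penalty pushes most activations to zero,
# so only the few nonzeros need storing. Sizes and weights are illustrative.
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, n_in=512, n_latent=2048):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, n_latent), nn.ReLU())
        self.dec = nn.Linear(n_latent, n_in)

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

model, lam = SparseAE(), 1e-3
x = torch.randn(32, 512)
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x) + lam * z.abs().mean()  # MSE + L1
loss.backward()
```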
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- SRN-SZ: Deep Learning-Based Scientific Error-bounded Lossy Compression with Super-resolution Neural Networks [13.706955134941385]
We propose SRN-SZ, a deep learning-based scientific error-bounded lossy compressor.
SRN-SZ applies HAT, a state-of-the-art super-resolution network, for its compression.
In experiments, SRN-SZ achieves up to 75% compression-ratio improvement under the same error bound (a sketch of the approach follows).
arXiv Detail & Related papers (2023-09-07T22:15:32Z)
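A minimal PyTorch sketch of the super-resolution framing: store a decimated field, reconstruct it with an upsampler (bicubic interpolation here stands in for a learned SR network such as HAT), then error-bound the residual. The scale factor and bound are illustrative assumptions.

```python
# Super-resolution compression sketch: keep a 4x-downsampled copy, upsample
# it at decompression time (bicubic stands in for a learned SR network),
# and quantize the residual to enforce a user-specified error bound.
import torch
import torch.nn.functional as F

field = torch.randn(1, 1, 64, 64)
low = F.avg_pool2d(field, 4)                 # stored: 16x fewer values
sr = F.interpolate(low, scale_factor=4, mode="bicubic", align_corners=False)

bound = 1e-2
codes = torch.round((field - sr) / (2 * bound))   # entropy-codable residual
recon = sr + codes * (2 * bound)
assert (field - recon).abs().max() <= bound + 1e-6
```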
- Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data [12.831138965071945]
This work presents a neural network that not only significantly compresses large-scale scientific data but also maintains high reconstruction quality.
The proposed model is tested with scientific benchmark data available publicly and applied to a large-scale high-resolution climate modeling data set.
Our model achieves a compression ratio of 140 on several benchmark data sets without compromising the reconstruction quality.
arXiv Detail & Related papers (2023-07-09T16:11:02Z)
- Unrolled Compressed Blind-Deconvolution [77.88847247301682]
Sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar, sonar, and ultrasound imaging.
We propose a compression method that enables blind recovery from far fewer measurements than the full received signal in time.
arXiv Detail & Related papers (2022-09-28T15:16:58Z)
- COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z)
- Exploring Autoencoder-based Error-bounded Compression for Scientific Data [14.724393511470225]
We develop an error-bounded, autoencoder-based compression framework built on the SZ model.
We optimize compression quality at each of the main stages of our AE-based error-bounded compression framework.
arXiv Detail & Related papers (2021-05-25T07:53:32Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models (see the sketch below).
We achieve 52.9% FLOPs reduction by removing 48.4% of parameters on ResNet-50, with only a 0.56% Top-1 accuracy drop on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
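A minimal PyTorch sketch of the tensor-decomposition half of such a scheme: factor a 1x1 convolution's weight matrix by truncated SVD into two thinner layers. The rank and layer sizes are illustrative assumptions; the paper couples this kind of decomposition with channel pruning.

```python
# Low-rank decomposition sketch: replace one 1x1 conv (64*128 = 8192 weights)
# with two thinner convs through a rank-16 bottleneck (64*16 + 16*128 = 3072).
import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=1)            # layer to compress
w = conv.weight.detach().reshape(128, 64)           # (out, in) for a 1x1 conv
u, s, vh = torch.linalg.svd(w, full_matrices=False)

rank = 16
first = nn.Conv2d(64, rank, 1, bias=False)          # applies V_r^T
second = nn.Conv2d(rank, 128, 1)                    # applies U_r * S_r
first.weight.data = vh[:rank].reshape(rank, 64, 1, 1)
second.weight.data = (u[:, :rank] * s[:rank]).reshape(128, rank, 1, 1)
second.bias.data = conv.bias.detach()

x = torch.randn(1, 64, 8, 8)
approx = second(first(x))                           # low-rank approximation
err = (approx - conv(x)).abs().max()  # small when w is close to rank-16
```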
- Analyzing and Mitigating JPEG Compression Defects in Deep Learning [69.04777875711646]
We present a unified study of the effects of JPEG compression on a range of common tasks and datasets.
We show that heavy compression imposes a significant penalty on common performance metrics.
arXiv Detail & Related papers (2020-11-17T20:32:57Z)