Scalable Hybrid Learning Techniques for Scientific Data Compression
- URL: http://arxiv.org/abs/2212.10733v1
- Date: Wed, 21 Dec 2022 03:00:18 GMT
- Title: Scalable Hybrid Learning Techniques for Scientific Data Compression
- Authors: Tania Banerjee, Jong Choi, Jaemoon Lee, Qian Gong, Jieyang Chen, Scott
Klasky, Anand Rangarajan, Sanjay Ranka
- Abstract summary: Scientists require compression techniques that accurately preserve derived quantities of interest (QoIs).
This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression.
- Score: 6.803722400888276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data compression is becoming critical for storing scientific data because
many scientific applications need to store large amounts of data and post
process this data for scientific discovery. Unlike image and video compression
algorithms that limit errors to primary data, scientists require compression
techniques that accurately preserve derived quantities of interest (QoIs). This
paper presents a physics-informed compression technique implemented as an
end-to-end, scalable, GPU-based pipeline for data compression that addresses
this requirement. Our hybrid compression technique combines machine learning
techniques and standard compression methods. Specifically, we combine an
autoencoder, an error-bounded lossy compressor to provide guarantees on raw
data error, and a constraint satisfaction post-processing step to preserve the
QoIs within a minimal error (generally less than floating point error).
The effectiveness of the data compression pipeline is demonstrated by
compressing nuclear fusion simulation data generated by a large-scale fusion
code, XGC, which produces hundreds of terabytes of data in a single day. Our
approach works within the ADIOS framework and results in compression by a
factor of more than 150 while requiring only a few percent of the computational
resources necessary for generating the data, making the overall approach highly
effective for practical scenarios.
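The abstract describes three cooperating stages: a learned autoencoder for bulk reduction, an error-bounded compressor on the residual to guarantee raw-data accuracy, and a constraint-satisfaction step that restores the QoIs. Below is a minimal NumPy sketch of that three-stage shape; the truncated-SVD "autoencoder", the uniform residual quantizer, and the mean-preserving QoI correction are illustrative stand-ins, not the authors' XGC/ADIOS implementation.

```python
# Minimal sketch of the hybrid pipeline: learned reduction + error-bounded
# residual coding + QoI constraint satisfaction. All three components are
# illustrative stand-ins, not the paper's actual implementation.
import numpy as np

def quantize_residual(residual, bound):
    """Uniform quantizer guaranteeing |residual - dequantized| <= bound."""
    codes = np.round(residual / (2.0 * bound))   # small ints, entropy-codable
    return codes, codes * (2.0 * bound)

rng = np.random.default_rng(0)
x = rng.standard_normal((1024, 64))              # toy "simulation" snapshot

# Stage 1: learned reduction (rank-8 truncated SVD as autoencoder stand-in).
_, _, vt = np.linalg.svd(x, full_matrices=False)
basis = vt[:8].T                                 # 64 -> 8 latent dimensions
recon = (x @ basis) @ basis.T

# Stage 2: error-bounded coding of the residual bounds the raw-data error.
bound = 1e-2
codes, dequant = quantize_residual(x - recon, bound)
recon = recon + dequant                          # pointwise error <= bound

# Stage 3: constraint satisfaction restores a QoI (here, the global mean;
# the paper's post-processing targets physics-derived QoIs).
recon += x.mean() - recon.mean()

assert np.abs(x - recon).max() <= 2.0 * bound    # bound plus the tiny shift
assert abs(x.mean() - recon.mean()) < 1e-12      # QoI preserved to round-off
```

In a real deployment, the latent vectors and integer residual codes, which are what would be written out through ADIOS, would typically be entropy coded as well to reach compression factors of the reported magnitude.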
Related papers
- Lightweight Correlation-Aware Table Compression [58.50312417249682]
$\texttt{Virtual}$ is a framework that integrates seamlessly with existing open formats.
Experiments on data-gov datasets show that $\texttt{Virtual}$ reduces file sizes by up to 40% compared to Apache Parquet.
arXiv Detail & Related papers (2024-10-17T22:28:07Z)
- Enhancing Lossy Compression Through Cross-Field Information for Scientific Applications [11.025583805165455]
Lossy compression is one of the most effective methods for reducing the size of scientific data containing multiple data fields.
Previous approaches use local information from a single target field when predicting target data points, limiting their potential to achieve higher compression ratios.
We propose a novel hybrid prediction model that uses a CNN to extract cross-field information and combines it with the existing local-field information (see the sketch below).
arXiv Detail & Related papers (2024-09-26T21:06:53Z)
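A minimal PyTorch sketch of that idea: predict the target field from a two-channel input, its own local context plus a co-located auxiliary field, and keep only the residual for encoding. The network shape, field sizes, and channel pairing are illustrative assumptions, not the paper's model.

```python
# Cross-field prediction sketch: a small CNN sees the target field's local
# context AND a correlated companion field; a prediction-based compressor
# would then quantize/encode only the residual. Shapes are illustrative.
import torch
import torch.nn as nn

predictor = nn.Sequential(
    nn.Conv2d(2, 16, kernel_size=3, padding=1),   # 2 channels: target + aux
    nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=3, padding=1),   # predicted target field
)

target = torch.randn(1, 1, 64, 64)   # field being compressed
aux = torch.randn(1, 1, 64, 64)      # correlated field from the same dataset
pred = predictor(torch.cat([target, aux], dim=1))
residual = target - pred             # smaller residuals -> higher ratios
```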
- Sparse $L^1$-Autoencoders for Scientific Data Compression [0.0]
We introduce effective data compression methods by developing autoencoders with high-dimensional, $L^1$-regularized latent spaces.
We show how these information-rich latent spaces can be used to mitigate blurring and other artifacts, yielding highly effective compression methods for scientific data (see the sketch below).
arXiv Detail & Related papers (2024-05-23T07:48:00Z)
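A minimal PyTorch sketch of an $L^1$-regularized, high-dimensional latent space in this spirit: a latent layer wider than the input whose activations the $L^1$ term drives toward sparsity, so the code stays cheap to store. Layer sizes and the penalty weight are illustrative assumptions.

```python
# Sparse-latent autoencoder sketch: the latent layer is wider than the input
# ("high-dimensional"), and an L1 penalty pushes most activations to zero,
# so only the few nonzeros need storing. Sizes and weights are illustrative.
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, n_in=512, n_latent=2048):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, n_latent), nn.ReLU())
        self.dec = nn.Linear(n_latent, n_in)

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

model, lam = SparseAE(), 1e-3
x = torch.randn(32, 512)
recon, z = model(x)
loss = nn.functional.mse_loss(recon, x) + lam * z.abs().mean()  # MSE + L1
loss.backward()
```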
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- SRN-SZ: Deep Learning-Based Scientific Error-bounded Lossy Compression with Super-resolution Neural Networks [13.706955134941385]
We propose SRN-SZ, a deep learning-based scientific error-bounded lossy compressor.
SRN-SZ applies HAT, a state-of-the-art super-resolution network, for its compression.
In experiments, SRN-SZ achieves up to 75% compression-ratio improvement under the same error bound (a sketch of the approach follows).
arXiv Detail & Related papers (2023-09-07T22:15:32Z)
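A minimal PyTorch sketch of the super-resolution framing: store a decimated field, reconstruct it with an upsampler (bicubic interpolation here stands in for a learned SR network such as HAT), then error-bound the residual. The scale factor and bound are illustrative assumptions.

```python
# Super-resolution compression sketch: keep a 4x-downsampled copy, upsample
# it at decompression time (bicubic stands in for a learned SR network),
# and quantize the residual to enforce a user-specified error bound.
import torch
import torch.nn.functional as F

field = torch.randn(1, 1, 64, 64)
low = F.avg_pool2d(field, 4)                 # stored: 16x fewer values
sr = F.interpolate(low, scale_factor=4, mode="bicubic", align_corners=False)

bound = 1e-2
codes = torch.round((field - sr) / (2 * bound))   # entropy-codable residual
recon = sr + codes * (2 * bound)
assert (field - recon).abs().max() <= bound + 1e-6
```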
- Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data [12.831138965071945]
This work presents a neural network that not only significantly compresses large-scale scientific data but also maintains high reconstruction quality.
The proposed model is tested with scientific benchmark data available publicly and applied to a large-scale high-resolution climate modeling data set.
Our model achieves a compression ratio of 140 on several benchmark data sets without compromising the reconstruction quality.
arXiv Detail & Related papers (2023-07-09T16:11:02Z)
- Unrolled Compressed Blind-Deconvolution [77.88847247301682]
Sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar, sonar, and ultrasound imaging.
We propose a compression method that enables blind recovery from far fewer measurements than the full received signal in time.
arXiv Detail & Related papers (2022-09-28T15:16:58Z)
- COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z)
- Exploring Autoencoder-based Error-bounded Compression for Scientific Data [14.724393511470225]
We develop an error-bounded, autoencoder-based compression framework built on the SZ model.
We optimize compression quality at each of the main stages of our AE-based error-bounded compression framework.
arXiv Detail & Related papers (2021-05-25T07:53:32Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models (see the sketch below).
We achieve 52.9% FLOPs reduction by removing 48.4% of parameters on ResNet-50, with only a 0.56% Top-1 accuracy drop on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
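A minimal PyTorch sketch of the tensor-decomposition half of such a scheme: factor a 1x1 convolution's weight matrix by truncated SVD into two thinner layers. The rank and layer sizes are illustrative assumptions; the paper couples this kind of decomposition with channel pruning.

```python
# Low-rank decomposition sketch: replace one 1x1 conv (64*128 = 8192 weights)
# with two thinner convs through a rank-16 bottleneck (64*16 + 16*128 = 3072).
import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=1)            # layer to compress
w = conv.weight.detach().reshape(128, 64)           # (out, in) for a 1x1 conv
u, s, vh = torch.linalg.svd(w, full_matrices=False)

rank = 16
first = nn.Conv2d(64, rank, 1, bias=False)          # applies V_r^T
second = nn.Conv2d(rank, 128, 1)                    # applies U_r * S_r
first.weight.data = vh[:rank].reshape(rank, 64, 1, 1)
second.weight.data = (u[:, :rank] * s[:rank]).reshape(128, rank, 1, 1)
second.bias.data = conv.bias.detach()

x = torch.randn(1, 64, 8, 8)
approx = second(first(x))                           # low-rank approximation
err = (approx - conv(x)).abs().max()  # small when w is close to rank-16
```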
- Analyzing and Mitigating JPEG Compression Defects in Deep Learning [69.04777875711646]
We present a unified study of the effects of JPEG compression on a range of common tasks and datasets.
We show that heavy compression imposes a significant penalty on common performance metrics.
arXiv Detail & Related papers (2020-11-17T20:32:57Z)