Sparse $L^1$-Autoencoders for Scientific Data Compression
- URL: http://arxiv.org/abs/2405.14270v1
- Date: Thu, 23 May 2024 07:48:00 GMT
- Title: Sparse $L^1$-Autoencoders for Scientific Data Compression
- Authors: Matthias Chung, Rick Archibald, Paul Atzberger, Jack Michael Solomon,
- Abstract summary: We introduce effective data compression methods by developing autoencoders using high dimensional latent spaces that are $L1$-regularized.
We show how these information-rich latent spaces can be used to mitigate blurring and other artifacts to obtain highly effective data compression methods for scientific data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Scientific datasets present unique challenges for machine learning-driven compression methods, including more stringent requirements on accuracy and mitigation of potential invalidating artifacts. Drawing on results from compressed sensing and rate-distortion theory, we introduce effective data compression methods by developing autoencoders using high dimensional latent spaces that are $L^1$-regularized to obtain sparse low dimensional representations. We show how these information-rich latent spaces can be used to mitigate blurring and other artifacts to obtain highly effective data compression methods for scientific data. We demonstrate our methods for short angle scattering (SAS) datasets showing they can achieve compression ratios around two orders of magnitude and in some cases better. Our compression methods show promise for use in addressing current bottlenecks in transmission, storage, and analysis in high-performance distributed computing environments. This is central to processing the large volume of SAS data being generated at shared experimental facilities around the world to support scientific investigations. Our approaches provide general ways for obtaining specialized compression methods for targeted scientific datasets.
Related papers
- ODDN: Addressing Unpaired Data Challenges in Open-World Deepfake Detection on Online Social Networks [51.03118447290247]
We propose the open-world deepfake detection network (ODDN), which comprises open-world data aggregation (ODA) and compression-discard gradient correction (CGC)
ODA effectively aggregates correlations between compressed and raw samples through both fine-grained and coarse-grained analyses.
CGC incorporates a compression-discard gradient correction to further enhance performance across diverse compression methods in online social networks (OSNs)
arXiv Detail & Related papers (2024-10-24T12:32:22Z) - Lightweight Correlation-Aware Table Compression [58.50312417249682]
$texttVirtual$ is a framework that integrates seamlessly with existing open formats.
Experiments on data-gov datasets show that $texttVirtual$ reduces file sizes by up to 40% compared to Apache Parquet.
arXiv Detail & Related papers (2024-10-17T22:28:07Z) - Enhancing Lossy Compression Through Cross-Field Information for Scientific Applications [11.025583805165455]
Lossy compression is one of the most effective methods for reducing the size of scientific data containing multiple data fields.
Previous approaches use local information from a single target field when predicting target data points, limiting their potential to achieve higher compression ratios.
We propose a novel hybrid prediction model that utilizes CNN to extract cross-field information and combine it with existing local field information.
arXiv Detail & Related papers (2024-09-26T21:06:53Z) - Compression of Structured Data with Autoencoders: Provable Benefit of
Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z) - Neural-based Compression Scheme for Solar Image Data [8.374518151411612]
We propose a neural network-based lossy compression method to be used in NASA's data-intensive imagery missions.
In this work, we propose an adversarially trained neural network, equipped with local and non-local attention modules to capture both the local and global structure of the image.
As a proof of concept for use of this algorithm in SDO data analysis, we have performed coronal hole (CH) detection using our compressed images.
arXiv Detail & Related papers (2023-11-06T04:13:58Z) - Scalable Hybrid Learning Techniques for Scientific Data Compression [6.803722400888276]
Scientists require compression techniques that accurately preserve derived quantities of interest (QoIs)
This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression.
arXiv Detail & Related papers (2022-12-21T03:00:18Z) - Unrolled Compressed Blind-Deconvolution [77.88847247301682]
sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar/sonar/ultrasound imaging.
We propose a compression method that enables blind recovery from much fewer measurements with respect to the full received signal in time.
arXiv Detail & Related papers (2022-09-28T15:16:58Z) - COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z) - Efficient Data Compression for 3D Sparse TPC via Bicephalous
Convolutional Autoencoder [8.759778406741276]
This work introduces a dual-head autoencoder to resolve sparsity and regression simultaneously, called textitBicephalous Convolutional AutoEncoder (BCAE)
It shows advantages both in compression fidelity and ratio compared to traditional data compression methods, such as MGARD, SZ, and ZFP.
arXiv Detail & Related papers (2021-11-09T21:26:37Z) - Exploring Autoencoder-based Error-bounded Compression for Scientific
Data [14.724393511470225]
We develop an error-bounded autoencoder-based framework in terms of the SZ model.
We optimize the compression quality for the main stages in our designed AE-based error-bounded compression framework.
arXiv Detail & Related papers (2021-05-25T07:53:32Z) - Analyzing and Mitigating JPEG Compression Defects in Deep Learning [69.04777875711646]
We present a unified study of the effects of JPEG compression on a range of common tasks and datasets.
We show that there is a significant penalty on common performance metrics for high compression.
arXiv Detail & Related papers (2020-11-17T20:32:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.