AstroCompress: A benchmark dataset for multi-purpose compression of astronomical data
- URL: http://arxiv.org/abs/2506.08306v1
- Date: Tue, 10 Jun 2025 00:32:30 GMT
- Title: AstroCompress: A benchmark dataset for multi-purpose compression of astronomical data
- Authors: Tuan Truong, Rithwik Sudharsan, Yibo Yang, Peter Xiangyuan Ma, Ruihan Yang, Stephan Mandt, Joshua S. Bloom
- Abstract summary: This paper introduces AstroCompress: a neural compression challenge for astrophysics data. We provide code to easily access the data and benchmark seven lossless compression methods. Our results indicate that lossless neural compression techniques can enhance data collection at observatories.
- Score: 31.271365337613606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The site conditions that make astronomical observatories in space and on the ground so desirable -- cold and dark -- demand a physical remoteness that leads to limited data transmission capabilities. Such transmission limitations directly bottleneck the amount of data acquired, and in an era of costly modern observatories, any improvement in lossless data compression has the potential to scale to billions of dollars' worth of additional science that can be accomplished on the same instrument. Traditional lossless methods for compressing astrophysical data are manually designed. Neural data compression, on the other hand, holds the promise of learning compression algorithms end-to-end from data and outperforming classical techniques by leveraging the unique spatial, temporal, and wavelength structures of astronomical images. This paper introduces AstroCompress: a neural compression challenge for astrophysics data, featuring four new datasets (and one legacy dataset) with 16-bit unsigned integer imaging data in various modes: space-based, ground-based, multi-wavelength, and time-series imaging. We provide code to easily access the data and benchmark seven lossless compression methods (three neural and four non-neural, including all practical state-of-the-art algorithms). Our results on lossless compression indicate that lossless neural compression techniques can enhance data collection at observatories, and provide guidance on the adoption of neural compression in scientific applications. Though the scope of this paper is restricted to lossless compression, we also comment on the potential exploration of lossy compression methods in future studies.
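To make the abstract's notion of benchmarking lossless compression on 16-bit unsigned integer images concrete, here is a minimal sketch. It is not the AstroCompress benchmark code: the image is synthetic (a flat sky level plus Gaussian noise, an assumption for illustration), the function names are hypothetical, and the three codecs are general-purpose Python standard-library compressors rather than the seven methods the paper evaluates. It only illustrates the two quantities such a benchmark rests on: a bit-exact round trip and the compression ratio (raw bytes divided by compressed bytes).

```python
import bz2
import lzma
import random
import zlib
from array import array


def synthetic_uint16_image(width=128, height=128, seed=0):
    """Hypothetical stand-in for a 16-bit detector frame: sky level plus noise."""
    rng = random.Random(seed)
    pixels = array("H")  # "H" = unsigned 16-bit integers
    for _ in range(width * height):
        value = int(rng.gauss(1000, 15))
        pixels.append(min(max(value, 0), 65535))  # clamp to the uint16 range
    return pixels


def benchmark(pixels):
    """Return {codec name: compression ratio}, checking each round trip is exact."""
    raw = pixels.tobytes()
    codecs = {
        "zlib": (zlib.compress, zlib.decompress),
        "bz2": (bz2.compress, bz2.decompress),
        "lzma": (lzma.compress, lzma.decompress),
    }
    ratios = {}
    for name, (compress, decompress) in codecs.items():
        packed = compress(raw)
        assert decompress(packed) == raw  # lossless: bit-exact recovery
        ratios[name] = len(raw) / len(packed)
    return ratios


if __name__ == "__main__":
    for name, ratio in benchmark(synthetic_uint16_image()).items():
        print(f"{name}: {ratio:.2f}x")
```

The ratios printed here depend entirely on the synthetic noise model and say nothing about real astronomical data; swapping in actual frames from the datasets would be needed for a meaningful comparison.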
Related papers
- GraphComp: Extreme Error-bounded Compression of Scientific Data via Temporal Graph Autoencoders [7.129137910302658]
We propose GRAPHCOMP, a graph-based method for error-bounded lossy compression of scientific data. Inspired by Graph Neural Networks (GNNs), we then propose a temporal graph autoencoder to learn latent representations that significantly reduce the size of the graph. Decompression reverses the process and utilizes the learnt graph model together with the latent representation to reconstruct an approximation of the original data.
arXiv Detail & Related papers (2025-05-08T18:58:54Z)
- Sparse $L^1$-Autoencoders for Scientific Data Compression [0.0]
We introduce effective data compression methods by developing autoencoders using high dimensional latent spaces that are $L^1$-regularized.
We show how these information-rich latent spaces can be used to mitigate blurring and other artifacts to obtain highly effective data compression methods for scientific data.
arXiv Detail & Related papers (2024-05-23T07:48:00Z)
- Convolutional variational autoencoders for secure lossy image compression in remote sensing [47.75904906342974]
This study investigates image compression based on convolutional variational autoencoders (CVAE).
CVAEs have been demonstrated to outperform conventional compression methods such as JPEG2000 by a substantial margin on compression benchmark datasets.
arXiv Detail & Related papers (2024-04-03T15:17:29Z)
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
- Neural-based Compression Scheme for Solar Image Data [8.374518151411612]
We propose a neural network-based lossy compression method to be used in NASA's data-intensive imagery missions.
In this work, we propose an adversarially trained neural network, equipped with local and non-local attention modules to capture both the local and global structure of the image.
As a proof of concept for use of this algorithm in SDO data analysis, we have performed coronal hole (CH) detection using our compressed images.
arXiv Detail & Related papers (2023-11-06T04:13:58Z)
- SRN-SZ: Deep Leaning-Based Scientific Error-bounded Lossy Compression with Super-resolution Neural Networks [13.706955134941385]
We propose SRN-SZ, a deep learning-based scientific error-bounded lossy compressor.
SRN-SZ applies the most advanced super-resolution network HAT for its compression.
In experiments, SRN-SZ achieves up to 75% compression ratio improvements under the same error bound.
arXiv Detail & Related papers (2023-09-07T22:15:32Z)
- Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data [12.831138965071945]
This work presents a neural network that significantly compresses large-scale scientific data while maintaining high reconstruction quality.
The proposed model is tested with scientific benchmark data available publicly and applied to a large-scale high-resolution climate modeling data set.
Our model achieves a compression ratio of 140 on several benchmark data sets without compromising the reconstruction quality.
arXiv Detail & Related papers (2023-07-09T16:11:02Z)
- Scalable Hybrid Learning Techniques for Scientific Data Compression [6.803722400888276]
Scientists require compression techniques that accurately preserve derived quantities of interest (QoIs).
This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression.
arXiv Detail & Related papers (2022-12-21T03:00:18Z)
- Unrolled Compressed Blind-Deconvolution [77.88847247301682]
Sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar/sonar/ultrasound imaging.
We propose a compression method that enables blind recovery from far fewer measurements than the full received signal in time.
arXiv Detail & Related papers (2022-09-28T15:16:58Z)
- Deep Lossy Plus Residual Coding for Lossless and Near-lossless Image Compression [85.93207826513192]
We propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression.
We solve the joint lossy and residual compression problem in the approach of VAEs.
In the near-lossless mode, we quantize the original residuals to satisfy a given $\ell_\infty$ error bound.
arXiv Detail & Related papers (2022-09-11T12:11:56Z)
- COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z)
- Analyzing and Mitigating JPEG Compression Defects in Deep Learning [69.04777875711646]
We present a unified study of the effects of JPEG compression on a range of common tasks and datasets.
We show that there is a significant penalty on common performance metrics for high compression.
arXiv Detail & Related papers (2020-11-17T20:32:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.