Compressing high-resolution data through latent representation encoding for downscaling large-scale AI weather forecast model
- URL: http://arxiv.org/abs/2410.09109v1
- Date: Thu, 10 Oct 2024 05:38:03 GMT
- Title: Compressing high-resolution data through latent representation encoding for downscaling large-scale AI weather forecast model
- Authors: Qian Liu, Bing Gong, Xiaoran Zhuang, Xiaohui Zhong, Zhiming Kang, Hao Li,
- Abstract summary: We propose a variational autoencoder framework tailored for compressing high-resolution datasets.
Our framework successfully reduced the storage size of 3 years of HRCLDAS data from 8.61 TB to just 204 GB, while preserving essential information.
- Score: 10.634513279883913
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rapid advancement of artificial intelligence (AI) in weather research has been driven by the ability to learn from large, high-dimensional datasets. However, this progress also poses significant challenges, particularly regarding the substantial costs associated with processing extensive data and the limitations of computational resources. Inspired by the Neural Image Compression (NIC) task in computer vision, this study seeks to compress weather data to address these challenges and enhance the efficiency of downstream applications. Specifically, we propose a variational autoencoder (VAE) framework tailored for compressing high-resolution datasets, specifically the High Resolution China Meteorological Administration Land Data Assimilation System (HRCLDAS) with a spatial resolution of 1 km. Our framework successfully reduced the storage size of 3 years of HRCLDAS data from 8.61 TB to just 204 GB, while preserving essential information. In addition, we demonstrated the utility of the compressed data through a downscaling task, where the model trained on the compressed dataset achieved accuracy comparable to that of the model trained on the original data. These results highlight the effectiveness and potential of the compressed data for future weather research.
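The paper's compressor is a trained convolutional VAE; the NumPy sketch below only illustrates the shape of that pipeline (encode to a small latent, store the latent, decode on demand). All dimensions and the random weights are hypothetical stand-ins, not the paper's architecture; the reported 8.61 TB to 204 GB reduction works out to roughly a 42x storage ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 64x64 single-channel "weather field" flattened,
# compressed to a 32-dim latent. Random weights stand in for a trained model.
d_in, d_latent = 64 * 64, 32
W_enc = rng.normal(0.0, 0.01, (d_in, 2 * d_latent))   # produces mu and logvar
W_dec = rng.normal(0.0, 0.01, (d_latent, d_in))

def encode(x):
    h = x @ W_enc
    return h[:d_latent], h[d_latent:]                  # mu, logvar

def reparameterize(mu, logvar):
    # Sample z ~ N(mu, exp(logvar)) via the reparameterization trick.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    return z @ W_dec

x = rng.standard_normal(d_in)          # one high-resolution field
mu, logvar = encode(x)
z = reparameterize(mu, logvar)         # what gets stored: 32 floats, not 4096
x_hat = decode(z)                      # reconstruction at read time

dim_ratio = d_in / d_latent                      # 128x dimensional reduction here
reported_ratio = 8.61e12 / 204e9                 # ~42x storage ratio in the paper
```

In a trained model the decoder would be optimized so that x_hat preserves the meteorologically relevant content of x; here it only demonstrates the dimensionality bookkeeping.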
Related papers
- Variable Rate Neural Compression for Sparse Detector Data [9.331686712558144]
We propose a novel approach for TPC data compression via key-point identification facilitated by sparse convolution.
BCAE-VS achieves a 75% improvement in reconstruction accuracy with a 10% increase in compression ratio over the previous state-of-the-art model.
arXiv Detail & Related papers (2024-11-18T17:15:35Z) - Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research [90.91438597133211]
We introduce WarpSci, a framework designed to overcome crucial system bottlenecks in the application of reinforcement learning.
We eliminate the need for data transfer between the CPU and GPU, enabling the concurrent execution of thousands of simulations.
arXiv Detail & Related papers (2024-08-01T21:38:09Z) - Quanv4EO: Empowering Earth Observation by means of Quanvolutional Neural Networks [62.12107686529827]
This article highlights a significant shift towards leveraging quantum computing techniques in processing large volumes of remote sensing data.
The proposed Quanv4EO model introduces a quanvolution method for preprocessing multi-dimensional EO data.
Key findings suggest that the proposed model not only maintains high precision in image classification but also shows improvements of around 5% in EO use cases.
arXiv Detail & Related papers (2024-07-24T09:11:34Z) - Dynamic Data Pruning for Automatic Speech Recognition [58.95758272440217]
We introduce Dynamic Data Pruning for ASR (DDP-ASR), which offers fine-grained pruning granularities specifically tailored for speech-related datasets.
Our experiments show that DDP-ASR can achieve up to a 1.6x training speed-up with negligible performance loss.
arXiv Detail & Related papers (2024-06-26T14:17:36Z) - CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer [22.68937280154092]
We introduce an efficient neural compression model, the Variational Autoencoder Transformer (VAEformer), for extreme compression of climate data.
VAEformer outperforms existing state-of-the-art compression methods in the context of climate data.
Experiments show that global weather forecasting models trained on the compact CRA5 dataset achieve forecasting accuracy comparable to the model trained on the original dataset.
arXiv Detail & Related papers (2024-05-06T11:30:55Z) - Computationally and Memory-Efficient Robust Predictive Analytics Using Big Data [0.0]
This study navigates through the challenges of data uncertainties, storage limitations, and predictive data-driven modeling using big data.
We utilize Robust Principal Component Analysis (RPCA) for effective noise reduction and outlier elimination, and Optimal Sensor Placement (OSP) for efficient data compression and storage.
arXiv Detail & Related papers (2024-03-27T22:39:08Z) - Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z) - A Comprehensive Survey of Dataset Distillation [73.15482472726555]
Deep learning technology has developed at an unprecedented pace over the last decade, making it challenging to handle the ever-growing volume of data with limited computing power.
This paper provides a holistic understanding of dataset distillation from multiple aspects.
arXiv Detail & Related papers (2023-01-13T15:11:38Z) - Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that weights trained on synthetic data are robust against accumulated-error perturbations when the optimization is regularized towards a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z) - The Bearable Lightness of Big Data: Towards Massive Public Datasets in Scientific Machine Learning [0.0]
We show that lossy compression algorithms offer a realistic pathway for exposing high-fidelity scientific data to open-source data repositories.
In this paper, we outline, construct, and evaluate the requirements for establishing a big data framework.
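The simplest instance of the lossy-compression trade-off that blurb describes is precision truncation. The toy below is not the cited paper's method, just a minimal illustration with hypothetical data: halving storage by casting float32 fields to float16 while bounding the relative error introduced.

```python
import numpy as np

# A toy "high-fidelity" field with values in a plausible physical range
# (hypothetical stand-in; the paper evaluates real scientific datasets).
rng = np.random.default_rng(0)
field = rng.uniform(250.0, 320.0, size=(256, 256)).astype(np.float32)

# Lossy step: truncate precision to halve the storage footprint.
compressed = field.astype(np.float16)
restored = compressed.astype(np.float32)

storage_ratio = field.nbytes / compressed.nbytes            # 2x smaller
max_rel_err = float(np.max(np.abs(restored - field) / np.abs(field)))
```

For this value range, half precision keeps the relative error below about 5e-4; real compressors (including the neural ones above) reach far higher ratios by also exploiting spatial structure.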
arXiv Detail & Related papers (2022-07-25T21:44:53Z) - A Quick Review on Recent Trends in 3D Point Cloud Data Compression Techniques and the Challenges of Direct Processing in 3D Compressed Domain [3.169089186688223]
Automatic processing of 3D point cloud data for object detection, tracking, and segmentation is a trending research area in AI and data science.
The volume of data produced as 3D point clouds (e.g., by LiDAR) is enormous.
Researchers are therefore developing new data compression algorithms to handle these huge volumes of data.
arXiv Detail & Related papers (2020-07-08T12:56:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.