NeurLZ: An Online Neural Learning-Based Method to Enhance Scientific Lossy Compression
- URL: http://arxiv.org/abs/2409.05785v4
- Date: Fri, 18 Apr 2025 04:00:31 GMT
- Title: NeurLZ: An Online Neural Learning-Based Method to Enhance Scientific Lossy Compression
- Authors: Wenqi Jia, Zhewen Hu, Youyuan Liu, Boyuan Zhang, Jinzhen Wang, Jinyang Liu, Wei Niu, Stavros Kalafatis, Junzhou Huang, Sian Jin, Daoce Wang, Jiannan Tian, Miao Yin
- Abstract summary: NeurLZ is a neural method designed to enhance lossy compression by integrating online learning, cross-field learning, and robust error regulation. During the first five learning epochs, NeurLZ achieves an 89% bit rate reduction, with further optimization yielding up to around 94% reduction at equivalent distortion.
- Score: 34.30562110131907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale scientific simulations generate massive datasets, posing challenges for storage and I/O. Traditional lossy compression struggles to further improve the balance among compression ratio, data quality, and adaptability to diverse scientific data features. While deep learning-based solutions have been explored, their common reliance on large models and offline training limits adaptability to dynamic data characteristics and computational efficiency. To address these challenges, we propose NeurLZ, a neural method designed to enhance lossy compression by integrating online learning, cross-field learning, and robust error regulation. Key innovations of NeurLZ include: (1) compression-time online neural learning with lightweight skipping DNN models that adapt to residual errors without costly offline pretraining, (2) an error-mitigating capability that recovers fine details from compression errors overlooked by conventional compressors, (3) $1\times$ and $2\times$ error-regulation modes, ensuring strict adherence to the $1\times$ user-input error bound or a relaxed $2\times$ bound for better overall quality, and (4) cross-field learning that leverages inter-field correlations in scientific data to improve on conventional methods. Comprehensive evaluations on representative HPC datasets, e.g., Nyx, Miranda, and Hurricane, against state-of-the-art compressors show NeurLZ's effectiveness. During the first five learning epochs, NeurLZ achieves an 89% bit rate reduction, and further optimization yields up to around 94% reduction at equivalent distortion, significantly outperforming existing methods and demonstrating that NeurLZ is a scalable and efficient solution for enhancing scientific lossy compression.
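The following is a rough sketch of the compression-time online learning and $1\times$ error-regulation ideas described in the abstract: a tiny network is fit, during compression, to the residual that a conventional error-bounded compressor leaves behind, and the enhanced output is clipped back to the user error bound. The two-layer MLP, per-value input, epoch count, and function names are illustrative assumptions, not NeurLZ's actual lightweight skipping-DNN design.

```python
# Hedged sketch: online residual learning on top of a conventional
# error-bounded compressor, with strict (1x) error regulation.
import torch
import torch.nn as nn

def enhance_block(x, x_hat, error_bound, epochs=5, lr=1e-3):
    """x: original 1-D field (float tensor); x_hat: the conventional
    compressor's decompressed output for the same block."""
    model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    inp = x_hat.unsqueeze(1)            # predict residuals from decompressed values
    tgt = (x - x_hat).unsqueeze(1)      # the detail the compressor missed
    for _ in range(epochs):             # online learning at compression time
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(inp), tgt)
        loss.backward()
        opt.step()
    with torch.no_grad():
        enhanced = x_hat + model(inp).squeeze(1)
    # 1x error regulation: the original data is available at compression time,
    # so any correction that drifts outside the bound can be clipped back.
    return torch.clamp(enhanced, x - error_bound, x + error_bound)
```

In a pipeline of this kind, only the small model (plus any out-of-bound fixups) would need to be stored alongside the compressed stream, which is why the added bit-rate overhead can stay small.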
Related papers
- DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction [4.634179787231294]
We present a general-purpose deep-surrogate framework for lossy compression quality prediction (DeepCQ). Our results highlight the framework's exceptional predictive accuracy, with prediction errors generally under 10% across most settings.
arXiv Detail & Related papers (2025-12-24T21:46:17Z) - An Efficient Gradient-Aware Error-Bounded Lossy Compressor for Federated Learning [7.649286962189554]
Federated learning (FL) enables collaborative model training without exposing clients' private data. Error-bounded lossy compression (EBLC) is particularly appealing for its fine-grained utility-compression tradeoff. We propose an EBLC framework tailored for FL gradient data to achieve high compression ratios while preserving model accuracy.
arXiv Detail & Related papers (2025-11-07T23:59:09Z) - GANCompress: GAN-Enhanced Neural Image Compression with Binary Spherical Quantization [0.0]
GANCompress is a novel neural compression framework that combines Binary Spherical Quantization (BSQ) with Generative Adversarial Networks (GANs). We show that GANCompress achieves substantial improvement in compression efficiency -- reducing file sizes by up to 100x with minimal visual distortion.
arXiv Detail & Related papers (2025-05-19T00:18:27Z) - A Brief Review for Compression and Transfer Learning Techniques in DeepFake Detection [13.783950035836593]
Training and deploying deepfake detection models on edge devices offers the advantage of maintaining data privacy and confidentiality by processing data close to its source.
We explore compression techniques to reduce computational demands and inference time, alongside transfer learning methods to minimize training overhead.
arXiv Detail & Related papers (2025-04-29T13:37:21Z) - Ultra-Resolution Adaptation with Ease [62.56434979517156]
We propose a set of key guidelines for ultra-resolution adaptation termed URAE. We show that tuning minor components of the weight matrices outperforms widely-used low-rank adapters when synthetic data are unavailable. Experiments validate that URAE achieves comparable 2K-generation performance to state-of-the-art closed-source models like FLUX1.1 [Pro] Ultra with only 3K samples and 2K iterations.
arXiv Detail & Related papers (2025-03-20T16:44:43Z) - Variable Rate Neural Compression for Sparse Detector Data [9.331686712558144]
We propose a novel approach for TPC data compression via key-point identification facilitated by sparse convolution.
BCAE-VS achieves a $75%$ improvement in reconstruction accuracy with a $10%$ increase in compression ratio over the previous state-of-the-art model.
arXiv Detail & Related papers (2024-11-18T17:15:35Z) - Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomalies and missing data constitute a thorny problem in industrial applications.
Deep learning enabled anomaly detection has emerged as a critical direction.
The data collected on edge devices contain privacy-sensitive user information.
arXiv Detail & Related papers (2024-11-06T15:38:31Z) - EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation [84.70637613266835]
We reformulate the model compression problem as a customized compensation problem.
We propose Training-free Eigenspace Low-Rank Approximation (EoRA).
EoRA directly minimizes compression-induced errors without requiring gradient-based training.
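A minimal sketch of the training-free compensation idea, under the simplifying assumption that the correction is a plain truncated SVD of the weight residual; EoRA itself projects the residual into an activation eigenspace before the low-rank approximation, so this is only an illustration.

```python
# Hedged sketch: training-free low-rank compensation of compression error.
import numpy as np

def lowrank_compensation(W, W_compressed, rank=8):
    residual = W - W_compressed                       # compression-induced error
    U, S, Vt = np.linalg.svd(residual, full_matrices=False)
    A = U[:, :rank] * S[:rank]                        # (out_dim, rank)
    B = Vt[:rank, :]                                  # (rank, in_dim)
    return A, B
```

At inference the compressed layer would compute `W_compressed @ x + A @ (B @ x)`, so no gradient-based retraining is required.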
arXiv Detail & Related papers (2024-10-28T17:59:03Z) - GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data [14.92764869276237]
We propose GWLZ, a novel group-wise learning-based lossy compression framework with multiple lightweight learnable enhancer models.
We show that GWLZ significantly enhances the decompressed data reconstruction quality with negligible impact on the compression efficiency.
arXiv Detail & Related papers (2024-04-20T21:12:53Z) - Communication-Efficient Distributed Learning with Local Immediate Error
Compensation [95.6828475028581]
We propose the Local Immediate Error Compensated SGD (LIEC-SGD) optimization algorithm.
LIEC-SGD is superior to previous works in either the convergence rate or the communication cost.
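For background, the sketch below shows the generic error-compensation loop used by compressed distributed SGD with a simple top-k compressor; where and when LIEC-SGD folds the residual back (the "immediate" compensation) differs from this baseline, so treat it as context rather than the paper's algorithm.

```python
# Hedged sketch: error-feedback loop for compressed SGD (generic baseline).
import numpy as np

def topk_compress(v, k):
    idx = np.argsort(np.abs(v))[-k:]           # keep the k largest-magnitude entries
    out = np.zeros_like(v)
    out[idx] = v[idx]
    return out

def compressed_step(w, grad, error_memory, lr=0.1, k=10):
    corrected = grad + error_memory             # add back previously dropped mass
    update = topk_compress(corrected, k)        # what is actually communicated
    error_memory = corrected - update           # remember what was dropped
    return w - lr * update, error_memory
```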
arXiv Detail & Related papers (2024-02-19T05:59:09Z) - Leveraging Neural Radiance Fields for Uncertainty-Aware Visual Localization [56.95046107046027]
We propose to leverage Neural Radiance Fields (NeRF) to generate training samples for scene coordinate regression.
Despite NeRF's efficiency in rendering, many of the rendered data are polluted by artifacts or only contain minimal information gain.
arXiv Detail & Related papers (2023-10-10T20:11:13Z) - SRN-SZ: Deep Leaning-Based Scientific Error-bounded Lossy Compression with Super-resolution Neural Networks [13.706955134941385]
We propose SRN-SZ, a deep learning-based scientific error-bounded lossy compressor.
SRN-SZ applies the most advanced super-resolution network HAT for its compression.
In experiments, SRN-SZ achieves up to 75% compression ratio improvements under the same error bound.
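The sketch below illustrates the prediction-plus-error-bounded-quantization pipeline that such compressors build on, with naive upsampling standing in for the HAT super-resolution network; the downsampling factor, quantization grid, and function names are assumptions for illustration, not SRN-SZ's actual design.

```python
# Hedged sketch: predict from a coarse copy, then error-bounded residual coding.
import numpy as np

def error_bounded_residuals(field, error_bound, factor=4):
    coarse = field[::factor]                            # stored low-resolution proxy
    pred = np.repeat(coarse, factor)[: field.size]      # naive stand-in for super-resolution
    residual = field - pred
    # quantize residuals on a 2*error_bound grid so reconstruction stays in bound
    q = np.round(residual / (2 * error_bound)).astype(np.int64)
    recon = pred + q * 2 * error_bound
    assert np.all(np.abs(recon - field) <= error_bound * (1 + 1e-9))
    return coarse, q                                    # entropy-code these in practice
```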
arXiv Detail & Related papers (2023-09-07T22:15:32Z) - Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
arXiv Detail & Related papers (2023-06-08T05:44:06Z) - Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs).
It proves analytically that both sampling important nodes and pruning the lowest-magnitude neurons can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z) - Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation [151.70234052015948]
We propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.
We show that the weights trained on synthetic data are robust against accumulated error perturbations when regularized toward a flat trajectory.
Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7%.
arXiv Detail & Related papers (2022-11-20T15:49:11Z) - The Bearable Lightness of Big Data: Towards Massive Public Datasets in Scientific Machine Learning [0.0]
We show that lossy compression algorithms offer a realistic pathway for exposing high-fidelity scientific data to open-source data repositories.
In this paper, we outline, construct, and evaluate the requirements for establishing a big data framework.
arXiv Detail & Related papers (2022-07-25T21:44:53Z) - Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on all three datasets for image classification in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z) - Optimal Rate Adaption in Federated Learning with Compressed Communications [28.16239232265479]
Federated Learning incurs high communication overhead, which can be greatly alleviated by compressing model updates.
However, the tradeoff between compression and model accuracy in the networked environment remains unclear.
We present a framework to maximize the final model accuracy by strategically adjusting the compression in each iteration.
arXiv Detail & Related papers (2021-12-13T14:26:15Z) - Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
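A hedged sketch of the underlying recipe: prune low-magnitude weights, uniformly quantize the survivors, and keep only their indices and codes. The layout below is a generic illustration, not the paper's actual source-coding format.

```python
# Hedged sketch: magnitude pruning + uniform quantization + sparse storage.
import numpy as np

def prune_and_quantize(W, sparsity=0.9, levels=16):
    flat = W.flatten()
    thresh = np.quantile(np.abs(flat), sparsity)     # magnitude pruning threshold
    keep = np.abs(flat) > thresh
    idx = np.nonzero(keep)[0].astype(np.uint32)      # positions of surviving weights
    vals = flat[keep]
    lo, hi = float(vals.min()), float(vals.max())
    step = max((hi - lo) / (levels - 1), 1e-12)
    codes = np.round((vals - lo) / step).astype(np.uint8)   # quantized codes
    return idx, codes, (lo, step)                    # enough to rebuild an approximate W
```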
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - Balanced Softmax Cross-Entropy for Incremental Learning [6.5423218639215275]
Deep neural networks are prone to catastrophic forgetting when incrementally trained on new classes or new tasks.
Recent methods have proven effective at mitigating catastrophic forgetting.
We propose the use of the Balanced Softmax Cross-Entropy loss and show that it can be combined with existing methods for incremental learning to improve their performance.
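A minimal sketch of a balanced softmax cross-entropy in PyTorch, assuming per-class sample counts are tracked by the incremental-learning loop; it follows the standard balanced-softmax formulation and may differ in detail from the loss used in the paper.

```python
# Hedged sketch: balanced softmax cross-entropy via a log class-prior shift.
import torch
import torch.nn.functional as F

def balanced_softmax_ce(logits, targets, class_counts):
    # Shifting logits by log class frequency keeps frequent (old) classes
    # from dominating the softmax over rarely seen new classes.
    prior = torch.log(class_counts.float().clamp(min=1.0))
    return F.cross_entropy(logits + prior, targets)
```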
arXiv Detail & Related papers (2021-03-23T13:30:26Z) - An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Topk, and DGC compressors, respectively.
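A single-stage sketch of the threshold-estimation idea, assuming gradient magnitudes roughly follow an exponential distribution; SIDCo's actual scheme performs a multi-stage fit over several candidate distributions, so this only illustrates how a fitted threshold can replace a full top-k sort.

```python
# Hedged sketch: pick a sparsification threshold from a fitted magnitude model.
import numpy as np

def estimate_threshold(grad, keep_ratio=0.01):
    lam = np.abs(grad).mean()              # MLE scale for an exponential fit
    return -lam * np.log(keep_ratio)       # so that P(|g| > t) is roughly keep_ratio

def sparsify(grad, keep_ratio=0.01):
    t = estimate_threshold(grad, keep_ratio)
    mask = np.abs(grad) >= t
    return grad * mask, mask               # communicate only the surviving entries
```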
arXiv Detail & Related papers (2021-01-26T13:06:00Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of existing variations can be covered by the unified framework we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.