A framework for compressing unstructured scientific data via serialization
- URL: http://arxiv.org/abs/2410.08059v1
- Date: Thu, 10 Oct 2024 15:53:35 GMT
- Title: A framework for compressing unstructured scientific data via serialization
- Authors: Viktor Reshniak, Qian Gong, Rick Archibald, Scott Klasky, Norbert Podhorszki,
- Abstract summary: We present a general framework for compressing unstructured scientific data with known local connectivity.
A common application is simulation data defined on arbitrary finite element meshes.
The framework employs a greedy topology preserving reordering of original nodes which allows for seamless integration into existing data processing pipelines.
- Score: 2.5768995309704104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a general framework for compressing unstructured scientific data with known local connectivity. A common application is simulation data defined on arbitrary finite element meshes. The framework employs a greedy topology preserving reordering of original nodes which allows for seamless integration into existing data processing pipelines. This reordering process depends solely on mesh connectivity and can be performed offline for optimal efficiency. However, the algorithm's greedy nature also supports on-the-fly implementation. The proposed method is compatible with any compression algorithm that leverages spatial correlations within the data. The effectiveness of this approach is demonstrated on a large-scale real dataset using several compression methods, including MGARD, SZ, and ZFP.
Related papers
- An Enhancement of Jiang, Z., et al.s Compression-Based Classification Algorithm Applied to News Article Categorization [0.0]
This study enhances Jiang et al.'s compression-based classification algorithm by addressing its limitations in detecting semantic similarities between text documents.
The proposed improvements focus on unigram extraction and optimized concatenation, eliminating reliance on entire document compression.
Experimental results across datasets of varying sizes and complexities demonstrate an average accuracy improvement of 5.73%, with gains of up to 11% on datasets containing longer documents.
arXiv Detail & Related papers (2025-02-20T10:50:59Z) - Accelerated Methods with Compressed Communications for Distributed Optimization Problems under Data Similarity [55.03958223190181]
We propose the first theoretically grounded accelerated algorithms utilizing unbiased and biased compression under data similarity.
Our results are of record and confirmed by experiments on different average losses and datasets.
arXiv Detail & Related papers (2024-12-21T00:40:58Z) - Ares: Approximate Representations via Efficient Sparsification -- A Stateless Approach through Polynomial Homomorphism [1.3824176915623292]
We introduce a stateless compression framework that leverages limiting representations to achieve compact, interpretable and scalable data reduction.
Our approach achieves high compression ratios without compromising reconstruction accuracy, all while maintaining simplicity and scalability.
arXiv Detail & Related papers (2024-12-14T00:05:43Z) - Sparse $L^1$-Autoencoders for Scientific Data Compression [0.0]
We introduce effective data compression methods by developing autoencoders using high dimensional latent spaces that are $L1$-regularized.
We show how these information-rich latent spaces can be used to mitigate blurring and other artifacts to obtain highly effective data compression methods for scientific data.
arXiv Detail & Related papers (2024-05-23T07:48:00Z) - Scalable Hybrid Learning Techniques for Scientific Data Compression [6.803722400888276]
Scientists require compression techniques that accurately preserve derived quantities of interest (QoIs)
This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression.
arXiv Detail & Related papers (2022-12-21T03:00:18Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Federated Offline Reinforcement Learning [55.326673977320574]
We propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites.
We design the first federated policy optimization algorithm for offline RL with sample complexity.
We give a theoretical guarantee for the proposed algorithm, where the suboptimality for the learned policies is comparable to the rate as if data is not distributed.
arXiv Detail & Related papers (2022-06-11T18:03:26Z) - Quantization for Distributed Optimization [0.0]
We present a set of all-reduce gradient compatible compression schemes which significantly reduce the communication overhead while maintaining the performance of vanilla SGD.
Our compression methods perform better than the in-built methods currently offered by the deep learning frameworks.
arXiv Detail & Related papers (2021-09-26T05:16:12Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural
Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can hardly truly accelerate inference.
We propose a generalized weight unification framework at a hardware compatible micro-structured level to achieve high amount of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.