A framework for compressing unstructured scientific data via serialization
- URL: http://arxiv.org/abs/2410.08059v1
- Date: Thu, 10 Oct 2024 15:53:35 GMT
- Title: A framework for compressing unstructured scientific data via serialization
- Authors: Viktor Reshniak, Qian Gong, Rick Archibald, Scott Klasky, Norbert Podhorszki,
- Abstract summary: We present a general framework for compressing unstructured scientific data with known local connectivity.
A common application is simulation data defined on arbitrary finite element meshes.
The framework employs a greedy, topology-preserving reordering of the original nodes, which allows for seamless integration into existing data processing pipelines.
- Score: 2.5768995309704104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a general framework for compressing unstructured scientific data with known local connectivity. A common application is simulation data defined on arbitrary finite element meshes. The framework employs a greedy, topology-preserving reordering of the original nodes, which allows for seamless integration into existing data processing pipelines. This reordering process depends solely on mesh connectivity and can be performed offline for optimal efficiency. However, the algorithm's greedy nature also supports on-the-fly implementation. The proposed method is compatible with any compression algorithm that leverages spatial correlations within the data. The effectiveness of this approach is demonstrated on a large-scale real dataset using several compression methods, including MGARD, SZ, and ZFP.
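The abstract does not spell out the reordering algorithm itself. As a hedged illustration of the idea, a greedy, connectivity-driven breadth-first reordering (similar in spirit to Cuthill-McKee, which the paper may or may not use) can be sketched as follows; the function name, the toy 2x3 grid, and the nodal values are all illustrative, not taken from the paper:

```python
from collections import deque

def greedy_reorder(adjacency):
    """Greedy breadth-first reordering of mesh nodes.

    adjacency: dict mapping node id -> iterable of neighbor ids.
    Returns a permutation (list of node ids) in which each node is
    placed near its mesh neighbors, so a 1D serialization of nodal
    data preserves spatial correlation for a downstream compressor.
    Depends only on connectivity, so it can run offline or on the fly.
    """
    order, seen = [], set()
    for start in adjacency:              # handle disconnected components
        if start in seen:
            continue
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            order.append(node)
            # visit neighbors in a deterministic order (the greedy choice)
            for nbr in sorted(adjacency[node]):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return order

# Example: a 2x3 structured grid treated as an unstructured mesh
adj = {
    0: [1, 3], 1: [0, 2, 4], 2: [1, 5],
    3: [0, 4], 4: [1, 3, 5], 5: [2, 4],
}
perm = greedy_reorder(adj)
data = [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]   # one value per node
serialized = [data[i] for i in perm]           # stream fed to the compressor
```

The serialized stream can then be handed to any correlation-exploiting compressor (the paper evaluates MGARD, SZ, and ZFP).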
Related papers
- Unified Framework for Neural Network Compression via Decomposition and Optimal Rank Selection [3.3454373538792552]
We present a unified framework that applies decomposition and optimal rank selection, employing a composite compression loss within defined rank constraints.
Our approach includes an automatic rank search in a continuous space, efficiently identifying optimal rank configurations without the use of training data.
Using various benchmark datasets, we demonstrate the efficacy of our method through a comprehensive analysis.
arXiv Detail & Related papers (2024-09-05T14:15:54Z) - Sparse $L^1$-Autoencoders for Scientific Data Compression [0.0]
We introduce effective data compression methods by developing autoencoders using high-dimensional latent spaces that are $L^1$-regularized.
We show how these information-rich latent spaces can be used to mitigate blurring and other artifacts to obtain highly effective data compression methods for scientific data.
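As a rough sketch of the idea in this blurb (not the paper's implementation), an $L^1$ penalty on a high-dimensional latent code can be added to a reconstruction loss. The weight matrices, sizes, and penalty strength `lam` below are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))              # batch of toy data snapshots
W_enc = rng.standard_normal((16, 64)) * 0.1   # latent dim 64 > input dim 16
W_dec = rng.standard_normal((64, 16)) * 0.1

z = np.maximum(x @ W_enc, 0.0)                # ReLU latent code
x_hat = z @ W_dec                             # reconstruction

lam = 1e-2                                    # illustrative L1 weight
# L1 on z encourages sparse, information-rich codes in the large latent space
loss = np.mean((x - x_hat) ** 2) + lam * np.mean(np.abs(z))
```

Only the sparse nonzero latent entries need to be stored, which is where the compression comes from.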
arXiv Detail & Related papers (2024-05-23T07:48:00Z) - Scalable Hybrid Learning Techniques for Scientific Data Compression [6.803722400888276]
Scientists require compression techniques that accurately preserve derived quantities of interest (QoIs)
This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression.
arXiv Detail & Related papers (2022-12-21T03:00:18Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
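The blurb mentions sketching without specifying the construction; a minimal count sketch of a gradient vector (one hash row, random signs for unbiasedness) illustrates how a length-$d$ vector can be sent as $k$ bucket sums. All names here are illustrative, not from the paper:

```python
import numpy as np

def count_sketch(g, k, seed=0):
    """Compress a length-d gradient into k bucket sums (one hash row)."""
    rng = np.random.default_rng(seed)
    d = g.size
    h = rng.integers(0, k, size=d)        # bucket assignment per coordinate
    s = rng.choice([-1.0, 1.0], size=d)   # random signs make recovery unbiased
    sketch = np.zeros(k)
    np.add.at(sketch, h, s * g)           # accumulate signed values per bucket
    return sketch, h, s

def unsketch(sketch, h, s):
    """Estimate each coordinate from its bucket (unbiased in expectation)."""
    return s * sketch[h]

g = np.arange(1.0, 9.0)                   # toy gradient, d = 8
sk, h, s = count_sketch(g, k=4)           # only k numbers are communicated
g_hat = unsketch(sk, h, s)
```

In practice several independent hash rows are combined (e.g., by a median) to reduce variance, matching the $O(k \log(d))$ communication cost.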
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Dataset Condensation with Latent Space Knowledge Factorization and Sharing [73.31614936678571]
We introduce a novel approach for solving dataset condensation problem by exploiting the regularity in a given dataset.
Instead of condensing the dataset directly in the original input space, we assume a generative process of the dataset with a set of learnable codes.
We experimentally show that our method achieves new state-of-the-art records by significant margins on various benchmark datasets.
arXiv Detail & Related papers (2022-08-21T18:14:08Z) - Federated Offline Reinforcement Learning [55.326673977320574]
We propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites.
We design the first federated policy optimization algorithm for offline RL with a sample complexity guarantee.
We give a theoretical guarantee for the proposed algorithm: the suboptimality of the learned policies is comparable to the rate achievable as if the data were not distributed.
arXiv Detail & Related papers (2022-06-11T18:03:26Z) - Quantization for Distributed Optimization [0.0]
We present a set of all-reduce gradient compatible compression schemes which significantly reduce the communication overhead while maintaining the performance of vanilla SGD.
Our compression methods perform better than the in-built methods currently offered by the deep learning frameworks.
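One way such all-reduce-compatible compression can work (a sketch under assumptions, not necessarily the paper's scheme): uniform quantization with a scale shared across workers is linear up to rounding, so the integer payloads can be summed directly by the all-reduce before a single dequantization. The scale-negotiation step is assumed, not specified here:

```python
import numpy as np

def quantize(g, scale, bits=8):
    """Map floats to ints with a shared scale. Because the map is linear
    (up to rounding), integer payloads from different workers can be
    summed by an all-reduce and dequantized once at the end."""
    lim = 2 ** (bits - 1) - 1
    return np.clip(np.round(g / scale), -lim, lim).astype(np.int32)

def dequantize(q, scale):
    return q.astype(np.float64) * scale

# Two workers agree on one scale beforehand (assumed negotiated out of band).
g1 = np.array([0.5, -1.0, 0.25])
g2 = np.array([0.1, 0.2, -0.3])
scale = max(np.abs(g1).max(), np.abs(g2).max()) / 127

q_sum = quantize(g1, scale) + quantize(g2, scale)  # summed via all-reduce
g_sum = dequantize(q_sum, scale)                   # close to g1 + g2
```

The key design choice is that quantization commutes with addition, which is what makes the scheme compatible with ring all-reduce rather than requiring a parameter server.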
arXiv Detail & Related papers (2021-09-26T05:16:12Z) - Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression [56.83861738731913]
Deep Neural Network (DNN) models are essential for practical applications, especially for resource limited devices.
Previous unstructured or structured weight pruning methods can hardly achieve real inference acceleration in practice.
We propose a generalized weight unification framework at a hardware-compatible micro-structured level to achieve a high degree of compression and acceleration.
arXiv Detail & Related papers (2021-06-15T17:22:59Z) - Deep Magnification-Flexible Upsampling over 3D Point Clouds [103.09504572409449]
We propose a novel end-to-end learning-based framework to generate dense point clouds.
We first formulate the problem explicitly, which boils down to determining the weights and high-order approximation errors.
Then, we design a lightweight neural network to adaptively learn unified and sorted weights as well as the high-order refinements.
arXiv Detail & Related papers (2020-11-25T14:00:18Z) - Federated Learning with Compression: Unified Analysis and Sharp Guarantees [39.092596142018195]
Communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices.
Two notable trends to deal with this communication overhead are gradient compression and local computation with periodic communication.
We analyze their convergence in both homogeneous and heterogeneous data distribution settings.
arXiv Detail & Related papers (2020-07-02T14:44:07Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.