Attention Based Machine Learning Methods for Data Reduction with Guaranteed Error Bounds
- URL: http://arxiv.org/abs/2409.05357v1
- Date: Mon, 9 Sep 2024 06:35:24 GMT
- Title: Attention Based Machine Learning Methods for Data Reduction with Guaranteed Error Bounds
- Authors: Xiao Li, Jaemoon Lee, Anand Rangarajan, Sanjay Ranka
- Abstract summary: Scientific applications in fields such as high energy physics generate vast amounts of data at high velocities.
To address this challenge, data compression or reduction techniques are crucial.
We propose an attention-based compression method utilizing a blockwise compression setup.
- Score: 11.494915987840876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scientific applications in fields such as high energy physics, computational fluid dynamics, and climate science generate vast amounts of data at high velocities. This exponential growth in data production is surpassing the advancements in computing power, network capabilities, and storage capacities. To address this challenge, data compression or reduction techniques are crucial. These scientific datasets have underlying data structures consisting of structured and block-structured multidimensional meshes, where each grid point corresponds to a tensor. It is important that data reduction techniques leverage the strong spatial and temporal correlations that are ubiquitous in these applications. Additionally, applications such as CFD process tensors comprising a hundred-plus species and their attributes at each grid point, so reduction techniques should also be able to leverage the interrelationships between the elements of each tensor. In this paper, we propose an attention-based hierarchical compression method utilizing a block-wise compression setup. We introduce an attention-based hyper-block autoencoder to capture inter-block correlations, followed by a block-wise encoder to capture block-specific information. A PCA-based post-processing step is employed to guarantee error bounds for each data block. Our method effectively captures both spatiotemporal and inter-variable correlations within and between data blocks. Compared to the state-of-the-art SZ3, our method achieves a compression ratio up to 8 times higher on the multi-variable S3D dataset. When evaluated on single-variable setups using the E3SM and XGC datasets, our method still achieves compression ratios up to 3 times and 2 times higher, respectively.
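To make the proposed pipeline concrete, below is a minimal PyTorch sketch of an attention-based hyper-block autoencoder feeding a block-wise encoder and decoder. All module names, dimensions, and the loss are illustrative assumptions rather than the authors' implementation; the PCA-based error-bound step is sketched separately further down this page.

```python
# Hypothetical sketch: attention across the blocks of a hyper-block captures
# inter-block correlations; a block-wise encoder then compresses each block.
import torch
import torch.nn as nn

class HyperBlockAutoencoder(nn.Module):
    def __init__(self, block_dim=512, embed_dim=128, latent_dim=32, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(block_dim, embed_dim)       # per-block embedding
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.block_enc = nn.Linear(embed_dim, latent_dim)  # block-wise encoder
        self.block_dec = nn.Linear(latent_dim, block_dim)  # block-wise decoder

    def forward(self, blocks):            # blocks: (batch, n_blocks, block_dim)
        h = self.embed(blocks)
        h, _ = self.attn(h, h, h)         # mix information across blocks
        z = self.block_enc(h)             # latents are the compressed form
        return self.block_dec(z), z

model = HyperBlockAutoencoder()
blocks = torch.randn(2, 16, 512)          # 2 hyper-blocks of 16 blocks each
recon, z = model(blocks)
loss = nn.functional.mse_loss(recon, blocks)
```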
Related papers
- Sparse $L^1$-Autoencoders for Scientific Data Compression [0.0]
We introduce effective data compression methods by developing autoencoders using high-dimensional latent spaces that are $L^1$-regularized.
We show how these information-rich latent spaces can be used to mitigate blurring and other artifacts and to obtain highly effective compression methods for scientific data. (A minimal sketch of the regularized objective follows this entry.)
arXiv Detail & Related papers (2024-05-23T07:48:00Z)
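A minimal sketch of the objective described in the entry above: a wide latent space with an $L^1$ penalty on the codes. Layer sizes and the penalty weight are assumptions.

```python
# Sketch: autoencoder with a high-dimensional latent space whose codes are
# L1-regularized, encouraging sparse, information-rich representations.
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(784, 2048), nn.ReLU())  # wide latent space
dec = nn.Linear(2048, 784)
lambda_l1 = 1e-3                                      # sparsity weight (assumed)

x = torch.randn(64, 784)                              # stand-in data batch
z = enc(x)
loss = nn.functional.mse_loss(dec(z), x) + lambda_l1 * z.abs().mean()
loss.backward()
```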
- Machine Learning Techniques for Data Reduction of CFD Applications [10.881548113461493]
We present an approach called the guaranteed block autoencoder, which leverages correlations for reducing scientific data.
It uses a multidimensional block of tensors from CFD applications for both input and output. (A sketch of the error-bound idea follows this entry.)
arXiv Detail & Related papers (2024-04-28T04:01:09Z)
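Both the entry above and the main paper advertise guaranteed per-block error bounds. The NumPy sketch below shows one plausible residual-correction mechanism, assuming an orthonormal basis of residuals (e.g. from PCA) is available; it illustrates the idea, not the exact published procedure.

```python
# Sketch: store residual coefficients one at a time until the reconstruction
# error of the block falls below the requested tolerance.
import numpy as np

def correct_block(block, recon, basis, tol):
    """basis: (k, n) array with orthonormal rows (hypothetical PCA basis)."""
    residual = block - recon
    coeffs = basis @ residual
    kept = np.zeros_like(coeffs)
    for i in range(len(coeffs)):
        if np.linalg.norm(residual - basis.T @ kept) <= tol:
            break
        kept[i] = coeffs[i]               # keep one more coefficient
    return recon + basis.T @ kept, kept   # corrected block + stored coeffs

rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
basis = q.T                               # toy complete orthonormal basis
block = rng.standard_normal(64)
recon = block + 0.1 * rng.standard_normal(64)   # imperfect reconstruction
fixed, coeffs = correct_block(block, recon, basis, tol=0.05)
print(np.linalg.norm(block - fixed))      # <= 0.05 since the basis is complete
```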
- Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets such as CIFAR-10 and MNIST. (A toy sketch of the denoising idea follows this entry.)
arXiv Detail & Related papers (2024-02-07T16:32:29Z)
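The entry above argues that adding a denoising nonlinearity lets a shallow architecture beat a purely linear Gaussian code on sparse inputs. A toy NumPy illustration, with dimensions and threshold chosen arbitrarily:

```python
# Sketch: compress a sparse vector with a Gaussian projection, then denoise
# the naive linear reconstruction with soft-thresholding.
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 256, 64, 5                       # ambient dim, code dim, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)

x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)  # k-sparse signal

z = A @ x                                  # shallow linear (Gaussian) encoder
x_lin = A.T @ z                            # linear decoder baseline
x_den = np.sign(x_lin) * np.maximum(np.abs(x_lin) - 0.3, 0)  # denoised decode

print(np.linalg.norm(x - x_lin), np.linalg.norm(x - x_den))  # compare errors
```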
- Scalable Hybrid Learning Techniques for Scientific Data Compression [6.803722400888276]
Scientists require compression techniques that accurately preserve derived quantities of interest (QoIs).
This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression. (A sketch of a QoI-aware objective follows this entry.)
arXiv Detail & Related papers (2022-12-21T03:00:18Z)
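A hedged sketch of the physics-informed idea in the entry above: train the compressor against both reconstruction error and the error of a derived quantity of interest. The QoI here (a plain sum over the field) and the weight 0.1 are placeholders.

```python
# Sketch: reconstruction loss plus a penalty on a derived quantity of interest.
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(128, 16), nn.ReLU(), nn.Linear(16, 128))

def qoi(field):                  # hypothetical QoI: integral of the field
    return field.sum(dim=-1)

x = torch.randn(32, 128)         # stand-in physical field snapshots
recon = ae(x)
loss = nn.functional.mse_loss(recon, x) \
     + 0.1 * nn.functional.mse_loss(qoi(recon), qoi(x))
loss.backward()
```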
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV), a learning framework for any dataset with abundant unlabeled data but very few labeled examples.
We show that NPC-LV outperforms supervised methods on image classification on all three datasets in the low-data regime. (A compression-distance sketch follows this entry.)
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
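The entry above classifies by comparing compressor outputs. The sketch below shows the underlying compression-distance idea with gzip as a stand-in compressor and a 1-nearest-neighbor rule; NPC-LV itself replaces the off-the-shelf compressor with learned latent-variable models.

```python
# Sketch: 1-NN classification under the normalized compression distance (NCD).
import gzip

def clen(b: bytes) -> int:
    return len(gzip.compress(b))

def ncd(a: bytes, b: bytes) -> float:
    ca, cb, cab = clen(a), clen(b), clen(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

train = [(b"aaaaabbbbb" * 10, "striped"), (b"abcdefghij" * 10, "mixed")]
query = b"aaaabbbbba" * 10
label = min(train, key=lambda t: ncd(query, t[0]))[1]
print(label)                     # nearest training example under NCD
```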
- DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression [20.311114684028375]
We propose DeepSketch, a new reference search technique for post-deduplication delta compression.
DeepSketch uses a deep neural network to extract a data block's sketch, i.e., to create an approximate data signature of the block.
Our evaluation shows that DeepSketch improves the data-reduction ratio by up to 33% (21% on average) over a state-of-the-art post-deduplication delta-compression technique. (A toy signature-lookup sketch follows this entry.)
arXiv Detail & Related papers (2022-02-17T16:00:22Z)
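A toy illustration of the reference search described above: map each block to a short binary signature and pick the stored block with the smallest Hamming distance as the delta-compression reference. The random projection below stands in for DeepSketch's trained network.

```python
# Sketch: binary signatures via random projection + Hamming nearest neighbor.
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((4096, 64))        # untrained stand-in "network"

def sketch(block: np.ndarray) -> np.ndarray:
    return (block @ W > 0).astype(np.uint8)   # 64-bit binary signature

store = [rng.standard_normal(4096) for _ in range(100)]   # stored blocks
sigs = np.stack([sketch(b) for b in store])

query = store[42] + 0.05 * rng.standard_normal(4096)      # near-duplicate
dists = (sigs != sketch(query)).sum(axis=1)               # Hamming distances
print(int(np.argmin(dists)))                              # expect 42
```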
- COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities. (A sketch of the underlying coordinate-network idea follows this entry.)
arXiv Detail & Related papers (2022-01-30T20:12:04Z)
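The sketch below shows the coordinate-network idea COIN++ builds on: overfit a small MLP mapping coordinates to signal values, then store its weights as the code. COIN++'s meta-learned shared network and per-datum modulations are omitted; sizes and optimizer settings are assumptions.

```python
# Sketch: compress a 1-D signal by fitting a tiny coordinate MLP to it.
import torch
import torch.nn as nn

signal = torch.sin(torch.linspace(0, 6.28, 200)).unsqueeze(1)  # data
coords = torch.linspace(-1, 1, 200).unsqueeze(1)               # inputs

mlp = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)
for _ in range(500):                       # overfit to this one signal
    opt.zero_grad()
    loss = nn.functional.mse_loss(mlp(coords), signal)
    loss.backward()
    opt.step()

# The "compressed file" is mlp.state_dict(); decoding is mlp(coords).
print(loss.item())
```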
- Efficient Data Compression for 3D Sparse TPC via Bicephalous Convolutional Autoencoder [8.759778406741276]
This work introduces a dual-head autoencoder to resolve sparsity and regression simultaneously, called the Bicephalous Convolutional AutoEncoder (BCAE).
It shows advantages in both compression fidelity and ratio compared to traditional data compression methods such as MGARD, SZ, and ZFP. (A minimal dual-head sketch follows this entry.)
arXiv Detail & Related papers (2021-11-09T21:26:37Z)
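A minimal sketch of the dual-head design named above: one decoder head predicts which entries are occupied (handling sparsity), the other regresses their values, and the two losses are combined. Sizes and the gating are simplifications of the convolutional original.

```python
# Sketch: bicephalous (two-headed) autoencoder for sparse data.
import torch
import torch.nn as nn

class Bicephalous(nn.Module):
    def __init__(self, d=256, z=32):
        super().__init__()
        self.enc = nn.Linear(d, z)
        self.head_cls = nn.Linear(z, d)    # occupancy (sparsity) head
        self.head_reg = nn.Linear(z, d)    # value (regression) head

    def forward(self, x):
        h = self.enc(x)
        return torch.sigmoid(self.head_cls(h)), self.head_reg(h)

mask = (torch.rand(8, 256) > 0.9).float()           # ~10% occupied entries
x = torch.relu(torch.randn(8, 256)) * mask          # sparse stand-in input
occ, val = Bicephalous()(x)
target_occ = (x > 0).float()
loss = nn.functional.binary_cross_entropy(occ, target_occ) \
     + nn.functional.mse_loss(val * target_occ, x)  # regress occupied entries
```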
- Federated Doubly Stochastic Kernel Learning for Vertically Partitioned Data [93.76907759950608]
We propose a federated doubly stochastic kernel learning (FDSKL) algorithm for vertically partitioned data.
We show that FDSKL is significantly faster than state-of-the-art federated learning methods when dealing with kernels. (A sketch of the stochastic update follows this entry.)
arXiv Detail & Related papers (2020-08-14T05:46:56Z)
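A simplified sketch of stochastic kernel learning with random Fourier features, the machinery behind the entry above. For brevity the random features are drawn once up front (a truly doubly stochastic scheme also resamples features over iterations), and the federated, vertically partitioned protocol is omitted entirely.

```python
# Sketch: hinge-loss updates over random Fourier features of an RBF kernel,
# sampling one data point per step.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((500, 10))
y = np.sign(X[:, 0])                       # toy labels
D, lr = 512, 0.1
w = np.zeros(2 * D)                        # weights over random features
omega = rng.standard_normal((D, 10))       # RBF spectral samples (fixed here)

def features(x):
    proj = omega @ x
    return np.concatenate([np.cos(proj), np.sin(proj)]) / np.sqrt(D)

for _ in range(2000):
    i = rng.integers(len(X))               # stochastic over data points
    phi = features(X[i])
    if y[i] * (w @ phi) < 1:               # hinge-loss subgradient step
        w += lr * y[i] * phi

acc = np.mean([np.sign(w @ features(x)) == yi for x, yi in zip(X, y)])
print(acc)
```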
- PowerGossip: Practical Low-Rank Communication Compression in Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by PowerSGD for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit. (A minimal power-step sketch follows this entry.)
arXiv Detail & Related papers (2020-08-04T09:14:52Z)
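A minimal sketch of the power-iteration compression named above: the difference between neighboring workers' models is transmitted as a rank-1 pair produced by one power step, reusing the previous right factor as a warm start across rounds.

```python
# Sketch: one power-iteration step compresses a difference matrix to rank 1.
import numpy as np

rng = np.random.default_rng(4)
diff = rng.standard_normal((128, 64))      # difference between two workers

def power_compress(M, q):
    p = M @ q                              # one power step
    p /= np.linalg.norm(p) + 1e-12
    return p, M.T @ p                      # transmitted pair (p, q_new)

q = rng.standard_normal(64)                # warm start, kept across rounds
p, q = power_compress(diff, q)
approx = np.outer(p, q)                    # receiver's rank-1 reconstruction
print(np.linalg.norm(diff - approx) / np.linalg.norm(diff))
```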
- Spatial Information Guided Convolution for Real-Time RGBD Semantic Segmentation [79.78416804260668]
We propose Spatial information guided Convolution (S-Conv), which allows efficient RGB feature and 3D spatial information integration.
S-Conv is competent to infer the sampling offset of the convolution kernel guided by the 3D spatial information.
We further embed S-Conv into a semantic segmentation network called the Spatial information Guided convolutional Network (SGNet). (A simplified sketch of offset-guided sampling follows this entry.)
arXiv Detail & Related papers (2020-04-09T13:38:05Z)
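A simplified sketch of the S-Conv idea from the final entry above: offsets predicted from the depth channel shift where the convolution samples its input features. Real S-Conv infers offsets for every kernel tap and also adapts the kernel weights; here a single sampling point per pixel via grid_sample keeps the illustration short.

```python
# Sketch: depth-guided sampling offsets applied before feature aggregation.
import torch
import torch.nn as nn
import torch.nn.functional as F

feats = torch.randn(1, 16, 32, 32)          # RGB feature map
depth = torch.randn(1, 1, 32, 32)           # 3D spatial guidance (stand-in)

offset_pred = nn.Conv2d(1, 2, kernel_size=3, padding=1)
offsets = offset_pred(depth)                # per-pixel (dx, dy)

# base sampling grid in [-1, 1], perturbed by the predicted offsets
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 32),
                        torch.linspace(-1, 1, 32), indexing="ij")
grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)        # (1, 32, 32, 2)
grid = grid + 0.1 * offsets.permute(0, 2, 3, 1)          # small shift (assumed)

sampled = F.grid_sample(feats, grid, align_corners=True) # guided sampling
out = nn.Conv2d(16, 16, kernel_size=1)(sampled)          # aggregate features
```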
This list is automatically generated from the titles and abstracts of the papers on this site.