Lossy Compression of Large-Scale Radio Interferometric Data
- URL: http://arxiv.org/abs/2304.07050v1
- Date: Fri, 14 Apr 2023 10:50:24 GMT
- Title: Lossy Compression of Large-Scale Radio Interferometric Data
- Authors: M Atemkeng, S Perkins, E Seck, S Makhathini, O Smirnov, L Bester, B Hugo
- Abstract summary: This work proposes to reduce visibility data volume using a baseline-dependent lossy compression technique.
MeerKAT and the European Very Long Baseline Interferometry Network are used as reference telescopes to evaluate and compare the performance of the proposed methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work proposes to reduce visibility data volume using a
baseline-dependent lossy compression technique that preserves smearing at the
edges of the field-of-view. We exploit the relation between the rank of a
matrix and the fact that a low-rank approximation can describe the raw
visibility data as a sum of basic components, where each basic component
corresponds to a specific Fourier component of the sky distribution. As such,
the entire visibility data is represented as a collection of data matrices
from baselines, instead of a single tensor. The proposed methods are
formulated as follows: given the entire visibility dataset, the first
algorithm, named simple SVD, projects the data into a regular sampling space
of rank-$r$ data matrices. In this space, the data for all baselines have the
same rank, which makes the compression factor equal across all baselines. The
second algorithm, named BDSVD, projects the data into an irregular sampling
space of rank-$r_{pq}$ data matrices. The subscript $pq$ indicates that the
rank of the data matrix varies across baselines $pq$, which makes the
compression factor baseline-dependent. MeerKAT and the European Very Long
Baseline Interferometry Network are used as reference telescopes to evaluate
and compare the performance of the proposed methods against traditional
methods, such as traditional averaging and baseline-dependent averaging (BDA).
For the same spatial resolution threshold, both simple SVD and BDSVD achieve
compression factors two orders of magnitude higher than traditional averaging
and BDA. At the same space-saving rate, there is no loss of spatial
resolution, and the noise variance in the data is reduced, which improves the
S/N by over $1.5$ dB at the edges of the field-of-view.
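The mechanism behind both algorithms is a per-baseline truncated SVD: each baseline's time-frequency visibility matrix is replaced by its leading singular triplets, and the stored factors are much smaller than the raw matrix. Below is a minimal Python/NumPy sketch of this idea; the function names and the rank-selection rule for the baseline-dependent case are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def compress_baseline(vis, r):
    """Truncated SVD of one baseline's (time x frequency) visibility matrix.

    Storing (U_r, s_r, Vh_r) costs r * (n_t + n_f + 1) values instead of
    the n_t * n_f values of the raw matrix.
    """
    U, s, Vh = np.linalg.svd(vis, full_matrices=False)
    return U[:, :r], s[:r], Vh[:r, :]

def decompress_baseline(U_r, s_r, Vh_r):
    """Reconstruct the rank-r approximation: a sum of r basic components."""
    return (U_r * s_r) @ Vh_r

def baseline_rank(s, tol=1e-2):
    """Hypothetical BDSVD-style rule: keep singular values above tol * s_max.
    (The paper ties r_pq to a smearing/resolution threshold instead.)"""
    return max(1, int(np.sum(s >= tol * s[0])))

# simple SVD: one global rank r for every baseline.
# BDSVD: a per-baseline rank r_pq, so the compression factor varies with pq.
rng = np.random.default_rng(0)
vis = rng.normal(size=(512, 64)) + 1j * rng.normal(size=(512, 64))
U_r, s_r, Vh_r = compress_baseline(vis, r=8)
vis_hat = decompress_baseline(U_r, s_r, Vh_r)
print(vis_hat.shape, baseline_rank(s_r))
```

Under this layout the per-baseline space-saving rate is $1 - r(n_t + n_f + 1)/(n_t n_f)$, so a baseline-dependent rank $r_{pq}$, plausibly larger for long baselines with rapidly varying fringes, directly yields a baseline-dependent compression factor.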
Related papers
- Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training [53.07879717463279]
Domain2Vec decomposes any dataset into a linear combination of several meta-domains.
It helps find the data mixture that enhances downstream task performance with minimal computational overhead.
arXiv Detail & Related papers (2025-06-12T17:53:51Z)
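One way to read "decomposes any dataset into a linear combination of several meta-domains" from the summary above is a nonnegative least-squares fit of a dataset's feature vector against a fixed dictionary of meta-domain vectors. The sketch below illustrates that reading only; the dictionary, the features, and the use of NNLS are assumptions, not the paper's actual procedure.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical setup: columns of `meta_domains` are fixed meta-domain
# feature vectors; `dataset_vec` is the feature vector of a new dataset.
rng = np.random.default_rng(0)
meta_domains = rng.random((128, 5))
dataset_vec = rng.random(128)

# Nonnegative least squares gives the best conic combination of
# meta-domains; normalizing yields a mixture over meta-domains.
weights, residual = nnls(meta_domains, dataset_vec)
mixture = weights / weights.sum()
print(mixture, residual)
```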
- FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [61.79405341803085]
Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL).
arXiv Detail & Related papers (2025-05-19T07:32:56Z)
- Turnstile $\ell_p$ leverage score sampling with applications [56.403488578703865]
We develop a novel algorithm for sampling rows $a_i$ of a matrix $A \in \mathbb{R}^{n \times d}$, proportional to their $\ell_p$ norm, when $A$ is presented in a turnstile data stream.
Our algorithm not only returns the set of sampled row indexes, it also returns slightly perturbed rows $\tilde{a}_i \approx a_i$, and approximates their sampling probabilities up to $\varepsilon$ relative error.
For logistic regression, our framework yields the first algorithm that achieves a …
arXiv Detail & Related papers (2024-06-01T07:33:41Z)
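For intuition, the offline target distribution that the streaming algorithm above approximates is easy to state: rows are drawn with probability proportional to $\|a_i\|_p^p$. The sketch below shows only this offline analogue; the paper's contribution is achieving it in a turnstile stream, which this code does not attempt.

```python
import numpy as np

def lp_row_sample(A, p, k, seed=0):
    """Draw k row indexes of A with probability proportional to ||a_i||_p^p."""
    norms = np.sum(np.abs(A) ** p, axis=1)   # ||a_i||_p^p for each row
    probs = norms / norms.sum()
    rng = np.random.default_rng(seed)
    return rng.choice(A.shape[0], size=k, p=probs), probs

A = np.random.default_rng(1).normal(size=(1000, 20))
idx, probs = lp_row_sample(A, p=1, k=10)
print(idx, probs[idx])
```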
- Joint Projection Learning and Tensor Decomposition Based Incomplete Multi-view Clustering [21.925066554821168]
We propose a novel Joint Projection Learning and Tensor Decomposition based method (JPLTD) for incomplete multi-view clustering.
JPLTD alleviates the influence of redundant features and noise in high-dimensional data.
Experiments on several benchmark datasets demonstrate that JPLTD outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-10-06T06:19:16Z)
- Rethinking k-means from manifold learning perspective [122.38667613245151]
We present a new clustering algorithm which directly detects clusters of data without mean estimation.
Specifically, we construct a distance matrix between data points using a Butterworth filter.
To exploit the complementary information embedded in different views, we leverage tensor Schatten $p$-norm regularization.
arXiv Detail & Related papers (2023-05-12T03:01:41Z)
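A plausible concrete form of the Butterworth-filtered distance matrix mentioned above is to pass pairwise Euclidean distances through the Butterworth magnitude response $1/(1 + (d/c)^{2n})$, giving an affinity near 1 for close points and near 0 for distant ones. The cutoff and order below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
D = squareform(pdist(X))                       # pairwise Euclidean distances

# Butterworth magnitude response applied to distances: close pairs get
# affinity ~1, distant pairs ~0 (cutoff and order chosen for illustration).
cutoff, order = np.median(D), 4
W = 1.0 / (1.0 + (D / cutoff) ** (2 * order))
```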
- Dataset Distillation via Factorization [58.8114016318593]
We introduce a dataset factorization approach, termed HaBa, which is a plug-and-play strategy portable to any existing dataset distillation (DD) baseline.
HaBa explores decomposing a dataset into two components: data Hallucination networks and Bases.
Our method yields significant improvements on downstream classification tasks compared with the previous state of the art, while reducing the total number of compressed parameters by up to 65%.
arXiv Detail & Related papers (2022-10-30T08:36:19Z)
- Covariance matrix preparation for quantum principal component analysis [0.8258451067861933]
Principal component analysis (PCA) is a dimensionality reduction method in data analysis.
Quantum algorithms have been formulated for PCA based on diagonalizing a density matrix.
We numerically implement our method for molecular ground-state datasets.
arXiv Detail & Related papers (2022-04-07T15:11:42Z)
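The classical counterpart of the preparation step above is simple to state: the covariance matrix, rescaled to unit trace, is the density-matrix analogue that the quantum algorithm diagonalizes. A hedged NumPy sketch of that counterpart (not the quantum circuit construction itself):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))          # stand-in dataset (rows = samples)
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (len(Xc) - 1)        # sample covariance matrix

rho = cov / np.trace(cov)              # unit trace: the density-matrix analogue
evals, evecs = np.linalg.eigh(rho)     # diagonalization = classical PCA
print(evals[::-1][:3])                 # leading normalized variances
```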
- Hybrid Model-based / Data-driven Graph Transform for Image Coding [54.31406300524195]
We present a hybrid model-based / data-driven approach to encode an intra-prediction residual block.
The first $K$ eigenvectors of the transform matrix are derived from a statistical model, e.g., the asymmetric discrete sine transform (ADST), for stability.
Using WebP as a baseline, experimental results show that our hybrid graph transform achieves better energy compaction than the default discrete cosine transform (DCT) and better stability than the KLT.
arXiv Detail & Related papers (2022-03-02T15:36:44Z)
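A sketch of the hybrid construction described above: take the first $K$ basis vectors from a fixed sinusoidal model and fill the rest with KLT-style eigenvectors of the residual covariance restricted to the orthogonal complement. The orthonormal sine basis below stands in for the ADST, and the construction is an assumption about the general shape of the method, not the paper's exact transform.

```python
import numpy as np

N, K = 16, 4
n = np.arange(N)
# Orthonormal sine basis (model-based part, standing in for the ADST).
B = np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * np.outer(n + 1, n + 1) / (N + 1))
B_model = B[:, :K]

rng = np.random.default_rng(0)
residuals = rng.normal(size=(1000, N))          # stand-in residual blocks
C = np.cov(residuals, rowvar=False)

# Project the covariance onto the complement of the model subspace, then
# take the top N-K data-driven eigenvectors living in that complement.
P = np.eye(N) - B_model @ B_model.T
evals, evecs = np.linalg.eigh(P @ C @ P)
B_data = evecs[:, ::-1][:, : N - K]
T = np.hstack([B_model, B_data])                # hybrid orthonormal transform
print(np.allclose(T.T @ T, np.eye(N), atol=1e-8))
```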
- Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in input-sparsity total time.
arXiv Detail & Related papers (2022-02-09T15:26:03Z)
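For reference, exact $\ell_2$ leverage scores are the squared row norms of an orthonormal basis for the column space, computable with a thin QR factorization. The sketch below shows this baseline; the paper's point is avoiding its cost via $q$ correlated random projections, which this code does not reproduce.

```python
import numpy as np

def leverage_scores(A):
    """Exact l_2 leverage scores from a thin QR factorization."""
    Q, _ = np.linalg.qr(A)             # columns of Q: orthonormal basis
    return np.sum(Q ** 2, axis=1)      # l_i = ||e_i^T Q||^2

rng = np.random.default_rng(0)
A = rng.normal(size=(2000, 10))
scores = leverage_scores(A)            # scores sum to rank(A)
probs = scores / scores.sum()
sample = rng.choice(len(A), size=50, p=probs)
```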
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
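In the $p = 1$ special case, the $l_{2,1}$ penalty above has a closed-form proximal step that shrinks whole rows of the projection matrix, which is what makes features drop out. A hedged sketch of that step (the solver and parameters are illustrative, not the paper's algorithm):

```python
import numpy as np

def prox_l21(W, tau):
    """Proximal step for tau * ||W||_{2,1}: shrink each row's l_2 norm by tau."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(50, 5))           # rows = features, columns = components
W_sparse = prox_l21(W, tau=1.5)
kept = np.flatnonzero(np.linalg.norm(W_sparse, axis=1) > 0)
print(f"{kept.size} of {W.shape[0]} features kept")
```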
- Learning a Deep Part-based Representation by Preserving Data Distribution [21.13421736154956]
Unsupervised dimensionality reduction is one of the most commonly used techniques for high-dimensional data recognition problems.
In this paper, by preserving the data distribution, a deep part-based representation can be learned, and the novel algorithm is called Distribution Preserving Network Embedding.
The experimental results on the real-world data sets show that the proposed algorithm has good performance in terms of cluster accuracy and AMI.
arXiv Detail & Related papers (2020-09-17T12:49:36Z)
- A New Basis for Sparse Principal Component Analysis [5.258928416164104]
Previous versions of sparse principal component analysis presumed that the eigen-basis (a $p \times k$ matrix) is approximately sparse.
We propose a method that presumes the $p \times k$ matrix becomes approximately sparse after a $k \times k$ rotation.
We show that for the same level of sparsity, the proposed sparse PCA method is more stable and can explain more variance compared to alternative methods.
arXiv Detail & Related papers (2020-07-01T16:32:22Z)
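A concrete instance of "sparse after a $k \times k$ rotation" is to compute an ordinary eigen-basis and then rotate it with a sparsity-seeking criterion. The sketch below uses varimax as a stand-in rotation; the paper's actual rotation method may differ.

```python
import numpy as np

def varimax(L, n_iter=100, tol=1e-8):
    """Rotate loadings L (p x k) by a k x k rotation maximizing varimax."""
    p, k = L.shape
    R, var_old = np.eye(k), 0.0
    for _ in range(n_iter):
        LR = L @ R
        U, s, Vt = np.linalg.svd(
            L.T @ (LR ** 3 - LR @ np.diag(np.sum(LR ** 2, axis=0)) / p)
        )
        R = U @ Vt
        if s.sum() < var_old * (1 + tol):
            break
        var_old = s.sum()
    return L @ R, R

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[:5].T                    # p x k eigen-basis with k = 5
rotated, R = varimax(loadings)         # approximately sparse after rotation
```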
- Information-Theoretic Limits for the Matrix Tensor Product [8.206394018475708]
This paper studies a high-dimensional inference problem involving the matrix tensor product of random matrices.
On the technical side, this paper introduces some new techniques for the analysis of high-dimensional matrix-preserving signals.
arXiv Detail & Related papers (2020-05-22T17:03:48Z)