Frequency-domain alignment of heterogeneous, multidimensional separations data through complex orthogonal Procrustes analysis
- URL: http://arxiv.org/abs/2502.12810v1
- Date: Tue, 18 Feb 2025 12:14:14 GMT
- Title: Frequency-domain alignment of heterogeneous, multidimensional separations data through complex orthogonal Procrustes analysis
- Authors: Michael Sorochan Armstrong
- Abstract summary: Multidimensional separations data have the capacity to reveal detailed information about complex biological samples.
Data analysis has been an ongoing challenge in the area since the peaks that represent chemical factors may drift over the course of several analytical runs.
This work offers a very simple solution to the alignment problem through a Procrustes analysis of the frequency-domain representation of synthetic multidimensional separations data.
- Score: 0.0
- Abstract: Multidimensional separations data have the capacity to reveal detailed information about complex biological samples. However, data analysis has been an ongoing challenge in the area since the peaks that represent chemical factors may drift over the course of several analytical runs along the first and second dimension retention times. This makes higher-level analyses of the data difficult, since a 1-1 comparison of samples is seldom possible without sophisticated pre-processing routines. Further complicating the issue is the fact that closely co-eluting components will need to be resolved, typically using some variant of Parallel Factor Analysis (PARAFAC), Multivariate Curve Resolution (MCR), or the recently explored Shift-Invariant Multi-linearity. These algorithms work with a user-specified number of components, and regions of interest that are then summarized as a peak table that is invariant to shift. However, identifying regions of interest across truly heterogeneous data remains an ongoing issue for automated deployment of these algorithms. This work offers a very simple solution to the alignment problem through an orthogonal Procrustes analysis of the frequency-domain representation of synthetic multidimensional separations data, for peaks that are logarithmically transformed to simulate shift while preserving the underlying topology of the data. Using this very simple method for analysis, two synthetic chromatograms can be compared under close to the worst possible scenarios for alignment.
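The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes NumPy, two synthetic two-dimensional chromatograms built from Gaussian peaks, and a circular shift along the second retention dimension standing in for the logarithmic transformation mentioned above. The helper names `complex_procrustes` and `synthetic_chromatogram` are hypothetical. The unitary matrix Q that minimises ||A Q - B||_F is obtained from the singular value decomposition of A^H B.

```python
import numpy as np

def complex_procrustes(A, B):
    """Return the unitary Q minimising ||A @ Q - B||_F for complex A, B."""
    # SVD of the cross-product matrix A^H B gives Q = U V^H.
    U, _, Vh = np.linalg.svd(A.conj().T @ B)
    return U @ Vh

def synthetic_chromatogram(centres, shape=(64, 48), width=2.0):
    """Sum of 2-D Gaussian peaks on a (first dim x second dim) retention grid."""
    t1 = np.arange(shape[0])[:, None]
    t2 = np.arange(shape[1])[None, :]
    X = np.zeros(shape)
    for c1, c2 in centres:
        X += np.exp(-((t1 - c1) ** 2 + (t2 - c2) ** 2) / (2 * width ** 2))
    return X

# Assumed test case: the same peaks, drifted by 7 points along the second dimension.
reference = synthetic_chromatogram([(20, 10), (40, 30), (50, 18)])
drifted = np.roll(reference, shift=7, axis=1)

# Frequency-domain representations of both chromatograms.
F_ref = np.fft.fft2(reference)
F_drift = np.fft.fft2(drifted)

# Rotate the drifted spectrum onto the reference spectrum, then invert the FFT.
Q = complex_procrustes(F_drift, F_ref)
aligned = np.fft.ifft2(F_drift @ Q).real

before = np.linalg.norm(drifted - reference)
after = np.linalg.norm(aligned - reference)
print(f"misfit before alignment: {before:.3e}, after: {after:.3e}")
```

A circular shift along the second dimension only multiplies each column of the two-dimensional spectrum by a unit-modulus phase factor, which is itself a unitary transformation, so in this idealised case the Procrustes rotation can absorb the drift almost exactly. Real drift that varies from peak to peak would not be captured this cleanly.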
Related papers
- Efficient Sparsification of Simplicial Complexes via Local Densities of States [8.830922974884531]
Simplicial complexes (SCs) are generalizations of graph models for computation data that account for higher-order relations between data items.
The analysis of many real-world datasets leads to dense SCs with a large number of higher-order interactions.
We develop a novel method for a probabilistic sparsification of SCs.
arXiv Detail & Related papers (2025-02-11T13:51:42Z) - Localized Sparse Principal Component Analysis of Multivariate Time Series in Frequency Domain [0.0]
We introduce a formulation and consistent estimation procedure for interpretable principal component analysis for high-dimensional time series in the frequency domain.
An efficient frequency-sequential algorithm is developed to compute sparse-localized estimates of the low-dimensional principal subspaces of the signal process.
arXiv Detail & Related papers (2024-08-15T14:30:34Z) - Gradient-Based Feature Learning under Structured Data [57.76552698981579]
In the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction.
We show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue.
In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent.
arXiv Detail & Related papers (2023-09-07T16:55:50Z) - Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Untargeted Region of Interest Selection for GC-MS Data using a Pseudo F-Ratio Moving Window ($\psi$FRMV) [0.0]
We propose a new method for automated, untargeted region of interest selection in GC-MS data.
It is based on the ratio of the squared first and second singular values from the Singular Value Decomposition of a window that moves across the chromatogram.
The sensitivity of the algorithm was tested by investigating the concentration at which it can no longer pick out chromatographic regions known to contain signal.
arXiv Detail & Related papers (2022-07-30T21:43:05Z) - Analytical Modelling of Exoplanet Transit Specroscopy with Dimensional Analysis and Symbolic Regression [68.8204255655161]
The deep learning revolution has opened the door for deriving such analytical results directly with a computer algorithm fitting to the data.
We successfully demonstrate the use of symbolic regression on synthetic data for the transit radii of generic hot Jupiter exoplanets.
As a preprocessing step, we use dimensional analysis to identify the relevant dimensionless combinations of variables.
arXiv Detail & Related papers (2021-12-22T00:52:56Z) - Revisiting the Sample Complexity of Sparse Spectrum Approximation of Gaussian Processes [60.479499225746295]
We introduce a new scalable approximation for Gaussian processes with provable guarantees which hold simultaneously over its entire parameter space.
Our approximation is obtained from an improved sample complexity analysis for sparse spectrum Gaussian processes (SSGPs).
arXiv Detail & Related papers (2020-11-17T05:41:50Z) - Peak Detection On Data Independent Acquisition Mass Spectrometry Data With Semisupervised Convolutional Transformers [0.0]
Liquid Chromatography coupled to Mass Spectrometry (LC-MS) based methods are commonly used for high-throughput, quantitative measurements of the proteome.
We formulate this peak detection problem as a multivariate time series segmentation problem, and propose a novel approach based on the Transformer architecture.
Here we augment Transformers, which are capable of capturing long-distance dependencies with a global view, with Convolutional Neural Networks (CNNs).
We further train this model in a semisupervised manner by adapting state-of-the-art semisupervised image classification techniques for multi-channel time series data.
arXiv Detail & Related papers (2020-10-26T18:55:27Z) - Sparse Generalized Canonical Correlation Analysis: Distributed Alternating Iteration based Approach [18.93565942407577]
Sparse canonical correlation analysis (CCA) is a useful statistical tool to detect latent information with sparse structures.
We propose a generalized canonical correlation analysis (GCCA), which could detect the latent relations of multiview data with sparse structures.
arXiv Detail & Related papers (2020-04-23T05:53:48Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - Inverse Learning of Symmetries [71.62109774068064]
We learn the symmetry transformation with a model consisting of two latent subspaces.
Our approach is based on the deep information bottleneck in combination with a continuous mutual information regulariser.
Our model outperforms state-of-the-art methods on artificial and molecular datasets.
arXiv Detail & Related papers (2020-02-07T13:48:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.