Untargeted Region of Interest Selection for GC-MS Data using a Pseudo
F-Ratio Moving Window ($\psi$FRMV)
- URL: http://arxiv.org/abs/2208.00313v1
- Date: Sat, 30 Jul 2022 21:43:05 GMT
- Authors: Ryland T. Giebelhaus, Michael D. Sorochan Armstrong, A. Paulina de la
Mata, James J. Harynuk
- Abstract summary: We propose a new method for automated, untargeted region of interest selection in GC-MS data.
It is based on the ratio of the squared first and second singular values from the Singular Value Decomposition of a window that moves across the chromatogram.
The sensitivity of the algorithm was tested by investigating the concentration at which it can no longer pick out chromatographic regions known to contain signal.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: There are many challenges associated with analysing gas chromatography-mass
spectrometry (GC-MS) data. Many of these challenges stem from the fact that
electron ionisation can make it difficult to recover molecular information due
to the high degree of fragmentation with concomitant loss of molecular ion
signal. With GC-MS data there are often many common fragment ions shared among
closely-eluting peaks, necessitating sophisticated methods for analysis. Some
of these methods are fully automated, but make some assumptions about the data
which can introduce artifacts during the analysis. Chemometric methods such as
Multivariate Curve Resolution or Parallel Factor Analysis are particularly
attractive, since they are flexible and make relatively few assumptions about
the data, ideally resulting in fewer artifacts. These methods do require
expert user intervention to determine the most relevant regions of interest and
an appropriate number of components, $k$, for each region. Automated region of
interest selection is needed to permit automated batch processing of
chromatographic data with advanced signal deconvolution. Here, we propose a new
method for automated, untargeted region of interest selection that accounts for
the multivariate information present in GC-MS data to select regions of
interest based on the ratio of the squared first and second singular values
from the Singular Value Decomposition of a window that moves across the
chromatogram. Assuming that the first singular value accounts largely for
signal, and that the second singular value accounts largely for noise, it is
possible to interpret the relationship between these two values as a
probabilistic distribution of Fisher Ratios. The sensitivity of the algorithm
was tested by investigating the concentration at which the algorithm can no
longer pick out chromatographic regions known to contain signal.
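
The moving-window ratio described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pseudo_f_ratio_profile`, the window length, the synthetic data, and the absence of any preprocessing are all assumptions made for illustration, and the abstract does not specify how the ratio is converted to a probability, so no threshold is applied here.

```python
import numpy as np

def pseudo_f_ratio_profile(X, window=10):
    """Ratio of squared first and second singular values for each window.

    X is a (n_scans, n_mz) GC-MS data matrix; `window` is a hypothetical
    window length in scans. Returns one ratio per window start position.
    """
    n_windows = X.shape[0] - window + 1
    ratios = np.empty(n_windows)
    for start in range(n_windows):
        # Singular values are returned in descending order.
        s = np.linalg.svd(X[start:start + window], compute_uv=False)
        # Guard against division by zero for a rank-1 window.
        ratios[start] = s[0] ** 2 / max(s[1] ** 2, np.finfo(float).tiny)
    return ratios

# Synthetic example (assumed data): Gaussian noise plus one peak at scan 100.
rng = np.random.default_rng(0)
n_scans, n_mz = 200, 50
X = rng.normal(size=(n_scans, n_mz))
elution = np.exp(-0.5 * ((np.arange(n_scans) - 100) / 4.0) ** 2)
spectrum = rng.uniform(0.0, 1.0, n_mz)
X += 50.0 * np.outer(elution, spectrum)

ratios = pseudo_f_ratio_profile(X, window=10)
peak_window = int(np.argmax(ratios))  # start index of the highest-ratio window
```

In this sketch, windows containing only noise give ratios close to 1 because the first two singular values are similar in size, while windows covering the synthetic peak give much larger ratios because the first singular value is dominated by signal.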
Related papers
- Multi-Source and Test-Time Domain Adaptation on Multivariate Signals using Spatio-Temporal Monge Alignment [59.75420353684495]
Machine learning applications on signals such as computer vision or biomedical data often face challenges due to the variability that exists across hardware devices or session recordings.
In this work, we propose Spatio-Temporal Monge Alignment (STMA) to mitigate these variabilities.
We show that STMA leads to significant and consistent performance gains between datasets acquired with very different settings.
arXiv Detail & Related papers (2024-07-19T13:33:38Z) - Data Augmentation Scheme for Raman Spectra with Highly Correlated
Annotations [0.23090185577016453]
We exploit the additive nature of spectra in order to generate additional data points from a given dataset that have statistically independent labels.
We show that training a CNN on these generated data points improves the performance on datasets where the annotations do not bear the same correlation as the dataset that was used for model training.
arXiv Detail & Related papers (2024-02-01T18:46:28Z) - Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques often pose a challenge in their interpretation.
The huge dimensionality of data cube spectra poses a complex task in its statistical interpretation.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
A statistical dimensional reduction is performed by an ad hoc trained (Variational) AutoEncoder, while the clustering process is performed by a (learnable) iterative K-Means clustering algorithm.
arXiv Detail & Related papers (2024-01-31T09:31:28Z) - Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU).
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - On the Interplay of Subset Selection and Informed Graph Neural Networks [3.091456764812509]
This work focuses on predicting the atomization energy of molecules in the QM9 dataset.
We show how maximizing molecular diversity in the training set selection process increases the robustness of linear and nonlinear regression techniques.
We also check the reliability of the predictions made by the graph neural network with a model-agnostic explainer.
arXiv Detail & Related papers (2023-06-15T09:09:27Z) - HFN: Heterogeneous Feature Network for Multivariate Time Series Anomaly
Detection [2.253268952202213]
We propose a novel semi-supervised anomaly detection framework based on a heterogeneous feature network (HFN) for MTS.
We first combine the embedding similarity subgraph generated by sensor embedding and feature value similarity subgraph generated by sensor values to construct a time-series heterogeneous graph.
This approach fuses the state-of-the-art technologies of heterogeneous graph structure learning (HGSL) and representation learning.
arXiv Detail & Related papers (2022-11-01T05:01:34Z) - Peak Detection On Data Independent Acquisition Mass Spectrometry Data
With Semisupervised Convolutional Transformers [0.0]
Liquid Chromatography coupled to Mass Spectrometry (LC-MS) based methods are commonly used for high-throughput, quantitative measurements of the proteome.
We formulate this peak detection problem as a multivariate time series segmentation problem, and propose a novel approach based on the Transformer architecture.
Here we augment Transformers, which are capable of capturing long distance dependencies with a global view, with Convolutional Neural Networks (CNNs).
We further train this model in a semisupervised manner by adapting state of the art semisupervised image classification techniques for multi-channel time series data.
arXiv Detail & Related papers (2020-10-26T18:55:27Z) - Towards an Automatic Analysis of CHO-K1 Suspension Growth in
Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z) - Statistical control for spatio-temporal MEG/EEG source imaging with
desparsified multi-task Lasso [102.84915019938413]
Non-invasive techniques like magnetoencephalography (MEG) or electroencephalography (EEG) are promising for imaging brain activity.
The problem of source localization, or source imaging, however, poses a high-dimensional statistical inference challenge.
We propose an ensemble of desparsified multi-task Lasso (ecd-MTLasso) to deal with this problem.
arXiv Detail & Related papers (2020-09-29T21:17:16Z) - PointIso: Point Cloud Based Deep Learning Model for Detecting
Arbitrary-Precision Peptide Features in LC-MS Map through Attention Based
Segmentation [5.495506445661776]
PointIso is a point cloud based, arbitrary-precision deep learning network to address the problem of peptide feature detection.
It achieves 98% detection of high quality MS/MS identifications in a benchmark dataset.
arXiv Detail & Related papers (2020-09-15T17:34:14Z) - Improved guarantees and a multiple-descent curve for Column Subset
Selection and the Nyström method [76.73096213472897]
We develop techniques which exploit spectral properties of the data matrix to obtain improved approximation guarantees.
Our approach leads to significantly better bounds for datasets with known rates of singular value decay.
We show that both our improved bounds and the multiple-descent curve can be observed on real datasets simply by varying the RBF parameter.
arXiv Detail & Related papers (2020-02-21T00:43:06Z)