Low dimensional representation of multi-patient flow cytometry datasets using optimal transport for minimal residual disease detection in leukemia
- URL: http://arxiv.org/abs/2407.17329v1
- Date: Wed, 24 Jul 2024 14:53:01 GMT
- Title: Low dimensional representation of multi-patient flow cytometry datasets using optimal transport for minimal residual disease detection in leukemia
- Authors: Erell Gachon, Jérémie Bigot, Elsa Cazelles, Aguirre Mimoun, Jean-Philippe Vial,
- Abstract summary: Representing and Minimal Residual Disease (MRD) in Acute Myeloid Leukemia (AML) is essential in prognosis and follow-up of patients.
In this paper, we explore statistical learning methods based on optimal transport (OT) to achieve a relevant low-dimensional representation of flow measurements.
In particular, our OT-based approach allows a relevant and informative two-dimensional representation of the results of the FlowSom algorithm.
- Score: 1.3980986259786223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Representing and quantifying Minimal Residual Disease (MRD) in Acute Myeloid Leukemia (AML), a type of cancer that affects the blood and bone marrow, is essential in the prognosis and follow-up of AML patients. As traditional cytological analysis cannot detect leukemia cells below 5\%, the analysis of flow cytometry dataset is expected to provide more reliable results. In this paper, we explore statistical learning methods based on optimal transport (OT) to achieve a relevant low-dimensional representation of multi-patient flow cytometry measurements (FCM) datasets considered as high-dimensional probability distributions. Using the framework of OT, we justify the use of the K-means algorithm for dimensionality reduction of multiple large-scale point clouds through mean measure quantization by merging all the data into a single point cloud. After this quantization step, the visualization of the intra and inter-patients FCM variability is carried out by embedding low-dimensional quantized probability measures into a linear space using either Wasserstein Principal Component Analysis (PCA) through linearized OT or log-ratio PCA of compositional data. Using a publicly available FCM dataset and a FCM dataset from Bordeaux University Hospital, we demonstrate the benefits of our approach over the popular kernel mean embedding technique for statistical learning from multiple high-dimensional probability distributions. We also highlight the usefulness of our methodology for low-dimensional projection and clustering patient measurements according to their level of MRD in AML from FCM. In particular, our OT-based approach allows a relevant and informative two-dimensional representation of the results of the FlowSom algorithm, a state-of-the-art method for the detection of MRD in AML using multi-patient FCM.
Related papers
- Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs)
Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU)
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z) - XAI for In-hospital Mortality Prediction via Multimodal ICU Data [57.73357047856416]
We propose an efficient, explainable AI solution for predicting in-hospital mortality via multimodal ICU data.
We employ multimodal learning in our framework, which can receive heterogeneous inputs from clinical data and make decisions.
Our framework can be easily transferred to other clinical tasks, which facilitates the discovery of crucial factors in healthcare research.
arXiv Detail & Related papers (2023-12-29T14:28:04Z) - Cancer Subtype Identification through Integrating Inter and Intra
Dataset Relationships in Multi-Omics Data [0.0]
The integration of multi-omics data has emerged as a promising approach for gaining insights into complex diseases such as cancer.
This paper proposes a novel approach to identify cancer subtypes through the integration of multi-omics data for clustering.
arXiv Detail & Related papers (2023-12-02T12:11:47Z) - Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithm [41.25603565852633]
This work presents an efficient and accurate Bayesian framework for high-dimensional LMMs.
The novelty of the approach lies in its partitioning and parameter expansion as well as its fast and scalable computation.
A real-world example is provided using data from a study of lupus in children, where we identify genes and clinical factors associated with a new lupus biomarker and predict the biomarker over time.
arXiv Detail & Related papers (2023-10-18T19:34:56Z) - Dimensionality Reduction of Longitudinal 'Omics Data using Modern Tensor
Factorization [4.740719424255845]
TCAM is a dimensionality reduction technique for multi-way data.
We show that TCAM outperforms traditional methods, as well as state-of-the-art tensor-based approaches for longitudinal microbiome data analysis.
arXiv Detail & Related papers (2021-11-28T14:50:14Z) - Lung Cancer Lesion Detection in Histopathology Images Using Graph-Based
Sparse PCA Network [93.22587316229954]
We propose a graph-based sparse principal component analysis (GS-PCA) network, for automated detection of cancerous lesions on histological lung slides stained by hematoxylin and eosin (H&E)
We evaluate the performance of the proposed algorithm on H&E slides obtained from an SVM K-rasG12D lung cancer mouse model using precision/recall rates, F-score, Tanimoto coefficient, and area under the curve (AUC) of the receiver operator characteristic (ROC)
arXiv Detail & Related papers (2021-10-27T19:28:36Z) - OCTAVA: an open-source toolbox for quantitative analysis of optical
coherence tomography angiography images [0.2621533844622817]
We report a user-friendly, open-source toolbox, OCTAVA, to automate the pre-processing, segmentation, and quantitative analysis of en face OCTA maximum intensity projection images.
We perform quantitative analysis of OCTA images from different commercial and non-commercial instruments and samples and show OCTAVA can accurately and reproducibly determine metrics for characterization of microvasculature.
arXiv Detail & Related papers (2021-09-04T10:11:42Z) - Statistical control for spatio-temporal MEG/EEG source imaging with
desparsified multi-task Lasso [102.84915019938413]
Non-invasive techniques like magnetoencephalography (MEG) or electroencephalography (EEG) offer promise of non-invasive techniques.
The problem of source localization, or source imaging, poses however a high-dimensional statistical inference challenge.
We propose an ensemble of desparsified multi-task Lasso (ecd-MTLasso) to deal with this problem.
arXiv Detail & Related papers (2020-09-29T21:17:16Z) - Trajectories, bifurcations and pseudotime in large clinical datasets:
applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z) - A Systematic Approach to Featurization for Cancer Drug Sensitivity
Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.