Related papers: Application of data engineering approaches to address challenges in microbiome data for optimal medical decision-making

Application of data engineering approaches to address challenges in microbiome data for optimal medical decision-making

URL: http://arxiv.org/abs/2307.00033v2
Date: Tue, 11 Jul 2023 11:01:42 GMT
Title: Application of data engineering approaches to address challenges in microbiome data for optimal medical decision-making
Authors: Isha Thombre, Pavan Kumar Perepu, Shyam Kumar Sudhakar
Abstract summary: The study addresses the issues inherent to microbiome datasets and could be highly beneficial for providing personalized medicine. The prototype employed in the study addresses the issues inherent to microbiome datasets and could be highly beneficial for providing personalized medicine.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The human gut microbiota is known to contribute to numerous physiological functions of the body and also implicated in a myriad of pathological conditions. Prolific research work in the past few decades have yielded valuable information regarding the relative taxonomic distribution of gut microbiota. Unfortunately, the microbiome data suffers from class imbalance and high dimensionality issues that must be addressed. In this study, we have implemented data engineering algorithms to address the above-mentioned issues inherent to microbiome data. Four standard machine learning classifiers (logistic regression (LR), support vector machines (SVM), random forests (RF), and extreme gradient boosting (XGB) decision trees) were implemented on a previously published dataset. The issue of class imbalance and high dimensionality of the data was addressed through synthetic minority oversampling technique (SMOTE) and principal component analysis (PCA). Our results indicate that ensemble classifiers (RF and XGB decision trees) exhibit superior classification accuracy in predicting the host phenotype. The application of PCA significantly reduced testing time while maintaining high classification accuracy. The highest classification accuracy was obtained at the levels of species for most classifiers. The prototype employed in the study addresses the issues inherent to microbiome datasets and could be highly beneficial for providing personalized medicine.

Related papers

Artificial Data Point Generation in Clustered Latent Space for Small Medical Datasets [4.542616945567623]
This paper introduces a novel method, Artificial Data Point Generation in Clustered Latent Space (AGCL) AGCL is designed to enhance classification performance on small medical datasets through synthetic data generation. It was applied to Parkinson's disease screening, utilizing facial expression data.
arXiv Detail & Related papers (2024-09-26T09:51:08Z)
ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering [54.80411755871931]
Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth. Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into readily understandable format. This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful. We introduce a QAMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data.
arXiv Detail & Related papers (2024-07-24T01:46:55Z)
GANsemble for Small and Imbalanced Data Sets: A Baseline for Synthetic Microplastics Data [2.307414552248669]
This paper proposes GANsemble: a framework connecting data augmentation with conditional generative adversarial networks (cGANs) to generate class-conditioned synthetic data. To our knowledge, this study is the first application of generative AI to synthetically create microplastics data.
arXiv Detail & Related papers (2024-04-10T21:23:13Z)
Minimally Supervised Learning using Topological Projections in Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs) Our proposed method first trains SOMs on unlabeled data and then a minimal number of available labeled data points are assigned to key best matching units (BMU) Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
Human Limits in Machine Learning: Prediction of Plant Phenotypes Using Soil Microbiome Data [0.2812395851874055]
We provide the first deep investigation of the predictive potential of machine learning models to understand the connections between soil and biological phenotypes. We show that prediction is improved when incorporating environmental features like soil physicochemical properties and microbial population density into the models.
arXiv Detail & Related papers (2023-06-19T20:52:37Z)
Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction [102.23901690661916]
Low-dimensional embedding spaces can be derived from the UK Biobank population dataset to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics. Performances gains from semisupervison approaches will probably become an important ingredient for various medical data science applications.
arXiv Detail & Related papers (2021-10-12T16:25:50Z)
Deep neural networks approach to microbial colony detection -- a comparative analysis [52.77024349608834]
This study investigates the performance of three deep learning approaches for object detection on the AGAR dataset. The achieved results may serve as a benchmark for future experiments.
arXiv Detail & Related papers (2021-08-23T12:06:00Z)
Data-Driven Logistic Regression Ensembles With Applications in Genomics [0.0]
We propose a new approach for dealing with high-dimensional binary classification problems that combines ideas from regularization and ensembling. We demonstrate the good performance of our method in terms of prediction accuracy and identification of key biomarkers using several medical datasets involving common diseases such as cancer, multiple sclerosis and psoriasis.
arXiv Detail & Related papers (2021-02-17T05:57:26Z)
Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data. Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
Mycorrhiza: Genotype Assignment usingPhylogenetic Networks [2.286041284499166]
We introduce Mycorrhiza, a machine learning approach for the genotype assignment problem. Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples. Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium.
arXiv Detail & Related papers (2020-10-14T02:36:27Z)
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients. We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
Inflammatory Bowel Disease Biomarkers of Human Gut Microbiota Selected via Ensemble Feature Selection Methods [0.0]
Inflammatory Bowel Diseases (IBD), diabetes, and cancer can cause several diseases such as Inflammatory Bowel Diseases (IBD), diabetes, and cancer. IBD, is a gut related disorder where the deviations from the healthy gut microbiome are considered to be associated with IBD. This study utilizes both supervised and unsupervised machine learning algorithms to generate a classification model that aids IBD diagnosis.
arXiv Detail & Related papers (2020-01-08T13:17:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.