Human Limits in Machine Learning: Prediction of Plant Phenotypes Using Soil Microbiome Data
- URL: http://arxiv.org/abs/2306.11157v2
- Date: Sat, 17 Feb 2024 03:03:59 GMT
- Title: Human Limits in Machine Learning: Prediction of Plant Phenotypes Using Soil Microbiome Data
- Authors: Rosa Aghdam, Xudong Tang, Shan Shan, Richard Lankau, Claudia Solís-Lemus
- Abstract summary: We provide the first deep investigation of the predictive potential of machine learning models to understand the connections between soil and biological phenotypes.
We show that prediction is improved when incorporating environmental features like soil physicochemical properties and microbial population density into the models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The preservation of soil health is a critical challenge in the 21st century
due to its significant impact on agriculture, human health, and biodiversity.
We provide the first deep investigation of the predictive potential of machine
learning models to understand the connections between soil and biological
phenotypes. We investigate an integrative framework performing accurate machine
learning-based prediction of plant phenotypes from biological, chemical, and
physical properties of the soil via two models: random forest and Bayesian
neural network. We show that prediction is improved when incorporating
environmental features like soil physicochemical properties and microbial
population density into the models, in addition to the microbiome information.
Exploring various data preprocessing strategies confirms the significant impact
of human decisions on predictive performance. We show that the naive total sum
scaling normalization that is commonly used in microbiome research is not the
optimal strategy to maximize predictive power. Also, we find that accurately
defined labels are more important than normalization, taxonomic level or model
characteristics. In cases where humans are unable to classify samples
accurately, machine learning model performance is limited. Lastly, we provide
domain scientists with a full model selection decision tree to identify the
human choices that optimize model prediction power. Our work is accompanied by
open source reproducible scripts
(https://github.com/solislemuslab/soil-microbiome-nn) for maximum outreach
among the microbiome research community.
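The abstract reports that prediction improves when environmental features (soil physicochemical properties, microbial population density) are added to the microbiome information. A minimal sketch of that kind of feature integration follows, assuming each sample is given as a dict of already-normalized taxon abundances plus a dict of environmental measurements. The function names, the variable names `ph` and `density`, and the z-scoring of environmental columns are illustrative assumptions, not the authors' pipeline.

```python
from statistics import mean, pstdev


def zscore(column):
    """Standardize a list of numbers; a constant column maps to zeros."""
    mu = mean(column)
    sd = pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sd for x in column]


def build_feature_matrix(microbiome, environment, taxa, env_vars):
    """Concatenate microbiome features with standardized environmental features.

    microbiome:  list of dicts, taxon -> normalized abundance (one dict per sample)
    environment: list of dicts, variable -> measurement (same sample order)
    Returns (rows, column_names), one row per sample.
    """
    if len(microbiome) != len(environment):
        raise ValueError("microbiome and environment must describe the same samples")
    # Taxa missing from a sample are treated as zero abundance.
    micro_cols = [[sample.get(t, 0.0) for sample in microbiome] for t in taxa]
    # Environmental variables live on very different scales (pH vs. cell counts),
    # so each one is z-scored before concatenation.
    env_cols = [zscore([sample[v] for sample in environment]) for v in env_vars]
    columns = micro_cols + env_cols
    rows = [list(row) for row in zip(*columns)]
    return rows, list(taxa) + list(env_vars)


micro = [{"Bacillus": 0.6, "Pseudomonas": 0.4}, {"Bacillus": 0.2, "Pseudomonas": 0.8}]
env = [{"ph": 6.0, "density": 1e6}, {"ph": 7.0, "density": 3e6}]
rows, names = build_feature_matrix(micro, env, ["Bacillus", "Pseudomonas"], ["ph", "density"])
```

The resulting rows could be passed to a random forest or a Bayesian neural network. In a real evaluation, the z-score statistics should be computed on training samples only and then applied to held-out samples, so that no test information leaks into the features.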
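The abstract states that naive total sum scaling (TSS), which divides each count by the sample's total read count, is not the normalization that maximizes predictive power. The sketch below contrasts TSS with one common alternative, the centered log-ratio (CLR) transform. Choosing CLR and a pseudocount of 1 are assumptions made for illustration; they are not necessarily the strategies compared in the paper.

```python
import math


def total_sum_scaling(counts):
    """TSS: convert raw counts to relative abundances that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def centered_log_ratio(counts, pseudocount=1.0):
    """CLR: log(count + pseudocount) minus the mean of those logs across taxa.

    The pseudocount avoids log(0) for taxa absent from the sample.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


sample = [0, 9, 99]
tss = total_sum_scaling(sample)   # relative abundances, sum to 1
clr = centered_log_ratio(sample)  # log-ratios, sum to 0
```

For the sample `[0, 9, 99]`, the CLR values are approximately `[-log 10, 0, log 10]`. TSS output stays on a bounded, compositional scale, whereas CLR moves the data into log-ratio space; which one helps a given model is an empirical question, which is the point the abstract makes about human preprocessing decisions.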
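Taxonomic level is one of the human choices the abstract lists. Working at a coarser level typically means summing the counts of all taxa that share an ancestor at the chosen rank. A minimal sketch, assuming each taxon's lineage is available as a dict from rank to name; the function `aggregate_to_rank`, the rank names, and the `"unassigned"` bucket are hypothetical.

```python
from collections import defaultdict


def aggregate_to_rank(counts, lineages, rank):
    """Sum taxon counts within each group at the requested taxonomic rank.

    counts:   dict taxon_id -> read count for one sample
    lineages: dict taxon_id -> dict rank -> name, e.g. {"phylum": "Firmicutes"}
    Taxa with no annotation at `rank` are pooled under "unassigned".
    """
    grouped = defaultdict(int)
    for taxon, count in counts.items():
        name = lineages.get(taxon, {}).get(rank, "unassigned")
        grouped[name] += count
    return dict(grouped)


counts = {"otu1": 10, "otu2": 5, "otu3": 7}
lineages = {
    "otu1": {"phylum": "Firmicutes", "genus": "Bacillus"},
    "otu2": {"phylum": "Firmicutes", "genus": "Clostridium"},
    "otu3": {"phylum": "Proteobacteria"},
}
by_phylum = aggregate_to_rank(counts, lineages, "phylum")
by_genus = aggregate_to_rank(counts, lineages, "genus")
```

Pooling unannotated taxa into one bucket is itself a human decision; dropping them instead would change the total counts that a later normalization step sees.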
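The paper summarizes which human choices (labels, normalization, taxonomic level, model) matter most as a model selection decision tree. The sketch below is a simpler, swapped-in illustration and not the authors' procedure: it exhaustively scores every combination of choices with a user-supplied `evaluate` function and ranks them. The option names and the `evaluate` callable are hypothetical placeholders; in practice `evaluate` would wrap something like a cross-validated accuracy.

```python
from itertools import product


def rank_choices(options, evaluate):
    """Score every combination of preprocessing/modeling choices, best first.

    options:  dict choice_name -> list of candidate values
    evaluate: callable taking {choice_name: value} and returning a score
              (higher is better)
    Returns a list of (score, config) pairs sorted by descending score.
    """
    names = list(options)
    results = []
    for values in product(*(options[n] for n in names)):
        config = dict(zip(names, values))
        results.append((evaluate(config), config))
    # Python's sort is stable (also with reverse=True), so ties keep
    # their enumeration order.
    results.sort(key=lambda item: item[0], reverse=True)
    return results


options = {
    "label_definition": ["threshold_a", "threshold_b"],  # hypothetical labelings
    "normalization": ["tss", "clr"],
    "rank": ["genus", "phylum"],
    "model": ["random_forest", "bayesian_nn"],
}
```

The number of configurations grows multiplicatively with each added choice (16 for the options above), which is one reason a distilled decision tree is more practical for domain scientists than rerunning a full search.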