MIIDL: a Python package for microbial biomarkers identification powered
by interpretable deep learning
- URL: http://arxiv.org/abs/2109.12204v1
- Date: Fri, 24 Sep 2021 21:30:10 GMT
- Authors: Jian Jiang
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detecting microbial biomarkers used to predict disease phenotypes and
clinical outcomes is crucial for early-stage disease screening and diagnosis.
Most biomarker identification methods are linear, which is a serious
limitation because biological processes are rarely fully linear. Introducing
machine learning to this field offers a promising solution. However,
identifying microbial biomarkers in an interpretable, data-driven and robust
manner remains challenging. We present MIIDL, a Python package for the
identification of microbial biomarkers based on interpretable deep learning.
MIIDL combines convolutional neural networks, a variety of interpretability
algorithms and a range of pre-processing methods into a robust, one-stop
pipeline for identifying microbial biomarkers from high-dimensional, sparse
data sets.
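The pipeline described in the abstract has three stages: pre-process a sparse abundance table, fit a nonlinear classifier, and use an interpretability algorithm to rank features as candidate biomarkers. Below is a minimal sketch of those stages under stated assumptions. It is not MIIDL's API: `clr_preprocess`, `TinyNet` and `integrated_gradients` are hypothetical names. Prevalence filtering plus a centred log-ratio (CLR) transform is assumed as one common pre-processing choice for sparse count data, a fixed-weight two-layer network stands in for a trained convolutional neural network, and Integrated Gradients is used as one example of an attribution method.

```python
import numpy as np


def clr_preprocess(counts, min_prevalence=0.1, pseudocount=1.0):
    """Hypothetical pre-processing helper (assumption, not MIIDL's API).

    counts: (n_samples, n_taxa) raw read counts.
    Keeps taxa detected (count > 0) in at least `min_prevalence` of samples,
    then applies a centred log-ratio (CLR) transform to each sample.
    Returns the transformed matrix and the column indices of the kept taxa.
    """
    counts = np.asarray(counts, dtype=float)
    prevalence = (counts > 0).mean(axis=0)          # fraction of samples containing each taxon
    keep = np.flatnonzero(prevalence >= min_prevalence)
    logx = np.log(counts[:, keep] + pseudocount)    # pseudocount avoids log(0) on sparse data
    clr = logx - logx.mean(axis=1, keepdims=True)   # centre each sample's log-abundances
    return clr, keep


class TinyNet:
    """Hypothetical stand-in for a trained model: fixed random weights, illustration only."""

    def __init__(self, n_features, n_hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(size=(n_hidden, n_features))
        self.b1 = rng.normal(size=n_hidden)
        self.w2 = rng.normal(size=n_hidden)
        self.b2 = 0.0

    def predict(self, x):
        # Probability-like score: sigmoid(w2 . tanh(W1 x + b1) + b2)
        h = np.tanh(self.W1 @ x + self.b1)
        return 1.0 / (1.0 + np.exp(-(self.w2 @ h + self.b2)))

    def grad(self, x):
        # Analytic gradient of predict(x) with respect to the input x (chain rule)
        h = np.tanh(self.W1 @ x + self.b1)
        p = 1.0 / (1.0 + np.exp(-(self.w2 @ h + self.b2)))
        return p * (1.0 - p) * (self.W1.T @ (self.w2 * (1.0 - h ** 2)))


def integrated_gradients(model, x, baseline, steps=200):
    """Integrated Gradients: (x - baseline) times the average gradient along the
    straight path from baseline to x, approximated with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model.grad(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Toy sparse count table: roughly 70% of entries are zeroed out
    counts = rng.poisson(2.0, size=(20, 50)) * (rng.random((20, 50)) < 0.3)
    X, kept = clr_preprocess(counts, min_prevalence=0.2)
    model = TinyNet(X.shape[1])
    x, baseline = X[0], np.zeros(X.shape[1])
    attr = integrated_gradients(model, x, baseline)
    ranking = kept[np.argsort(-np.abs(attr))]  # original taxon indices, by |attribution|
    print("top candidate taxa:", ranking[:5])
```

Integrated Gradients satisfies a completeness property: the attributions sum (up to the Riemann-sum error) to `predict(x) - predict(baseline)`, which makes it a convenient sanity check when swapping in a real trained network.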
Related papers
- How quantum computing can enhance biomarker discovery for multi-factorial diseases (2024-11-15)
  Quantum algorithms, particularly in machine learning, are mapped to key applications in biomarker discovery.
  The opportunities and challenges associated with the algorithms and applications are discussed.
  An outlook is provided concerning open research challenges.
- Revolutionizing Biomarker Discovery: Leveraging Generative AI for Bio-Knowledge-Embedded Continuous Space Exploration (2024-09-23)
  We propose a new biomarker identification framework with two important modules: training data preparation and embedding-optimization-generation.
  The first module uses a multi-agent system to automatically collect pairs of biomarker subsets and their corresponding prediction accuracy as training data.
  The second module employs an encoder-evaluator-decoder learning paradigm to compress the knowledge of the collected data into a continuous space.
- MMIL: A novel algorithm for disease associated cell type discovery (2024-06-12)
  Single-cell datasets often lack individual cell labels, making it challenging to identify cells associated with disease.
  We introduce Mixture Modeling for Multiple Instance Learning (MMIL), an expectation-maximization method that enables the training and calibration of cell-level classifiers.
- An Evaluation of Large Language Models in Bioinformatics Research (2024-02-21)
  We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
  These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
  Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
- scBeacon: single-cell biomarker extraction via identifying paired cell clusters across biological conditions with contrastive siamese networks (2023-11-05)
  scBeacon is a framework built upon a deep contrastive siamese network.
  scBeacon adeptly identifies matched cell populations across varied conditions.
  Comprehensive evaluations validate scBeacon's superiority over existing single-cell differential gene analysis tools.
- ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab (2023-11-01)
  The challenge of replicating research results has posed a significant impediment to the field of molecular biology.
  As an initial step towards addressing this challenge, we curate a comprehensive multimodal dataset named ProBio.
  Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
- Lymphocyte Classification in Hyperspectral Images of Ovarian Cancer Tissue Biopsy Samples (2022-03-23)
  We present a machine learning pipeline to segment white blood cell pixels in hyperspectral images of biopsy cores.
  These cells are clinically important for diagnosis, but some prior work has struggled to incorporate them due to difficulty obtaining precise pixel labels.
- Deep neural networks approach to microbial colony detection -- a comparative analysis (2021-08-23)
  This study investigates the performance of three deep learning approaches for object detection on the AGAR dataset.
  The achieved results may serve as a benchmark for future experiments.
- Preventing dataset shift from breaking machine-learning biomarkers (2021-07-21)
  A good biomarker is one that gives reliable detection of the corresponding condition.
  Biomarkers are often extracted from a cohort that differs from the target population.
  Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals.
- Data-Driven Logistic Regression Ensembles With Applications in Genomics (2021-02-17)
  We propose a new approach for dealing with high-dimensional binary classification problems that combines ideas from regularization and ensembling.
  We demonstrate the good performance of our method in terms of prediction accuracy and identification of key biomarkers using several medical datasets involving common diseases such as cancer, multiple sclerosis and psoriasis.
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification (2021-01-27)
  We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
  We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.