GenoML: Automated Machine Learning for Genomics
- URL: http://arxiv.org/abs/2103.03221v1
- Date: Thu, 4 Mar 2021 18:48:40 GMT
- Title: GenoML: Automated Machine Learning for Genomics
- Authors: Mary B. Makarious, Hampton L. Leonard, Dan Vitale, Hirotaka Iwaki,
David Saffo, Lana Sargent, Anant Dadu, Eduardo Salmerón Castaño, John F.
Carter, Melina Maleknia, Juan A. Botia, Cornelis Blauwendraat, Roy H.
Campbell, Sayed Hadi Hashemi, Andrew B. Singleton, Mike A. Nalls, Faraz
Faghri
- Abstract summary: GenoML is a Python package automating machine learning for genomics (genetics and multi-omics).
GenoML's mission is to bring machine learning for genomics and clinical data to non-experts.
- Score: 3.2739205123864945
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: GenoML is a Python package automating machine learning workflows for genomics
(genetics and multi-omics) with an open science philosophy. Genomics data
require significant domain expertise to clean, pre-process, harmonize and
perform quality control of the data. Furthermore, tuning, validation, and
interpretation involve taking into account the biology and possibly the
limitations of the underlying data collection, protocols, and technology.
GenoML's mission is to bring machine learning for genomics and clinical data to
non-experts by developing an easy-to-use tool that automates the full
development, evaluation, and deployment process. Emphasis is put on open
science to make workflows easily accessible, replicable, and transferable
within the scientific community. Source code and documentation are available at
https://genoml.com.
Related papers
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions.
BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model.
It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
- Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models [4.762323642506732]
We seek to apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery.
We introduce a new system, $BMLP_{active}$, which efficiently explores the genomic hypothesis space by guiding informative experimentation.
$BMLP_{active}$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation.
arXiv Detail & Related papers (2024-05-10T09:51:06Z)
- EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning Applications [0.2826977330147589]
We propose a web-based end-to-end pipeline that is capable of preprocessing, training, evaluating, and visualizing machine learning models.
Our library assists in recognizing, classifying, clustering, and predicting a wide range of multi-modal, multi-sensor datasets.
arXiv Detail & Related papers (2024-03-27T02:24:38Z)
- Toward a Team of AI-made Scientists for Scientific Discovery from Gene Expression Data [9.767546641019862]
We introduce a novel framework, a Team of AI-made Scientists (TAIS), designed to streamline the scientific discovery pipeline.
TAIS comprises simulated roles, including a project manager, data engineer, and domain expert, each represented by a Large Language Model (LLM).
These roles collaborate to replicate the tasks typically performed by data scientists, with a specific focus on identifying disease-predictive genes.
arXiv Detail & Related papers (2024-02-15T06:30:12Z)
- Machine learning in bioprocess development: From promise to practice [58.720142291102135]
Data-driven methods like machine learning (ML) approaches have a high potential to rationally explore large design spaces.
The aim of this review is to demonstrate how ML methods have been applied so far in bioprocess development.
arXiv Detail & Related papers (2022-10-04T13:48:59Z)
- When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development [3.687740185234604]
Machine learning (ML) has significantly contributed to the development of bioprocess engineering, but its application is still limited.
This review provides a comprehensive overview of ML-based automation in bioprocess development.
arXiv Detail & Related papers (2022-09-02T14:30:49Z)
- Deep metric learning improves lab of origin prediction of genetically engineered plasmids [63.05016513788047]
Genetic engineering attribution (GEA) is the ability to make sequence-lab associations.
We propose a method, based on metric learning, that ranks the most likely labs-of-origin.
We are able to extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs.
arXiv Detail & Related papers (2021-11-24T16:29:03Z)
- Ten Quick Tips for Deep Learning in Biology [116.78436313026478]
Machine learning is concerned with the development and applications of algorithms that can recognize patterns in data and use them for predictive modeling.
Deep learning has become its own subfield of machine learning.
In the context of biological research, deep learning has been increasingly used to derive novel insights from high-dimensional biological data.
arXiv Detail & Related papers (2021-05-29T21:02:44Z)
- Automated Biodesign Engineering by Abductive Meta-Interpretive Learning [8.788941848262786]
We propose an automated biodesign engineering framework empowered by Abductive Meta-Interpretive Learning ($Meta_{Abd}$).
arXiv Detail & Related papers (2021-05-17T12:10:26Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.