Related papers: GenoML: Automated Machine Learning for Genomics

URL: http://arxiv.org/abs/2103.03221v1
Date: Thu, 4 Mar 2021 18:48:40 GMT
Title: GenoML: Automated Machine Learning for Genomics
Authors: Mary B. Makarious, Hampton L. Leonard, Dan Vitale, Hirotaka Iwaki, David Saffo, Lana Sargent, Anant Dadu, Eduardo Salmerón Castaño, John F. Carter, Melina Maleknia, Juan A. Botia, Cornelis Blauwendraat, Roy H. Campbell, Sayed Hadi Hashemi, Andrew B. Singleton, Mike A. Nalls, Faraz Faghri
Abstract summary: GenoML is a Python package automating machine learning for genomics (genetics and multi-omics). GenoML's mission is to bring machine learning for genomics and clinical data to non-experts.
Score: 3.2739205123864945
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: GenoML is a Python package automating machine learning workflows for genomics (genetics and multi-omics) with an open science philosophy. Genomics data require significant domain expertise to clean, pre-process, harmonize and perform quality control of the data. Furthermore, tuning, validation, and interpretation involve taking into account the biology and possibly the limitations of the underlying data collection, protocols, and technology. GenoML's mission is to bring machine learning for genomics and clinical data to non-experts by developing an easy-to-use tool that automates the full development, evaluation, and deployment process. Emphasis is put on open science to make workflows easily accessible, replicable, and transferable within the scientific community. Source code and documentation are available at https://genoml.com.

Related papers

Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy [54.24356756795849]
Microscopy is a primary source of information on materials structure and functionality at nanometer and atomic scales. The adoption of Data Management Plans (DMPs) by major funding agencies promotes preservation and access. Deriving insights remains difficult due to the lack of standardized code ecosystems, benchmarks, and integration strategies.
arXiv Detail & Related papers (2025-06-10T03:54:36Z)
Agentomics-ML: Autonomous Machine Learning Experimentation Agent for Genomic and Transcriptomic Data [33.7054351451505]
We introduce Agentomics-ML, a fully autonomous agent-based system designed to produce a classification model. We show that Agentomics-ML outperforms existing state-of-the-art agent-based methods in both generalization and success rates.
arXiv Detail & Related papers (2025-06-05T19:44:38Z)
The AI Cosmologist I: An Agentic System for Automated Data Analysis [0.0]
The AI Cosmologist implements a complete pipeline from idea generation to experimental evaluation and research dissemination. Unlike traditional auto machine-learning systems, the AI Cosmologist generates diverse implementation strategies. Results indicate that agentic systems can automate portions of the research process, potentially accelerating scientific discovery.
arXiv Detail & Related papers (2025-04-04T13:12:08Z)
Language Model Powered Digital Biology with BRAD [5.309032614374711]
Large Language Models (LLMs) are well-suited for unstructured integration. We present a prototype Bioinformatics Retrieval Augmented Digital assistant (BRAD).
arXiv Detail & Related papers (2024-09-04T16:43:14Z)
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering. Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications. These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model. It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
Boolean matrix logic programming for active learning of gene functions in genome-scale metabolic network models [4.762323642506732]
We seek to apply logic-based machine learning techniques to facilitate cellular engineering and drive biological discovery. We introduce a new system, $BMLP_active$, which efficiently explores the genomic hypothesis space by guiding informative experimentation. $BMLP_active$ can successfully learn the interaction between a gene pair with fewer training examples than random experimentation.
arXiv Detail & Related papers (2024-05-10T09:51:06Z)
EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning Applications [0.2826977330147589]
We propose a web-based end-to-end pipeline that is capable of preprocessing, training, evaluating, and visualizing machine learning models. Our library assists in recognizing, classifying, clustering, and predicting a wide range of multi-modal, multi-sensor datasets.
arXiv Detail & Related papers (2024-03-27T02:24:38Z)
Toward a Team of AI-made Scientists for Scientific Discovery from Gene Expression Data [9.767546641019862]
We introduce a novel framework, a Team of AI-made Scientists (TAIS), designed to streamline the scientific discovery pipeline. TAIS comprises simulated roles, including a project manager, data engineer, and domain expert, each represented by a Large Language Model (LLM). These roles collaborate to replicate the tasks typically performed by data scientists, with a specific focus on identifying disease-predictive genes.
arXiv Detail & Related papers (2024-02-15T06:30:12Z)
Machine learning in bioprocess development: From promise to practice [58.720142291102135]
Data-driven methods like machine learning (ML) approaches have a high potential to rationally explore large design spaces. The aim of this review is to demonstrate how ML methods have been applied so far in bioprocess development.
arXiv Detail & Related papers (2022-10-04T13:48:59Z)
When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development [3.687740185234604]
Machine learning (ML) has significantly contributed to the development of bioprocess engineering, but its application is still limited. This review provides a comprehensive overview of ML-based automation in bioprocess development.
arXiv Detail & Related papers (2022-09-02T14:30:49Z)
Deep metric learning improves lab of origin prediction of genetically engineered plasmids [63.05016513788047]
Genetic engineering attribution (GEA) is the ability to make sequence-lab associations. We propose a method, based on metric learning, that ranks the most likely labs-of-origin. We are able to extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs.
arXiv Detail & Related papers (2021-11-24T16:29:03Z)
Ten Quick Tips for Deep Learning in Biology [116.78436313026478]
Machine learning is concerned with the development and applications of algorithms that can recognize patterns in data and use them for predictive modeling. Deep learning has become its own subfield of machine learning. In the context of biological research, deep learning has been increasingly used to derive novel insights from high-dimensional biological data.
arXiv Detail & Related papers (2021-05-29T21:02:44Z)
Automated Biodesign Engineering by Abductive Meta-Interpretive Learning [8.788941848262786]
We propose an automated biodesign engineering framework empowered by Abductive Meta-Interpretive Learning ($Meta_Abd$).
arXiv Detail & Related papers (2021-05-17T12:10:26Z)
Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a deep neural network with human-powered abstraction on the level of data. Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.