Automated Bioinformatics Analysis via AutoBA
- URL: http://arxiv.org/abs/2309.03242v1
- Date: Wed, 6 Sep 2023 07:54:45 GMT
- Title: Automated Bioinformatics Analysis via AutoBA
- Authors: Juexiao Zhou, Bin Zhang, Xiuying Chen, Haoyang Li, Xiaopeng Xu, Siyuan Chen, Xin Gao
- Abstract summary: Auto Bioinformatics Analysis (AutoBA) is an autonomous AI agent based on a large language model designed explicitly for conventional omics data analysis.
AutoBA's robustness and adaptability are affirmed across a diverse range of omics analysis cases, including whole genome sequencing (WGS), RNA sequencing (RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics.
- Score: 33.09743154722675
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the fast-growing and evolving omics data, the demand for streamlined and
adaptable tools to handle the analysis continues to grow. In response to this
need, we introduce Auto Bioinformatics Analysis (AutoBA), an autonomous AI
agent based on a large language model designed explicitly for conventional
omics data analysis. AutoBA simplifies the analytical process by requiring
minimal user input while delivering detailed step-by-step plans for various
bioinformatics tasks. Through rigorous validation by expert bioinformaticians,
AutoBA's robustness and adaptability are affirmed across a diverse range of
omics analysis cases, including whole genome sequencing (WGS), RNA sequencing
(RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics. AutoBA's
unique capacity to self-design analysis processes based on input data
variations further underscores its versatility. Compared with online
bioinformatic services, AutoBA deploys the analysis locally, preserving data
privacy. Moreover, different from the predefined pipeline, AutoBA has
adaptability in sync with emerging bioinformatics tools. Overall, AutoBA
represents a convenient tool, offering robustness and adaptability for complex
omics data analysis.
Related papers
- QUIS: Question-guided Insights Generation for Automated Exploratory Data Analysis [1.9521598508325781]
We introduce QUIS, a fully automated EDA system that operates in two stages: insight generation (ISGen) driven by question generation (QUGen).
The ISGen module analyzes data to produce multiple relevant insights in response to each question, requiring no prior training and enabling QUIS to adapt to new datasets.
arXiv Detail & Related papers (2024-10-14T08:21:25Z)
- CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis [35.61361183175167]
Single-cell RNA sequencing (scRNA-seq) data analysis is crucial for biological research.
However, manual manipulation of various tools to achieve desired outcomes can be labor-intensive for researchers.
We introduce CellAgent, an LLM-driven multi-agent framework for the automatic processing and execution of scRNA-seq data analysis tasks.
arXiv Detail & Related papers (2024-07-13T09:14:50Z)
- SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing [0.0]
SeqMate is a tool that allows for one-click analytics by utilizing the power of a large language model (LLM) to automate both data preparation and analysis.
By utilizing the power of generative AI, SeqMate is also capable of analyzing such findings and producing written reports of upregulated/downregulated/user-prompted genes.
arXiv Detail & Related papers (2024-07-02T20:28:30Z)
- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
- AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)
- Automatically Balancing Model Accuracy and Complexity using Solution and Fitness Evolution (SAFE) [4.149117182410553]
We investigate whether multiple objectives can be dynamically tuned by our proposed coevolutionary algorithm, SAFE (Solution And Fitness Evolution).
We find that SAFE is able to automatically tune accuracy and complexity with no performance loss, as compared with a standard evolutionary algorithm, over complex simulated genetics datasets produced by the GAMETES tool.
arXiv Detail & Related papers (2022-06-30T16:55:33Z)
- Automated Materials Spectroscopy Analysis using Genetic Algorithms [12.447537764798795]
We introduce a Genetic Algorithm (GA) based, open-source project to solve multi-objective optimization problems of materials characterization data analysis.
The modular design and multiple crossover and mutation options make the software extensible to additional materials characterization applications too.
arXiv Detail & Related papers (2022-03-18T20:36:31Z)
- Sensitivity analysis in differentially private machine learning using hybrid automatic differentiation [54.88777449903538]
We introduce a novel hybrid automatic differentiation (AD) system for sensitivity analysis.
This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data.
Our approach can enable the principled reasoning about privacy loss in the setting of data processing.
arXiv Detail & Related papers (2021-07-09T07:19:23Z)
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel machine learning architecture that allows us to infuse a deep neural network with human-powered abstraction at the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
- Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation [97.42894942391575]
We propose FAST-DAD to distill arbitrarily complex ensemble predictors into individual models like boosted trees, random forests, and deep networks.
Our individual distilled models are over 10x faster and more accurate than ensemble predictors produced by AutoML tools like H2O/AutoSklearn.
arXiv Detail & Related papers (2020-06-25T09:57:47Z)