SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing
- URL: http://arxiv.org/abs/2407.03381v1
- Date: Tue, 2 Jul 2024 20:28:30 GMT
- Title: SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing
- Authors: Devam Mondal, Atharva Inamdar
- Abstract summary: SeqMate is a tool that enables one-click analytics by using a large language model (LLM) to automate both data preparation and analysis.
Using generative AI, SeqMate can also analyze these findings and produce written reports on upregulated, downregulated, and user-prompted genes.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: RNA sequencing techniques, such as bulk RNA-seq and single-cell (sc) RNA-seq, are critical tools for biologists who want to analyze the genetic activity (transcriptome) of a tissue or cell during an experimental procedure. Platforms such as Illumina's next-generation sequencing (NGS) produce the raw data for this procedure. Bioinformaticians must then prepare this raw FASTQ data through a complex series of data manipulations. This process currently takes place in an unwieldy text-based interface such as a terminal/command line, which requires the user to install and import multiple software packages and prevents the untrained biologist from initiating data analysis. Open-source platforms such as Galaxy offer a more user-friendly pipeline, yet their visual interface remains cluttered and highly technical, and therefore uninviting to natural scientists. To address this, we introduce SeqMate, a user-friendly tool that enables one-click analytics by using a large language model (LLM) to automate both data preparation and analysis (differential expression, trajectory analysis, etc.). Furthermore, by leveraging generative AI, SeqMate can interpret these findings and produce written reports on upregulated, downregulated, and user-prompted genes, with sources cited from known repositories such as PubMed, PDB, and UniProt.
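The abstract does not say how SeqMate decides which genes count as upregulated or downregulated. As a rough illustration only, the sketch below assumes per-gene mean normalized counts for a treated and a control condition, adds a pseudocount of 1 to avoid division by zero, and flags genes whose log2 fold change is at least 1 in either direction. The function names, the threshold, and the gene names are hypothetical. Dedicated tools such as DESeq2 or edgeR also model count dispersion and report adjusted p-values, which this sketch omits.

```python
import math

# Hypothetical sketch: NOT SeqMate's implementation. It applies only a
# fold-change cutoff; real differential-expression tools also test significance.


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """Return log2((treated + pseudocount) / (control + pseudocount))."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def classify_genes(counts: dict[str, tuple[float, float]],
                   threshold: float = 1.0) -> dict[str, list[str]]:
    """Split genes into up/down-regulated lists using |log2FC| >= threshold.

    counts maps a gene name to (treated_mean, control_mean).
    Genes below the threshold in both directions are left out of both lists.
    """
    result: dict[str, list[str]] = {"upregulated": [], "downregulated": []}
    for gene, (treated, control) in sorted(counts.items()):
        lfc = log2_fold_change(treated, control)
        if lfc >= threshold:
            result["upregulated"].append(gene)
        elif lfc <= -threshold:
            result["downregulated"].append(gene)
    return result


if __name__ == "__main__":
    toy_counts = {
        "GENE_A": (399.0, 99.0),   # log2(400 / 100) = 2.0  -> upregulated
        "GENE_B": (49.0, 199.0),   # log2(50 / 200)  = -2.0 -> downregulated
        "GENE_C": (100.0, 110.0),  # small change           -> neither list
    }
    print(classify_genes(toy_counts))
    # prints {'upregulated': ['GENE_A'], 'downregulated': ['GENE_B']}
```

A lists-of-names output like this is the kind of input a report generator could then expand into prose; the fixed cutoff is chosen only for readability.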
Related papers
- ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis (2024-10-24T05:45:04Z)
We propose a Graph-based Sampling strategy to sample more relevant tool combinations, and a Planned-generation strategy to create plans that guide the synthesis of coherent dialogues.
We apply SFT on LLaMA-3.1-8B using 8,000 synthetic dialogues generated with ToolFlow.
Results show that the model achieves tool-calling performance comparable to or even surpassing GPT-4, while maintaining strong general capabilities.
- From Text to Test: AI-Generated Control Software for Materials Science Instruments (2024-06-23T21:32:57Z)
Large language models (LLMs) are transforming the landscape of chemistry and materials science.
Here, we demonstrate the rapid deployment of a Python-based control module for a Keithley 2400 electrical source measure unit.
- scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain (2023-10-04T10:30:08Z)
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain.
scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator.
This enables us to process full-length scRNA-seq data without losing any information from the raw data.
- Automated Bioinformatics Analysis via AutoBA (2023-09-06T07:54:45Z)
Auto Bioinformatics Analysis (AutoBA) is an autonomous AI agent based on a large language model designed explicitly for conventional omics data analysis.
AutoBA's robustness and adaptability are affirmed across a diverse range of omics analysis cases, including whole genome sequencing (WGS), RNA sequencing (RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics.
- PEvoLM: Protein Sequence Evolutionary Information Language Model (2023-08-16T06:46:28Z)
A protein sequence is a collection of contiguous tokens or characters called amino acids (AAs).
This research presents an Embedding Language Model (ELMo), converting a protein sequence to a numerical vector representation.
The model was trained not only on predicting the next AA but also on the probability distribution of the next AA derived from similar, yet different sequences.
- regulAS: A Bioinformatics Tool for the Integrative Analysis of Alternative Splicing Regulome using RNA-Seq data (2023-07-17T19:33:49Z)
regulAS is a bioinformatics tool designed to support computational biology researchers in investigating regulatory mechanisms of splicing alterations.
The core functionality of regulAS enables the automation of computational experiments, efficient results storage and processing, and streamlined workflow management.
Integrated basic modules extend regulAS with features such as RNA-Seq data retrieval from the public multi-omics UCSC Xena data repository, predictive modeling and feature ranking capabilities.
- Zero-shot Composed Text-Image Retrieval (2023-06-12T17:56:01Z)
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series (2023-05-19T10:11:21Z)
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
- Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation (2020-10-20T08:36:51Z)
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data.
Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
- Multi-layer Optimizations for End-to-End Data Analytics (2020-01-10T16:14:44Z)
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach.
IFAQ treats the feature extraction query and the learning task as one program given in IFAQ's domain-specific language.
We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.