Related papers: SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing

SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing

URL: http://arxiv.org/abs/2407.03381v1
Date: Tue, 2 Jul 2024 20:28:30 GMT
Title: SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing
Authors: Devam Mondal, Atharva Inamdar,
Abstract summary: SeqMate is a tool that allows for one-click analytics by utilizing the power of a large language model (LLM) to automate both data preparation and analysis. By utilizing the power of generative AI, SeqMate is also capable of analyzing such findings and producing written reports of upregulated/downregulated/user-prompted genes.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: RNA sequencing techniques, like bulk RNA-seq and Single Cell (sc) RNA-seq, are critical tools for the biologist looking to analyze the genetic activity/transcriptome of a tissue or cell during an experimental procedure. Platforms like Illumina's next-generation sequencing (NGS) are used to produce the raw data for this experimental procedure. This raw FASTQ data must then be prepared via a complex series of data manipulations by bioinformaticians. This process currently takes place on an unwieldy textual user interface like a terminal/command line that requires the user to install and import multiple program packages, preventing the untrained biologist from initiating data analysis. Open-source platforms like Galaxy have produced a more user-friendly pipeline, yet the visual interface remains cluttered and highly technical, remaining uninviting for the natural scientist. To address this, SeqMate is a user-friendly tool that allows for one-click analytics by utilizing the power of a large language model (LLM) to automate both data preparation and analysis (differential expression, trajectory analysis, etc). Furthermore, by utilizing the power of generative AI, SeqMate is also capable of analyzing such findings and producing written reports of upregulated/downregulated/user-prompted genes with sources cited from known repositories like PubMed, PDB, and Uniprot.

Related papers

ChemActor: Enhancing Automated Extraction of Chemical Synthesis Actions with LLM-Generated Data [53.78763789036172]
We present ChemActor, a fully fine-tuned large language model (LLM) as a chemical executor to convert between unstructured experimental procedures and structured action sequences.<n>This framework integrates a data selection module that selects data based on distribution divergence, with a general-purpose LLM, to generate machine-executable actions from a single molecule input.<n>Experiments on reaction-to-description (R2D) and description-to-action (D2A) tasks demonstrate that ChemActor achieves state-of-the-art performance, outperforming the baseline model by 10%.
arXiv Detail & Related papers (2025-06-30T05:11:19Z)
Procedural Environment Generation for Tool-Use Agents [55.417058694785325]
We introduce RandomWorld, a pipeline for the procedural generation of interactive tools and compositional tool-use data.<n>We show that models tuned via SFT and RL on synthetic RandomWorld data improve on a range of tool-use benchmarks.
arXiv Detail & Related papers (2025-05-21T14:10:06Z)
OLAF: An Open Life Science Analysis Framework for Conversational Bioinformatics Powered by Large Language Models [0.0]
OLAF (Open Life Science Analysis Framework) is an open-source platform that enables researchers to perform bioinformatics analyses using natural language. By combining large language models (LLMs) with a modular agent-pipe-router architecture, OLAF generates and executes bioinformatics code on real scientific data.
arXiv Detail & Related papers (2025-04-04T22:41:16Z)
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following [32.67347401145835]
Large language models excel at interpreting complex natural language instructions, enabling them to perform a wide range of tasks. We present InstructCell, a multi-modal AI copilot that leverages natural language as a medium for more direct and flexible single-cell analysis. InstructCell empowers researchers to accomplish critical tasks-such as cell type annotation, conditional pseudo-cell generation, and drug sensitivity prediction-using straightforward natural language commands.
arXiv Detail & Related papers (2025-01-14T15:12:19Z)
ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis [80.34000499166648]
We propose a Graph-based Sampling strategy to sample more relevant tool combinations, and a Planned-generation strategy to create plans that guide the synthesis of coherent dialogues. We apply SFT on LLaMA-3.1-8B using 8,000 synthetic dialogues generated with ToolFlow. Results show that the model achieves tool-calling performance comparable to or even surpassing GPT-4, while maintaining strong general capabilities.
arXiv Detail & Related papers (2024-10-24T05:45:04Z)
From Text to Test: AI-Generated Control Software for Materials Science Instruments [0.0]
Large language models (LLMs) are transforming the landscape of chemistry and materials science. Here, we demonstrate the rapid deployment of a Python-based control module for a Keithley 2400 electrical source measure unit.
arXiv Detail & Related papers (2024-06-23T21:32:57Z)
scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain. scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator. This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z)
Automated Bioinformatics Analysis via AutoBA [33.09743154722675]
Auto Bioinformatics Analysis (AutoBA) is an autonomous AI agent based on a large language model designed explicitly for conventional omics data analysis. AutoBA's robustness and adaptability are affirmed across a diverse range of omics analysis cases, including whole genome sequencing (WGS), RNA sequencing (RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics.
arXiv Detail & Related papers (2023-09-06T07:54:45Z)
PEvoLM: Protein Sequence Evolutionary Information Language Model [0.0]
A protein sequence is a collection of contiguous tokens or characters called amino acids (AAs) This research presents an Embedding Language Model (ELMo), converting a protein sequence to a numerical vector representation. The model was trained not only on predicting the next AA but also on the probability distribution of the next AA derived from similar, yet different sequences.
arXiv Detail & Related papers (2023-08-16T06:46:28Z)
regulAS: A Bioinformatics Tool for the Integrative Analysis of Alternative Splicing Regulome using RNA-Seq data [0.0]
regulAS is a bioinformatics tool designed to support computational biology researchers in investigating regulatory mechanisms of splicing alterations. The core functionality of regulAS enables the automation of computational experiments, efficient results storage and processing, and streamlined workflow management. Integrated basic modules extend regulAS with features such as RNA-Seq data retrieval from the public multi-omics UCSC Xena data repository, predictive modeling and feature ranking capabilities.
arXiv Detail & Related papers (2023-07-17T19:33:49Z)
Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR) It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expression ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data. Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)
Multi-layer Optimizations for End-to-End Data Analytics [71.05611866288196]
We introduce Iterative Functional Aggregate Queries (IFAQ), a framework that realizes an alternative approach. IFAQ treats the feature extraction query and the learning task as one program given in the IFAQ's domain-specific language. We show that a Scala implementation of IFAQ can outperform mlpack, Scikit, and specialization by several orders of magnitude for linear regression and regression tree models over several relational datasets.
arXiv Detail & Related papers (2020-01-10T16:14:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.