regulAS: A Bioinformatics Tool for the Integrative Analysis of
Alternative Splicing Regulome using RNA-Seq data
- URL: http://arxiv.org/abs/2307.08800v1
- Date: Mon, 17 Jul 2023 19:33:49 GMT
- Authors: Sofya Lipnitskaya
- Abstract summary: regulAS is a bioinformatics tool designed to support computational biology researchers in investigating regulatory mechanisms of splicing alterations.
The core functionality of regulAS enables the automation of computational experiments, efficient results storage and processing, and streamlined workflow management.
Integrated basic modules extend regulAS with features such as RNA-Seq data retrieval from the public multi-omics UCSC Xena data repository, predictive modeling and feature ranking capabilities.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The regulAS software package is a bioinformatics tool designed to support
computational biology researchers in investigating regulatory mechanisms of
splicing alterations through integrative analysis of large-scale RNA-Seq data
from cancer and healthy human donors, characterized by TCGA and GTEx projects.
This technical report provides a comprehensive overview of regulAS, focusing on
its core functionality, basic modules, experiment configuration, further
extensibility and customisation.
The core functionality of regulAS enables the automation of computational
experiments, efficient results storage and processing, and streamlined workflow
management. Integrated basic modules extend regulAS with features such as
RNA-Seq data retrieval from the public multi-omics UCSC Xena data repository,
predictive modeling and feature ranking capabilities using the scikit-learn
package, and flexible report generation for analysing gene expression
profiles and relevant modulations of alternative splicing aberrations across
tissues and cancer types. Experiment configuration is handled through YAML
files with the Hydra and OmegaConf libraries, offering a user-friendly
approach. Additionally, regulAS allows for the development and integration of
custom modules to handle specialized tasks.
In conclusion, regulAS provides an automated solution for alternative
splicing and cancer biology studies, enhancing efficiency, reproducibility, and
customization of experimental design, while the extensibility of the pipeline
enables researchers to further tailor the software package to their specific
needs. Source code is available under the MIT license at
https://github.com/slipnitskaya/regulAS.
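
The abstract mentions predictive modeling and feature ranking with scikit-learn. The following is a minimal sketch of that general idea, not the regulAS implementation. It assumes a random forest regressor as the model, impurity-based importances as the ranking criterion, and synthetic data in place of TCGA/GTEx expression values. The function `rank_features` and the regulator names `RBP_A` to `RBP_D` are hypothetical and used only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def rank_features(X, y, feature_names, seed=0):
    """Fit a random forest and return ((name, importance) pairs, mean CV R^2).

    Pairs are sorted from most to least important.
    """
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    # Cross-validated R^2 is a rough check that the ranking rests on a usable model.
    cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    # Refit on all samples to obtain the impurity-based importances.
    model.fit(X, y)
    importances = model.feature_importances_
    order = np.argsort(importances)[::-1]
    ranking = [(feature_names[i], float(importances[i])) for i in order]
    return ranking, float(cv_r2)


# Synthetic stand-in data (assumption: NOT real TCGA/GTEx measurements).
# Columns play the role of regulator expression levels, and y plays the
# role of a splicing readout for one event.
rng = np.random.default_rng(42)
feature_names = ["RBP_A", "RBP_B", "RBP_C", "RBP_D"]  # hypothetical names
X = rng.normal(size=(300, 4))
# The target depends strongly on RBP_B and weakly on RBP_D, plus noise.
y = 0.8 * X[:, 1] + 0.2 * X[:, 3] + rng.normal(scale=0.1, size=300)

ranking, cv_r2 = rank_features(X, y, feature_names)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
print(f"mean cross-validated R^2: {cv_r2:.3f}")
```

In this sketch the feature that drives the synthetic target should come out on top. Impurity-based importances can be biased with correlated or high-cardinality features, so permutation importance (`sklearn.inspection.permutation_importance`) is a reasonable alternative criterion.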