Related papers: regulAS: A Bioinformatics Tool for the Integrative Analysis of Alternative Splicing Regulome using RNA-Seq data

regulAS: A Bioinformatics Tool for the Integrative Analysis of Alternative Splicing Regulome using RNA-Seq data

URL: http://arxiv.org/abs/2307.08800v1
Date: Mon, 17 Jul 2023 19:33:49 GMT
Title: regulAS: A Bioinformatics Tool for the Integrative Analysis of Alternative Splicing Regulome using RNA-Seq data
Authors: Sofya Lipnitskaya
Abstract summary: regulAS is a bioinformatics tool designed to support computational biology researchers in investigating regulatory mechanisms of splicing alterations. The core functionality of regulAS enables the automation of computational experiments, efficient results storage and processing, and streamlined workflow management. Integrated basic modules extend regulAS with features such as RNA-Seq data retrieval from the public multi-omics UCSC Xena data repository, predictive modeling and feature ranking capabilities.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The regulAS software package is a bioinformatics tool designed to support computational biology researchers in investigating regulatory mechanisms of splicing alterations through integrative analysis of large-scale RNA-Seq data from cancer and healthy human donors, characterized by TCGA and GTEx projects. This technical report provides a comprehensive overview of regulAS, focusing on its core functionality, basic modules, experiment configuration, further extensibility and customisation. The core functionality of regulAS enables the automation of computational experiments, efficient results storage and processing, and streamlined workflow management. Integrated basic modules extend regulAS with features such as RNA-Seq data retrieval from the public multi-omics UCSC Xena data repository, predictive modeling and feature ranking capabilities using the scikit-learn package, and flexible reporting generation for analysing gene expression profiles and relevant modulations of alternative splicing aberrations across tissues and cancer types. Experiment configuration is handled through YAML files with the Hydra and OmegaConf libraries, offering a user-friendly approach. Additionally, regulAS allows for the development and integration of custom modules to handle specialized tasks. In conclusion, regulAS provides an automated solution for alternative splicing and cancer biology studies, enhancing efficiency, reproducibility, and customization of experimental design, while the extensibility of the pipeline enables researchers to further tailor the software package to their specific needs. Source code is available under the MIT license at https://github.com/slipnitskaya/regulAS.

Related papers

evortran: a modern Fortran package for genetic algorithms with applications from LHC data fitting to LISA signal reconstruction [0.0]
evortran is a modern Fortran library designed for high-performance genetic algorithms and evolutionary optimization.<n>It can be used to tackle a wide range of problems in high-energy physics and beyond, such as derivative-free parameter optimization.<n>evortran offers a variety of selection, crossover, mutation and elitism strategies, with which users can tailor an evolutionary algorithm to their specific needs.
arXiv Detail & Related papers (2025-07-08T15:22:22Z)
CheMatAgent: Enhancing LLMs for Chemistry and Materials Science through Tree-Search Based Tool Learning [12.745398618084474]
Large language models (LLMs) have recently demonstrated promising capabilities in chemistry tasks.<n>We propose an LLM-based agent that integrates 137 external chemical tools created ranging from basic information retrieval to complex reaction predictions.
arXiv Detail & Related papers (2025-06-09T08:41:39Z)
A new framework for X-ray absorption spectroscopy data analysis based on machine learning: XASDAML [3.26781102547109]
XASDAML is a flexible, machine learning based framework that integrates the entire data-processing workflow. It supports comprehensive statistical analysis, leveraging methods such as principal component analysis and clustering. The versatility and effectiveness of XASDAML are exemplified by its application to a copper dataset.
arXiv Detail & Related papers (2025-02-23T17:50:04Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
Large-Scale Targeted Cause Discovery with Data-Driven Learning [66.86881771339145]
We propose a novel machine learning approach for inferring causal variables of a target variable from observations. By employing a local-inference strategy, our approach scales with linear complexity in the number of variables, efficiently scaling up to thousands of variables. Empirical results demonstrate superior performance in identifying causal relationships within large-scale gene regulatory networks.
arXiv Detail & Related papers (2024-08-29T02:21:11Z)
Exploring the Feasibility of Automated Data Standardization using Large Language Models for Seamless Positioning [10.200170217746136]
We propose a feasibility study for real-time automated data standardization leveraging Large Language Models (LLMs) Our study ensures data compatibility and improves positioning accuracy using the Extended Kalman Filter (EKF) This study underscores the potential of advanced LLMs in overcoming sensor data integration complexities.
arXiv Detail & Related papers (2024-08-22T02:40:21Z)
CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis [35.61361183175167]
Single-cell RNA sequencing (scRNA-seq) data analysis is crucial for biological research. However, manual manipulation of various tools to achieve desired outcomes can be labor-intensive for researchers. We introduce CellAgent, an LLM-driven multi-agent framework for the automatic processing and execution of scRNA-seq data analysis tasks.
arXiv Detail & Related papers (2024-07-13T09:14:50Z)
SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing [0.0]
SeqMate is a tool that allows for one-click analytics by utilizing the power of a large language model (LLM) to automate both data preparation and analysis. By utilizing the power of generative AI, SeqMate is also capable of analyzing such findings and producing written reports of upregulated/downregulated/user-prompted genes.
arXiv Detail & Related papers (2024-07-02T20:28:30Z)
UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models [88.16197692794707]
UniGen is a comprehensive framework designed to produce diverse, accurate, and highly controllable datasets. To augment data diversity, UniGen incorporates an attribute-guided generation module and a group checking feature. Extensive experiments demonstrate the superior quality of data generated by UniGen.
arXiv Detail & Related papers (2024-06-27T07:56:44Z)
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research [70.6584488911715]
retrieval-augmented generation (RAG) has attracted considerable research attention. Existing RAG toolkits are often heavy and inflexibly, failing to meet the customization needs of researchers. Our toolkit has implemented 16 advanced RAG methods and gathered and organized 38 benchmark datasets.
arXiv Detail & Related papers (2024-05-22T12:12:40Z)
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments [51.41735920759667]
Large Language Models (LLMs) have shown promise in various tasks, but they often lack specific knowledge and struggle to accurately solve biological design problems. In this work, we introduce CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and enhance the design process of CRISPR-based gene-editing experiments.
arXiv Detail & Related papers (2024-04-27T22:59:17Z)
RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence Learning [75.61681328968714]
We propose recurrent independent Grid LSTM (RigLSTM) to exploit the underlying modular structure of the target task. Our model adopts cell selection, input feature selection, hidden state selection, and soft state updating to achieve a better generalization ability.
arXiv Detail & Related papers (2023-11-03T07:40:06Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
Synthetic Data Generator for Adaptive Interventions in Global Health [0.0]
We introduce HealthSyn, an open-source synthetic data generator of user behavior for testing reinforcement learning algorithms. HealthSyn generates diverse user actions, with individual user behavioral patterns that can change in reaction to personalized interventions. The generated data can be used to develop, test, and evaluate, both ML algorithms in research and end-to-end operational RL-based intervention delivery frameworks.
arXiv Detail & Related papers (2023-03-03T14:28:45Z)
Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types [75.65676405302105]
We propose a simple yet effective approach for pre-training genome data in a multi-modal and self-supervised manner, which we call GeneBERT. We pre-train our model on the ATAC-seq dataset with 17 million genome sequences.
arXiv Detail & Related papers (2021-10-11T12:48:44Z)
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference. Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
PHOTONAI -- A Python API for Rapid Machine Learning Model Development [2.414341608751139]
PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development. It functions as a unifying framework allowing the user to easily access and combine algorithms from different toolboxes into custom algorithm sequences.
arXiv Detail & Related papers (2020-02-13T10:33:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.