evolSOM: an R Package for evolutionary conservation analysis with SOMs
- URL: http://arxiv.org/abs/2402.07948v1
- Date: Fri, 9 Feb 2024 20:33:48 GMT
- Title: evolSOM: an R Package for evolutionary conservation analysis with SOMs
- Authors: Santiago Prochetto, Renata Reinheimer, Georgina Stegmayer
- Abstract summary: We introduce evolSOM, a novel R package that utilizes Self-Organizing Maps (SOMs) to explore and visualize the conservation of biological variables.
The package automatically calculates and graphically presents displacements, enabling efficient comparison and revealing conserved and displaced variables.
Illustratively, we employed evolSOM to study the displacement of genes and phenotypic traits, successfully identifying potential drivers of phenotypic differentiation in grass leaves.
- Score: 0.4972323953932129
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Motivation: Unraveling the connection between genes and traits is crucial for
solving many biological puzzles. Genes provide instructions for building
cellular machinery, directing the processes that sustain life. RNA molecules
and proteins, derived from these genetic instructions, play crucial roles in
shaping cell structures, influencing reactions, and guiding behavior. This
fundamental biological principle links genetic makeup to observable traits, but
integrating and extracting meaningful relationships from this complex,
multimodal data presents a significant challenge. Results: We introduce
evolSOM, a novel R package that utilizes Self-Organizing Maps (SOMs) to explore
and visualize the conservation of biological variables, easing the integration
of phenotypic and genotypic attributes. By constructing species-specific or
condition-specific SOMs that capture non-redundant patterns, evolSOM allows the
analysis of displacement of biological variables between species or conditions.
Variables displaced together suggest membership in the same regulatory network,
and the nature of the displacement may hold biological significance. The
package automatically calculates and graphically presents these displacements,
enabling efficient comparison and revealing conserved and displaced variables.
The package facilitates the integration of diverse phenotypic data types,
enabling the exploration of potential gene drivers underlying observed
phenotypic changes. Its user-friendly interface and visualization capabilities
enhance the accessibility of complex network analyses. Illustratively, we
employed evolSOM to study the displacement of genes and phenotypic traits,
successfully identifying potential drivers of phenotypic differentiation in
grass leaves. Availability: The package is open-source and is available at
https://github.com/sanprochetto/evolSOM.
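The displacement idea described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not use or reproduce the evolSOM API; every function name, parameter, and data value below is hypothetical. It assumes one simple setup: a small SOM is trained on the reference profiles, each variable is then placed on the map once with its reference profile and once with its test profile, and the displacement is the difference between the two grid positions. The package itself may construct and compare maps differently.

```python
import math
import random


def best_unit(weights, x):
    """Return the grid coordinate of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood (toy implementation)."""
    rng = random.Random(seed)
    # Initialise every unit from a randomly chosen training vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                      # learning rate shrinks linearly
        sigma = max(sigma0 * decay, 0.5)      # neighbourhood radius shrinks, floored at 0.5
        for x in rng.sample(data, len(data)):  # present the vectors in shuffled order
            br, bc = best_unit(weights, x)
            for (r, c), w in weights.items():
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                for i, xi in enumerate(x):
                    w[i] += lr * h * (xi - w[i])
    return weights


def displacements(weights, reference, test):
    """Grid displacement (d_row, d_col) of each variable present in both data sets."""
    out = {}
    for name in sorted(reference.keys() & test.keys()):
        r0, c0 = best_unit(weights, reference[name])
        r1, c1 = best_unit(weights, test[name])
        out[name] = (r1 - r0, c1 - c0)
    return out


# Hypothetical toy profiles: one value per condition (for example, four stages).
reference = {
    "geneA": [0.0, 0.1, 0.9, 1.0],   # rising
    "geneB": [1.0, 0.9, 0.1, 0.0],   # falling
    "trait1": [1.0, 0.9, 0.1, 0.0],  # falling, same pattern as geneB
}
test = {
    "geneA": [0.0, 0.1, 0.9, 1.0],   # unchanged: conserved
    "geneB": [0.0, 0.1, 0.9, 1.0],   # now rising
    "trait1": [0.0, 0.1, 0.9, 1.0],  # now rising, moves together with geneB
}

weights = train_som(list(reference.values()), rows=2, cols=2)
disp = displacements(weights, reference, test)
for name, d in disp.items():
    print(name, "conserved" if d == (0, 0) else "displaced", d)
```

In this toy setup a variable whose profile is unchanged always has displacement (0, 0), and variables that share a profile in both data sets receive the same displacement, which mirrors the abstract's point that variables displaced together may belong to the same regulatory network.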
Related papers
- Semantically Rich Local Dataset Generation for Explainable AI in Genomics [0.716879432974126]
Black box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms.
We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity.
arXiv Detail & Related papers (2024-07-03T10:31:30Z)
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performance.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- Sparsity regularization via tree-structured environments for disentangled representations [4.824771782127179]
Causal representation learning could advance scientific understanding by enabling inference of latent variables such as pathway activation.
We develop methods for inferring latent variables from multiple related datasets (environments) and tasks.
We find that Tree-Based Regularization (TBR) both minimizes prediction error and regularizes closely related environments to learn similar predictors.
arXiv Detail & Related papers (2024-05-30T21:08:14Z)
- Cognitive Evolutionary Learning to Select Feature Interactions for Recommender Systems [59.117526206317116]
We show that CELL can adaptively evolve into different models for different tasks and data.
Experiments on four real-world datasets demonstrate that CELL significantly outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-05-29T02:35:23Z)
- Whole Genome Transformer for Gene Interaction Effects in Microbiome Habitat Specificity [3.972930262155919]
We propose a framework taking advantage of existing large models for gene vectorization to predict habitat specificity from entire microbial genome sequences.
We train and validate our approach on a large dataset of high quality microbiome genomes from different habitats.
arXiv Detail & Related papers (2024-05-09T09:34:51Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
- Genetic InfoMax: Exploring Mutual Information Maximization in High-Dimensional Imaging Genetics Studies [50.11449968854487]
Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits.
Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS.
We introduce a trans-modal learning framework Genetic InfoMax (GIM) to address the specific challenges of GWAS.
arXiv Detail & Related papers (2023-09-26T03:59:21Z)
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
- SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model [3.0643865202019698]
We propose a new solution named SemanticCAP to identify accessible regions of the genome.
It introduces a gene language model which models the context of gene sequences, thus being able to provide an effective representation of gene sequences.
Compared with other systems under public benchmarks, our model proved to have better performance.
arXiv Detail & Related papers (2022-04-05T11:47:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.