Feature extraction using Spectral Clustering for Gene Function Prediction
- URL: http://arxiv.org/abs/2203.13551v1
- Date: Fri, 25 Mar 2022 10:17:36 GMT
- Title: Feature extraction using Spectral Clustering for Gene Function Prediction
- Authors: Miguel Romero, Oscar Ramírez, Jorge Finke, Camilo Rocha
- Abstract summary: This paper presents a novel in silico approach to the annotation problem that combines cluster analysis and hierarchical multi-label classification.
The proposed approach is applied to a case study on Zea mays, one of the most dominant and productive crops in the world.
- Score: 0.4492444446637856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gene annotation addresses the problem of predicting unknown associations
between genes and functions (e.g., biological processes) of a specific organism.
Despite recent advances, the cost and time demanded by annotation procedures
that rely largely on in vivo biological experiments remain prohibitively high.
This paper presents a novel in silico approach to the annotation problem
that combines cluster analysis and hierarchical multi-label classification
(HMC). The approach uses spectral clustering to extract new features from the
gene co-expression network (GCN) and enrich the prediction task. HMC is used to
build multiple estimators that consider the hierarchical structure of gene
functions. The proposed approach is applied to a case study on Zea mays, one of
the most dominant and productive crops in the world. The results illustrate how
in silico approaches are key to reducing the time and costs of gene annotation.
More specifically, they highlight the importance of: (i) building new features
that represent the structure of gene relationships in GCNs to annotate genes;
and (ii) taking into account the structure of biological processes to obtain
consistent predictions.
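The abstract says that spectral clustering extracts new features from the GCN. It does not say which Laplacian is used, how many clusters are formed, or how cluster membership becomes a feature. The sketch below is a minimal illustration under those gaps. It assumes the GCN is a symmetric, non-negative co-expression weight matrix. It embeds genes with the k smallest eigenvectors of the symmetric normalized Laplacian (the Ng-Jordan-Weiss recipe), row-normalizes the embedding, and runs a plain k-means with deterministic farthest-point initialization. Each gene's cluster is then returned as a one-hot feature vector. The helper name `spectral_cluster_features`, the Laplacian variant, and the one-hot encoding are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def spectral_cluster_features(adj: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Hypothetical helper: one-hot spectral-cluster memberships (n_genes x k)
    computed from a symmetric, non-negative GCN adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)

    # D^{-1/2}, leaving isolated genes (degree 0) at zero to avoid division by zero.
    inv_sqrt_deg = np.zeros(n)
    connected = deg > 0
    inv_sqrt_deg[connected] = 1.0 / np.sqrt(deg[connected])

    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    lap = np.eye(n) - inv_sqrt_deg[:, None] * adj * inv_sqrt_deg[None, :]

    # eigh returns eigenvalues in ascending order; keep the k smallest eigenvectors.
    _, vecs = np.linalg.eigh(lap)
    emb = vecs[:, :k]

    # Row-normalize the embedding so each gene lies on the unit sphere.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb / np.where(norms > 0, norms, 1.0)

    # Deterministic farthest-point initialization for k-means.
    centers = [emb[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min([np.linalg.norm(emb - c, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(dist_to_nearest)])
    centers = np.array(centers)

    # Lloyd iterations until the cluster assignments stop changing.
    labels = None
    for _ in range(n_iter):
        dists = np.linalg.norm(emb[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = emb[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)

    # One-hot encode cluster membership as new gene features.
    return np.eye(k)[labels]


# Toy GCN: two co-expression modules (genes 0-2 and 3-5) with no edges between them.
adj = np.zeros((6, 6))
adj[:3, :3] = 1.0
adj[3:, 3:] = 1.0
np.fill_diagonal(adj, 0.0)
features = spectral_cluster_features(adj, k=2)
print(features)  # rows 0-2 share one cluster column, rows 3-5 the other
```

In a setup like the one the abstract describes, these columns would be appended to each gene's existing features before the hierarchical multi-label classifier is trained. The number of clusters `k` is left to the caller as a tuning parameter.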
Related papers
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
Date: 2024-06-05T06:42:27Z
- VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling [60.91599380893732]
VQDNA is a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning.
By leveraging vector-quantized codebooks as learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings.
Date: 2024-05-13T20:15:03Z
- FGBERT: Function-Driven Pre-trained Gene Language Model for Metagenomics [35.47381119898764]
We introduce a protein-based gene representation as a context-aware and structure-relevant tokenizer.
MGM and TEM-CL constitute our novel metagenomic language model FGBERT, pre-trained on 100 million metagenomic sequences.
Date: 2024-02-24T13:13:17Z
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
Date: 2023-11-28T09:14:55Z
- Genetic prediction of quantitative traits: a machine learner's guide focused on height [0.0]
We provide an overview for the machine learning community of current state-of-the-art models and associated subtleties.
We use height as an example of a continuous-valued phenotype and provide an introduction to benchmark datasets, confounders, feature selection, and common metrics.
Date: 2023-10-06T05:43:50Z
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
Date: 2023-01-07T18:36:44Z
- Granger causal inference on DAGs identifies genomic loci regulating transcription [77.58911272503771]
GrID-Net is a framework based on graph neural networks with lagged message passing for Granger causal inference on DAG-structured systems.
Our application is the analysis of single-cell multimodal data to identify genomic loci that mediate the regulation of specific genes.
Date: 2022-10-18T21:15:10Z
- Hierarchy exploitation to detect missing annotations on hierarchical multi-label classification [0.1749935196721634]
We present a method to detect missing annotations in hierarchical multi-label classification datasets.
We propose a method that exploits the class hierarchy by computing aggregated probabilities for the paths of classes from the leaves to the root for each instance.
The experiments on Oryza sativa Japonica, a variety of rice, show that incorporating the class hierarchy into the method often improves predictive performance.
Date: 2022-07-13T14:32:50Z
- SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model [3.0643865202019698]
We propose a new solution named SemanticCAP to identify accessible regions of the genome.
It introduces a gene language model which models the context of gene sequences, thus being able to provide an effective representation of gene sequences.
Compared with other systems on public benchmarks, our model achieved better performance.
Date: 2022-04-05T11:47:58Z
- Mining Functionally Related Genes with Semi-Supervised Learning [0.0]
We introduce a rich set of features and use them in conjunction with semi-supervised learning approaches.
The framework of learning with positive and unlabeled examples (LPU) is shown to be especially appropriate for mining functionally related genes.
Date: 2020-11-05T20:34:09Z
- Complexity-based speciation and genotype representation for neuroevolution [81.21462458089142]
This paper introduces a speciation principle for neuroevolution where evolving networks are grouped into species based on the number of hidden neurons.
The proposed speciation principle is employed in several techniques designed to promote and preserve diversity within species and in the ecosystem as a whole.
Date: 2020-10-11T06:26:56Z
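Two places in this listing rely on predictions respecting the class hierarchy. One is the abstract's point (ii) about consistent predictions over biological processes. The other is the hierarchy-exploitation entry, which aggregates probabilities along leaf-to-root paths. The sketch below illustrates both ideas in a minimal form. It assumes a tree-shaped toy hierarchy stored as a child-to-parent map (real Gene Ontology terms form a DAG) and per-term probabilities from some classifier:

- `make_consistent` caps every term's score by its ancestors' scores, so any threshold yields predictions that obey the true-path rule.
- `leaf_path_scores` averages the scores along each leaf-to-root path.

The term names, the min-capping rule, and the use of a mean as the path aggregate are illustrative assumptions. They are not the aggregation used by either paper.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy stored as child -> parent (None marks the root).
# Real GO biological-process terms form a DAG; a tree is assumed here for brevity.
Hierarchy = Dict[str, Optional[str]]


def path_to_root(term: str, parent: Hierarchy) -> List[str]:
    """Terms from `term` up to the root, inclusive."""
    path = []
    current: Optional[str] = term
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def make_consistent(probs: Dict[str, float], parent: Hierarchy) -> Dict[str, float]:
    """Cap each term's score by its ancestors' scores so that no threshold can
    predict a term without also predicting its parent (true-path rule)."""
    return {t: min(probs[a] for a in path_to_root(t, parent)) for t in probs}


def leaf_path_scores(probs: Dict[str, float], parent: Hierarchy) -> Dict[str, float]:
    """Mean score along each leaf-to-root path (the mean is an assumed aggregate)."""
    internal = {p for p in parent.values() if p is not None}
    scores = {}
    for leaf in parent:
        if leaf in internal:
            continue  # only leaves start a path
        path = path_to_root(leaf, parent)
        scores[leaf] = sum(probs[a] for a in path) / len(path)
    return scores


parent = {"root": None, "A": "root", "A1": "A", "B": "root"}
probs = {"root": 0.9, "A": 0.4, "A1": 0.7, "B": 0.6}  # raw and inconsistent: A1 > A
consistent = make_consistent(probs, parent)
print(consistent)
print(leaf_path_scores(probs, parent))
```

Capping by the minimum over ancestors is one simple way to enforce consistency after training. Hierarchy-aware training schemes are an alternative that the sketch does not cover.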