Multiscale methods for signal selection in single-cell data
- URL: http://arxiv.org/abs/2206.07760v1
- Date: Wed, 15 Jun 2022 18:42:26 GMT
- Title: Multiscale methods for signal selection in single-cell data
- Authors: Renee S. Hoekzema, Lewis Marsh, Otto Sumray, Xin Lu, Helen M. Byrne,
Heather A. Harrington
- Abstract summary: We propose three topologically-motivated mathematical methods for unsupervised feature selection.
We demonstrate the utility of these techniques by applying them to published single-cell transcriptomics data sets.
- Score: 2.683475550237718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Analysis of single-cell transcriptomics often relies on clustering cells and
then performing differential gene expression (DGE) to identify genes that vary
between these clusters. These discrete analyses successfully determine cell
types and markers; however, continuous variation within and between cell types
may not be detected. We propose three topologically-motivated mathematical
methods for unsupervised feature selection that consider discrete and
continuous transcriptional patterns on an equal footing across multiple scales
simultaneously. Eigenscores ($\mathrm{eig}_i$) rank signals or genes based on
their correspondence to low-frequency intrinsic patterning in the data using
the spectral decomposition of the graph Laplacian. The multiscale Laplacian
score (MLS) is an unsupervised method for locating relevant scales in data and
selecting the genes that are coherently expressed at these respective scales.
The persistent Rayleigh quotient (PRQ) takes data equipped with a filtration,
allowing separation of genes with different roles in a bifurcation process
(e.g. pseudo-time). We demonstrate the utility of these techniques by applying
them to published single-cell transcriptomics data sets. The methods validate
previously identified genes and detect additional genes with coherent
expression patterns. By studying the interaction between gene signals and the
geometry of the underlying space, the three methods give multidimensional
rankings of the genes and visualisation of relationships between them.
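As a rough illustration of the idea behind the Laplacian-based scores described above, the sketch below ranks toy "gene" signals by their Rayleigh quotient with respect to a graph Laplacian built on the cells. It is not the authors' implementation: the k-nearest-neighbour graph, the unnormalised Laplacian, the mean-centring step, and all function and variable names are assumptions made for the example. A low quotient means the signal varies smoothly over the cell graph.

```python
import math


def knn_edges(points, k):
    """Undirected edge set of a symmetrised k-nearest-neighbour graph (hypothetical helper)."""
    edges = set()
    for i, p in enumerate(points):
        # Sort the other points by Euclidean distance to point i.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def rayleigh_quotient(signal, edges):
    """f^T L f / f^T f for the unnormalised Laplacian L = D - A with unit edge weights.

    The signal is mean-centred first (an assumption of this sketch), so a
    constant signal has zero norm and is scored as infinity, i.e. ranked last.
    """
    mean = sum(signal) / len(signal)
    f = [x - mean for x in signal]
    norm = sum(x * x for x in f)
    if norm == 0.0:
        return math.inf
    # f^T L f equals the sum of squared differences across edges.
    smoothness = sum((f[i] - f[j]) ** 2 for i, j in edges)
    return smoothness / norm


# Toy data: 20 "cells" placed along a line, standing in for a trajectory.
cells = [(float(i), 0.0) for i in range(20)]
edges = knn_edges(cells, k=2)

# Three toy "genes": a smooth gradient, an alternating pattern, a constant.
genes = {
    "gradient": [float(i) for i in range(20)],
    "alternating": [float(i % 2) for i in range(20)],
    "constant": [1.0] * 20,
}

scores = {name: rayleigh_quotient(sig, edges) for name, sig in genes.items()}
ranking = sorted(scores, key=scores.get)  # smoothest signal first
print(ranking)
```

In this toy setting the gradient signal scores lowest because neighbouring cells have similar values, whereas the alternating signal changes across almost every edge. The methods in the paper build on this kind of quantity across multiple scales or along a filtration, which this sketch does not attempt.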
Related papers
- Robust Multi-view Co-expression Network Inference [8.697303234009528]
Inferring gene co-expression networks from transcriptome data presents many challenges.
We introduce a robust method for high-dimensional graph inference from multiple independent studies.
arXiv Detail & Related papers (2024-09-30T06:30:09Z)
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performance.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z)
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types [50.55583697209676]
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
arXiv Detail & Related papers (2023-11-28T09:14:55Z)
- MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data [22.938437500266847]
We introduce a novel model called Multimodal Similarity Learning Graph Neural Network.
It combines Multimodal Machine Learning and Deep Graph Neural Networks to learn gene representations from single-cell sequencing and spatial transcriptomic data.
Our model efficiently produces unified gene representations for the analysis of gene functions, tissue functions, diseases, and species evolution.
arXiv Detail & Related papers (2023-09-29T13:33:53Z)
- Genetic heterogeneity analysis using genetic algorithm and network science [2.6166087473624318]
Genome-wide association studies (GWAS) can identify disease-susceptible genetic variables.
Genetic variables intertwined with genetic effects often exhibit lower effect sizes.
This paper introduces a novel feature selection mechanism for GWAS, named Feature Co-selection Network (FCSNet).
arXiv Detail & Related papers (2023-08-12T01:28:26Z)
- Graph Fourier MMD for Signals on Graphs [67.68356461123219]
We propose a novel distance between distributions and signals on graphs.
GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation.
We showcase it on graph benchmark datasets as well as on single cell RNA-sequencing data analysis.
arXiv Detail & Related papers (2023-06-05T00:01:17Z)
- Using Signal Processing in Tandem With Adapted Mixture Models for Classifying Genomic Signals [16.119729980200955]
We propose a novel technique that employs signal processing in tandem with Gaussian mixture models to improve the spectral representation of a sequence.
Our method outperforms a similar state-of-the-art method on established benchmark datasets by an absolute margin of 6.06% accuracy.
arXiv Detail & Related papers (2022-11-03T06:10:55Z)
- Granger causal inference on DAGs identifies genomic loci regulating transcription [77.58911272503771]
GrID-Net is a framework based on graph neural networks with lagged message passing for Granger causal inference on DAG-structured systems.
Our application is the analysis of single-cell multimodal data to identify genomic loci that mediate the regulation of specific genes.
arXiv Detail & Related papers (2022-10-18T21:15:10Z)
- Multivariate feature ranking of gene expression data [62.997667081978825]
We propose two new multivariate feature ranking methods based on pairwise correlation and pairwise consistency.
We statistically prove that the proposed methods outperform the state-of-the-art feature ranking methods Clustering Variation, Chi Squared, Correlation, Information Gain, ReliefF and Significance.
arXiv Detail & Related papers (2021-11-03T17:19:53Z)
- High-dimensional multi-trait GWAS by reverse prediction of genotypes [3.441021278275805]
Reverse regression is a promising approach to perform multi-trait GWAS in high-dimensional settings.
We analyzed different machine learning methods for reverse regression in multi-trait GWAS.
Model feature coefficients correlated with the strength of association between variants and individual traits, and were predictive of true trans-eQTL target genes.
arXiv Detail & Related papers (2021-10-29T22:34:35Z)
- A Novel Granular-Based Bi-Clustering Method of Deep Mining the Co-Expressed Genes [76.84066556597342]
Bi-clustering methods are used to mine bi-clusters whose subsets of samples (genes) are co-regulated under their test conditions.
Unfortunately, traditional bi-clustering methods are not fully effective in discovering such bi-clusters.
We propose a novel bi-clustering method based on the theory of Granular Computing.
arXiv Detail & Related papers (2020-05-12T02:04:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.