A Novel Fuzzy Bi-Clustering Algorithm with AFS for Identification of
Co-Regulated Genes
- URL: http://arxiv.org/abs/2302.01596v1
- Date: Fri, 3 Feb 2023 08:35:49 GMT
- Authors: Kaijie Xu
- Abstract summary: This paper proposes a novel fuzzy bi-clustering algorithm for identification of co-regulated genes.
The proposed algorithm can effectively detect the co-regulated genes without any prior knowledge of the gene expression data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Identifying co-regulated genes and their transcription-factor
binding sites (TFBS) is a key step toward understanding transcription
regulation. In addition to effective laboratory assays, various bi-clustering
algorithms have been developed to detect co-expressed genes.
When applied to gene expression data, bi-clustering methods discover subgroups
of genes with similar expression patterns under subsets of experimental
conditions that are themselves to be identified. By building two fuzzy
partition matrices of the gene expression data with Axiomatic Fuzzy Set (AFS)
theory, this paper proposes a novel fuzzy bi-clustering algorithm for
identifying co-regulated genes. Specifically, the gene expression data is first
transformed into two fuzzy partition matrices via the sub-preference relations
of AFS theory. One matrix treats the genes as the universe and the conditions
as the concepts; the other treats the conditions as the universe and the genes
as the concepts. The co-regulated genes (bi-clusters) are identified on both
partition matrices simultaneously. A novel fuzzy similarity criterion is then
defined on the partition matrices, and a cyclic optimization algorithm is
designed to discover bi-clusters that are significant at the expression level.
These procedures guarantee that the generated bi-clusters have more
significant expression values than those extracted by traditional
bi-clustering methods. Finally, the proposed method is compared with three
well-known bi-clustering algorithms on publicly available real microarray
datasets. The experimental results agree with the theoretical analysis and
show that the proposed algorithm can effectively detect co-regulated genes
without any prior knowledge of the gene expression data.
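The transformation step described in the abstract (one fuzzy matrix with genes as the universe, one with conditions as the universe) can be illustrated with a minimal Python/NumPy sketch. It is written under stated assumptions and is not the paper's implementation: it uses the single concept "high expression" and gives every gene and condition equal weight, so an object's membership in a simple concept reduces to the fraction of the universe it is at least as large as. The names `afs_style_memberships` and `bicluster_score` are hypothetical. `bicluster_score` is a stand-in that combines the two views with the fuzzy minimum; it is not the paper's similarity criterion, and the cyclic optimization is not sketched here.

```python
import numpy as np


def afs_style_memberships(X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Build two fuzzy membership matrices for the concept "high expression".

    Assumption (not from the paper): uniform weights, so the membership of an
    object in a simple concept is the fraction of the universe it is at least
    as large as. Both returned matrices have shape (n_genes, n_conds).
    """
    n_genes, n_conds = X.shape
    # Universe = genes; each condition j defines a concept "high under j".
    # gene_view[i, j] = share of genes k with X[k, j] <= X[i, j].
    gene_view = (X[None, :, :] <= X[:, None, :]).sum(axis=1) / n_genes
    # Universe = conditions; each gene i defines a concept "high for gene i".
    # cond_view[i, j] = share of conditions l with X[i, l] <= X[i, j].
    cond_view = (X[:, None, :] <= X[:, :, None]).sum(axis=2) / n_conds
    return gene_view, cond_view


def bicluster_score(gene_view, cond_view, genes, conds) -> float:
    """Hypothetical score of a candidate bi-cluster (not the paper's criterion).

    Combines both views with the fuzzy AND (element-wise minimum) and
    averages the result over the selected genes x conditions submatrix.
    """
    joint = np.minimum(gene_view, cond_view)
    return float(joint[np.ix_(genes, conds)].mean())


if __name__ == "__main__":
    # Toy 2-gene x 3-condition expression matrix (illustrative values only).
    X = np.array([[1.0, 5.0, 3.0],
                  [4.0, 2.0, 6.0]])
    g, c = afs_style_memberships(X)
    print(g)
    print(c)
    print(bicluster_score(g, c, genes=[1], conds=[0, 2]))
```

The broadcast comparisons allocate O(n²m) and O(nm²) booleans for an n-gene by m-condition matrix, which is fine for illustration; for genome-scale data a sort-based ranking per column and per row would compute the same shares more cheaply.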
Related papers
- Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification (2024-06-05)
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer (BPGT) to improve genetic mutation prediction performance.
BPGT first establishes a novel gene encoder that constructs gene priors with two carefully designed modules.
BPGT then designs a label decoder that performs genetic mutation prediction with two tailored modules.
- Feature Selection via Robust Weighted Score for High Dimensional Binary Class-Imbalanced Gene Expression Data (2024-01-23)
A robust weighted score for unbalanced data (ROWSU) is proposed for selecting the most discriminative features for high-dimensional binary classification of gene expression data with class imbalance.
The performance of the proposed ROWSU method is evaluated on 6 gene expression datasets.
- Accelerated Discovery of Machine-Learned Symmetries: Deriving the Exceptional Lie Groups G2, F4 and E6 (2023-07-10)
This letter introduces two improved algorithms that significantly speed up the discovery of symmetry transformations.
Given the significant complexity of the exceptional Lie groups, our results demonstrate that this machine-learning method for discovering symmetries is completely general and can be applied to a wide variety of labeled datasets.
- DynGFN: Towards Bayesian Inference of Gene Regulatory Networks with GFlowNets (2023-02-08)
Gene regulatory networks (GRN) describe interactions between genes and their products that control gene expression and cellular function.
Existing methods focus either on challenge (1), identifying cyclic structure from dynamics, or on challenge (2), learning complex Bayesian posteriors over DAGs, but not both.
In this paper we leverage the fact that it is possible to estimate the "velocity" of gene expression with RNA velocity techniques to develop an approach that addresses both challenges.
- Multiscale methods for signal selection in single-cell data (2022-06-15)
We propose three topologically-motivated mathematical methods for unsupervised feature selection.
We demonstrate the utility of these techniques by applying them to published single-cell transcriptomics data sets.
- Hybrid gene selection approach using XGBoost and multi-objective genetic algorithm for cancer classification (2021-05-30)
We propose a two-stage gene selection approach that combines extreme gradient boosting (XGBoost) and a multi-objective optimization genetic algorithm (XGBoost-MOGA) for cancer classification in microarray datasets.
XGBoost-MOGA yields significantly better results than previous state-of-the-art algorithms on various evaluation criteria, such as accuracy, F-score, precision, and recall.
- Object-Attribute Biclustering for Elimination of Missing Genotypes in Ischemic Stroke Genome-Wide Data (2020-10-22)
Missing genotypes can affect the efficacy of machine learning approaches to identify the risk genetic variants of common diseases and traits.
The problem occurs when genotypic data are collected from different experiments with different DNA microarrays, each characterised by its own pattern of uncalled (missing) genotypes.
We use well-developed notions of object-attribute biclusters and formal concepts that correspond to dense subrelations in the binary relation.
- A Novel Granular-Based Bi-Clustering Method of Deep Mining the Co-Expressed Genes (2020-05-12)
Bi-clustering methods are used to mine bi-clusters whose subsets of samples (genes) are co-regulated under their test conditions.
Unfortunately, traditional bi-clustering methods are not fully effective in discovering such bi-clusters.
We propose a novel bi-clustering method that draws on the theory of Granular Computing.
- Conjoined Dirichlet Process (2020-02-08)
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that it improves bicluster extraction in many settings compared to existing approaches.
- Clustering Binary Data by Application of Combinatorial Optimization Heuristics (2020-01-06)
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters.
Five new and original methods are introduced, using neighborhood and population-behavior optimization metaheuristics.
From a set of 16 data tables generated by a quasi-Monte Carlo experiment, a comparison is performed for one of the aggregations using L1 dissimilarity, against hierarchical clustering and a version of k-means: partitioning around medoids (PAM).