Comprehensive survey of computational learning methods for analysis of gene expression data in genomics
- URL: http://arxiv.org/abs/2202.02958v2
- Date: Wed, 9 Feb 2022 15:51:09 GMT
- Authors: Nikita Bhandari, Rahee Walambe, Ketan Kotecha, Satyajeet Khare
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computational analysis methods including machine learning have a significant
impact in the fields of genomics and medicine. High-throughput gene expression
analysis methods such as microarray technology and RNA sequencing produce
enormous amounts of data. Traditionally, statistical methods are used for
comparative analysis of the gene expression data. However, more complex
analysis for classification and discovery of feature genes or sample
observations requires sophisticated computational approaches. In this review,
we compile various statistical and computational tools used in analysis of
expression microarray data. Although the methods are discussed in the
context of expression microarray data, they can also be applied to the
analysis of RNA sequencing or quantitative proteomics datasets. We specifically
discuss methods for missing value (gene expression) imputation, feature gene
scaling, selection and extraction of features for dimensionality reduction, and
learning and analysis of expression data. We discuss the types of missing
values and the methods and approaches usually employed in their imputation. We
also discuss methods of data transformation and feature scaling viz.
normalization and standardization. Various approaches used in feature selection
and extraction are also reviewed. Lastly, learning and analysis methods
including class comparison, class prediction, and class discovery along with
their evaluation parameters are described in detail. We also describe the
process of generating microarray gene expression data, along with the
advantages and limitations of the above-mentioned techniques. We believe that
this detailed review will help users select appropriate methods based on
the type of data and the expected outcome.
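The abstract names missing-value imputation and feature scaling (normalization and standardization) as preprocessing steps. As a minimal sketch, assuming a genes-by-samples matrix with `None` marking missing values, the pure-Python code below shows k-nearest-neighbour (KNN) imputation, one widely used approach, followed by per-gene z-score standardization and min-max normalization. The function names, the root-mean-square distance, `k=2`, and the toy matrix are hypothetical choices for illustration, not the survey's own implementation.

```python
import math
from typing import List, Optional

# Assumed layout: rows = genes, columns = samples; None marks a missing value.
Matrix = List[List[Optional[float]]]


def _rms_distance(a: List[Optional[float]], b: List[Optional[float]]) -> float:
    """Root-mean-square difference over the samples observed in both genes."""
    sq = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    return math.sqrt(sum(sq) / len(sq)) if sq else math.inf


def knn_impute(data: Matrix, k: int = 2) -> List[List[float]]:
    """Hypothetical helper: fill each gap with the mean of that sample's values in the k nearest genes."""
    result = [list(row) for row in data]
    for g, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Neighbours must themselves be observed in sample j; distances use the original data.
            candidates = sorted(
                (_rms_distance(row, other), other[j])
                for h, other in enumerate(data)
                if h != g and other[j] is not None
            )
            if not candidates:
                raise ValueError(f"sample {j} has no observed values to borrow from")
            nearest = [v for _, v in candidates[:k]]
            result[g][j] = sum(nearest) / len(nearest)
    return result


def standardize(row: List[float]) -> List[float]:
    """Z-score: subtract the mean and divide by the sample standard deviation (n - 1)."""
    mean = sum(row) / len(row)
    sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (len(row) - 1))
    return [(x - mean) / sd if sd > 0 else 0.0 for x in row]


def min_max_normalize(row: List[float]) -> List[float]:
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(row), max(row)
    span = hi - lo
    return [(x - lo) / span if span > 0 else 0.0 for x in row]


# Toy matrix (made up): 4 genes x 4 samples with one missing value.
expr = [
    [2.0, 4.0, None, 8.0],
    [1.0, 3.0, 5.0, 7.0],
    [2.5, 4.5, 6.5, 8.5],
    [10.0, 0.0, 10.0, 0.0],
]
imputed = knn_impute(expr, k=2)  # imputed[0][2] == 5.75, the mean of 6.5 and 5.0
z_scores = [standardize(gene) for gene in imputed]
scaled = [min_max_normalize(gene) for gene in imputed]
```

Here each gene is treated as a feature, so scaling is applied row by row across samples; imputation runs first because both scaling functions assume a complete row.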
Related papers
- Robust Multi-view Co-expression Network Inference
Inferring gene co-expression networks from transcriptome data presents many challenges.
We introduce a robust method for high-dimensional graph inference from multiple independent studies.
Date: 2024-09-30T06:30:09Z
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrated both sets of information and reconstructed the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
Date: 2023-11-28T09:14:55Z
- Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review
Cancer is the second leading cause of death, after cardiovascular disease.
Gene expression can play a fundamental role in the early detection of cancer.
This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods.
Date: 2023-01-28T15:03:03Z
- RandomSCM: interpretable ensembles of sparse classifiers tailored for omics data
We propose an ensemble learning algorithm based on conjunctions or disjunctions of decision rules.
The interpretability of the models makes them useful for biomarker discovery and pattern discovery in high-dimensional data.
Date: 2022-08-11T13:55:04Z
- Natural language processing for clusterization of genes according to their functions
We propose an approach that reduces the analysis of several thousand genes to analysis of several clusters.
The descriptions are encoded as vectors using the pretrained language model (BERT) and some text processing approaches.
Date: 2022-07-17T12:59:34Z
- Using ontology embeddings for structural inductive bias in gene expression data analysis
Stratifying cancer patients based on their gene expression levels allows improving diagnosis, survival analysis and treatment planning.
We propose to incorporate biological knowledge about genes into the machine learning system for the task of patient classification given their gene expression data.
Date: 2020-11-22T12:13:29Z
- Mining Functionally Related Genes with Semi-Supervised Learning
We introduce a rich set of features and use them in conjunction with semi-supervised learning approaches.
The framework of learning with positive and unlabeled examples (LPU) is shown to be especially appropriate for mining functionally related genes.
Date: 2020-11-05T20:34:09Z
- Generalized Matrix Factorization: efficient algorithms for fitting generalized linear latent variable models to large data arrays
Generalized Linear Latent Variable Models (GLLVMs) generalize Gaussian factor models to non-Gaussian responses.
Current algorithms for estimating model parameters in GLLVMs require intensive computation and do not scale to large datasets.
We propose a new approach for fitting GLLVMs to high-dimensional datasets, based on approximating the model using penalized quasi-likelihood.
Date: 2020-10-06T04:28:19Z
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction
We focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner called the Prototypical Network, a simple yet effective meta-learning model for few-shot image classification.
Date: 2020-09-02T02:50:30Z
- A generalised OMP algorithm for feature selection with application to gene expression data
To apply to molecular data, feature selection algorithms need to be scalable to tens of thousands of available features.
We propose gOMP, a highly-scalable generalisation of the Orthogonal Matching Pursuit feature selection algorithm.
Date: 2020-04-01T08:33:02Z
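The gOMP entry builds on Orthogonal Matching Pursuit (OMP) as a feature selection algorithm. A rough sketch of plain greedy OMP used to pick features (for example genes) follows; it is not gOMP itself. The function names, the samples-by-features layout, the normal-equations refit, and the toy data are assumptions made for illustration only.

```python
import math
from typing import List


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def omp_select(X: List[List[float]], y: List[float], n_features: int) -> List[int]:
    """Hypothetical helper: greedy OMP returning selected column indices of X, in selection order.

    X is samples x features; y holds one response per sample; assumes n_features <= number of columns.
    """
    n_samples, n_cols = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n_samples)] for j in range(n_cols)]
    norms = [math.sqrt(sum(v * v for v in col)) or 1.0 for col in cols]  # guard all-zero columns
    selected: List[int] = []
    residual = y[:]
    for _ in range(n_features):
        # Pick the unselected feature most correlated with the current residual.
        scores = [
            -1.0 if j in selected else abs(sum(c * r for c, r in zip(cols[j], residual))) / norms[j]
            for j in range(n_cols)
        ]
        best = max(range(n_cols), key=lambda j: scores[j])
        selected.append(best)
        # Refit least squares on all selected features via the normal equations.
        gram = [[sum(a * b for a, b in zip(cols[p], cols[q])) for q in selected] for p in selected]
        rhs = [sum(a * b for a, b in zip(cols[p], y)) for p in selected]
        beta = _solve(gram, rhs)
        fitted = [sum(beta[k] * cols[p][i] for k, p in enumerate(selected)) for i in range(n_samples)]
        residual = [yi - fi for yi, fi in zip(y, fitted)]
    return selected


# Toy data (made up): y = 2 * feature 0 + 3 * feature 2.
X = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
y = [2.0, 3.0, 2.0, 3.0]
chosen = omp_select(X, y, n_features=2)  # [2, 0]
```

The orthogonal refit after every step is what separates OMP from plain matching pursuit: the residual stays orthogonal to all selected features, so a feature is never picked twice for the same signal.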