Efficient Analysis of COVID-19 Clinical Data using Machine Learning
Models
- URL: http://arxiv.org/abs/2110.09606v1
- Date: Mon, 18 Oct 2021 20:06:01 GMT
- Title: Efficient Analysis of COVID-19 Clinical Data using Machine Learning
Models
- Authors: Sarwan Ali, Yijing Zhou, Murray Patterson
- Abstract summary: Huge volumes of data and case studies have been made available, providing researchers with a unique opportunity to find trends.
Applying machine-learning-based algorithms to this big data is a natural approach to this aim.
We show that with the efficient feature selection algorithm, we can achieve a prediction accuracy of more than 90% in most cases.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Because of the rapid spread of COVID-19 to almost every part of the globe,
huge volumes of data and case studies have been made available, providing
researchers with a unique opportunity to find trends and make discoveries like
never before, by leveraging such big data. This data is of many different
varieties, and can be of different levels of veracity e.g., precise, imprecise,
uncertain, and missing, making it challenging to extract important information
from such data. Yet, efficient analysis of this continuously growing and
evolving COVID-19 data is crucial to inform -- often in real time -- the
relevant measures needed for controlling, mitigating, and ultimately avoiding
viral spread. Applying machine-learning-based algorithms to this big data is a
natural approach to this aim, since they can quickly scale to such
data, and extract the relevant information in the presence of variety and
different levels of veracity. This is important for COVID-19, and for potential
future pandemics in general.
In this paper, we design a straightforward encoding of clinical data (on
categorical attributes) into a fixed-length feature vector representation, and
then propose a model that first performs efficient feature selection from such a
representation. We apply this approach to two clinical datasets of COVID-19
patients and then apply different machine learning algorithms downstream for
classification purposes. We show that with the efficient feature selection
algorithm, we can achieve a prediction accuracy of more than 90% in most
cases. We also computed the importance of different attributes in the dataset
using information gain. This can help policy makers focus on only
certain attributes for the purposes of studying this disease rather than
focusing on multiple random factors that may not be very informative to patient
outcomes.
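The encoding and attribute-ranking steps described in the abstract can be illustrated with a minimal sketch. It assumes a one-hot encoding of the categorical attributes (the abstract does not say which encoding the authors use) and ranks attributes by information gain with respect to the class label. The attribute names, records, and labels are hypothetical toy data, and the paper's own feature selection algorithm is not reproduced here.

```python
import math
from collections import Counter


def one_hot_encode(records, attributes):
    """Map categorical records to fixed-length 0/1 feature vectors."""
    # One slot per (attribute, value) pair, in a deterministic order.
    slots = [(a, v) for a in attributes
             for v in sorted({r[a] for r in records})]
    vectors = [[1 if r[a] == v else 0 for a, v in slots] for r in records]
    return slots, vectors


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus their entropy after splitting on values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: not taken from the paper's datasets.
attributes = ["age_group", "sex"]
records = [
    {"age_group": "60+", "sex": "F"},
    {"age_group": "60+", "sex": "M"},
    {"age_group": "18-59", "sex": "F"},
    {"age_group": "18-59", "sex": "M"},
]
labels = ["hospitalized", "hospitalized", "recovered", "recovered"]

slots, vectors = one_hot_encode(records, attributes)

# Rank attributes from most to least informative about the label.
ranking = sorted(
    ((a, information_gain([r[a] for r in records], labels))
     for a in attributes),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy example the age attribute fully determines the label while sex carries no information, so the ranking separates them cleanly. On real clinical data the gains would be less extreme, and the resulting vectors would be passed to a downstream classifier.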
Related papers
- Local-to-Global Self-Supervised Representation Learning for Diabetic Retinopathy Grading [0.0]
This research aims to present a novel hybrid learning model using self-supervised learning and knowledge distillation.
In our algorithm, for the first time among all self-supervised learning and knowledge distillation models, the test dataset is 50% larger than the training dataset.
Compared to a similar state-of-the-art model, our results achieved higher accuracy and more effective representation spaces.
arXiv Detail & Related papers (2024-10-01T15:19:16Z)
- LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z)
- ProtoKD: Learning from Extremely Scarce Data for Parasite Ova Recognition [5.224806515926022]
We introduce ProtoKD, one of the first approaches to tackle the problem of multi-class parasitic ova recognition using extremely scarce data.
We establish a new benchmark to drive research in this critical direction and validate that the proposed ProtoKD framework achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-09-18T23:49:04Z)
- Core-set Selection Using Metrics-based Explanations (CSUME) for multiclass ECG [2.0520503083305073]
We show how a selection of good quality data improves deep learning model performance.
Our experimental results show a 9.67% and 8.69% precision and recall improvement with a significant training data volume reduction of 50%.
arXiv Detail & Related papers (2022-05-28T19:36:28Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Classifying COVID-19 Spike Sequences from Geographic Location Using Deep Learning [0.0]
We propose an algorithm that first computes a numerical representation of the spike protein sequence of SARS-CoV-2 using $k$-mers.
We also show the importance of different amino acids in the spike sequences by computing the information gain corresponding to the true class labels.
arXiv Detail & Related papers (2021-10-02T14:09:30Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Unsupervised Pre-trained Models from Healthy ADLs Improve Parkinson's Disease Classification of Gait Patterns [3.5939555573102857]
We show how to extract features relevant to accelerometer gait data for Parkinson's disease classification.
Our pre-trained source model consists of a convolutional autoencoder, and the target classification model is a simple multi-layer perceptron model.
We explore two different pre-trained source models, trained using different activity groups, and analyze the influence that the choice of pre-trained model has on the task of Parkinson's disease classification.
arXiv Detail & Related papers (2020-05-06T04:08:19Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% less labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.