Self-omics: A Self-supervised Learning Framework for Multi-omics Cancer Data
- URL: http://arxiv.org/abs/2210.00825v1
- Date: Mon, 3 Oct 2022 11:20:12 GMT
- Title: Self-omics: A Self-supervised Learning Framework for Multi-omics Cancer Data
- Authors: Sayed Hashim, Karthik Nandakumar, Mohammad Yaqub
- Abstract summary: Self-Supervised Learning (SSL) methods are typically used to deal with limited labelled data.
We develop a novel pre-training paradigm that consists of various SSL components.
Our approach outperforms the state-of-the-art method in cancer type classification on the TCGA pan-cancer dataset.
- Score: 4.843654097048771
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We have gained access to vast amounts of multi-omics data thanks to Next
Generation Sequencing. However, this data is challenging to analyse due to
its high dimensionality and the fact that much of it is unannotated. A lack of annotated
data is a significant problem in machine learning, and Self-Supervised Learning
(SSL) methods are typically used to deal with limited labelled data. However,
there is a lack of studies that use SSL methods to exploit inter-omics
relationships on unlabelled multi-omics data. In this work, we develop a novel
and efficient pre-training paradigm that consists of various SSL components,
including but not limited to contrastive alignment, data recovery from
corrupted samples, and using one type of omics data to recover other omic
types. Our pre-training paradigm improves performance on downstream tasks with
limited labelled data. We show that our approach outperforms the
state-of-the-art method in cancer type classification on the TCGA pan-cancer
dataset in a semi-supervised setting. Moreover, we show that the encoders that
are pre-trained using our approach can be used as powerful feature extractors
even without fine-tuning. Our ablation study shows that the method is not
overly dependent on any pretext task component. The network architectures in
our approach are designed to handle missing omic types and multiple datasets
for pre-training and downstream training. Our pre-training paradigm can be
extended to perform zero-shot classification of rare cancers.
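As a rough illustration of the three pretext components named in the abstract, the following PyTorch sketch combines contrastive alignment, data recovery from corrupted samples, and cross-omics recovery for two hypothetical omics types. The layer sizes, corruption rate, temperature, and the `cross_rna2dna` translation head are illustrative assumptions, not the authors' configuration.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OmicsAutoencoder(nn.Module):
    """One encoder/decoder pair per omics type (layer sizes illustrative)."""
    def __init__(self, in_dim, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, in_dim))

def nt_xent(z1, z2, tau=0.1):
    """Contrastive alignment: a patient's two omics views are positives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                      # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

def pretrain_step(ae_rna, ae_dna, cross_rna2dna, x_rna, x_dna, p_corrupt=0.3):
    # 1) Data recovery from corrupted samples (shown for the RNA view only).
    mask = (torch.rand_like(x_rna) > p_corrupt).float()
    z_rna = ae_rna.encoder(x_rna * mask)
    z_dna = ae_dna.encoder(x_dna)
    loss_rec = F.mse_loss(ae_rna.decoder(z_rna), x_rna)
    # 2) Cross-omics recovery: predict one omics type from the other's latent.
    loss_cross = F.mse_loss(cross_rna2dna(z_rna), x_dna)
    # 3) Contrastive alignment of the two latent views of the same patient.
    loss_con = nt_xent(z_rna, z_dna)
    return loss_rec + loss_cross + loss_con
```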
Related papers
- Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks [10.932880269282014]
We propose the first effective dataset distillation (DD) method for SSL pre-training.
Specifically, we train a small student model to match the representations of a larger teacher model trained with SSL.
As the KD objective has considerably lower variance than SSL, our approach can generate synthetic datasets that can successfully pre-train high-quality encoders.
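A minimal PyTorch sketch of the distillation objective described here, assuming a frozen SSL teacher and a student trained to regress its representations; the MSE choice and function names are assumptions, and the synthetic-set optimization of the full method is omitted:
```python
import torch
import torch.nn.functional as F

def kd_distill_step(student, teacher, x_syn, optimizer):
    """Match student embeddings to a frozen SSL teacher on synthetic samples
    (simplified: only the student is updated here)."""
    teacher.eval()
    with torch.no_grad():
        t_emb = teacher(x_syn)           # target representations
    s_emb = student(x_syn)
    loss = F.mse_loss(s_emb, t_emb)      # lower-variance KD objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```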
arXiv Detail & Related papers (2024-10-03T00:39:25Z)
- A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels.
We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
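Two classification-based protocols such a benchmark typically compares are linear probing and k-NN evaluation on frozen features; a compact scikit-learn sketch (the protocol details below are generic assumptions, not the paper's exact setup):
```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def evaluate_frozen_features(z_train, y_train, z_test, y_test):
    """Two standard SSL evaluation protocols on frozen encoder features."""
    probe = LogisticRegression(max_iter=1000).fit(z_train, y_train)
    knn = KNeighborsClassifier(n_neighbors=20).fit(z_train, y_train)
    return {"linear_probe_acc": probe.score(z_test, y_test),
            "knn_acc": knn.score(z_test, y_test)}
```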
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
- Semi-Supervised End-To-End Contrastive Learning For Time Series Classification [10.635321868623883]
Time series classification is a critical task in various domains, such as finance, healthcare, and sensor data analysis.
We propose an end-to-end model called SLOTS (Semi-supervised Learning fOr Time clasSification).
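A minimal sketch of the semi-supervised end-to-end idea, assuming a jointly trained encoder and classifier: an unsupervised contrastive loss over all series is combined with cross-entropy on the labelled subset. The jitter augmentation, temperature, and weight `lam` are placeholders, not SLOTS's actual components.
```python
import torch
import torch.nn.functional as F

def semi_supervised_step(encoder, classifier, x_all, x_lab, y_lab, lam=0.5):
    """Joint objective: contrastive on everything, CE on the labelled subset."""
    # Two augmented views via additive jitter (placeholder augmentation).
    v1 = x_all + 0.01 * torch.randn_like(x_all)
    v2 = x_all + 0.01 * torch.randn_like(x_all)
    z1 = F.normalize(encoder(v1), dim=1)
    z2 = F.normalize(encoder(v2), dim=1)
    logits = z1 @ z2.t() / 0.1
    loss_con = F.cross_entropy(logits, torch.arange(z1.size(0), device=z1.device))
    loss_sup = F.cross_entropy(classifier(encoder(x_lab)), y_lab)
    return loss_con + lam * loss_sup
```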
arXiv Detail & Related papers (2023-10-13T04:22:21Z)
- Synthetic Augmentation with Large-scale Unconditional Pre-training [4.162192894410251]
We propose a synthetic augmentation method called HistoDiffusion to reduce the dependency on annotated data.
HistoDiffusion can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training.
We evaluate our proposed method by pre-training on three histopathology datasets and testing on a histopathology dataset of colorectal cancer (CRC) excluded from the pre-training datasets.
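A schematic sketch of synthetic augmentation in general, assuming a pre-trained conditional sampler `generator(label, n)` as a stand-in for the HistoDiffusion sampler:
```python
import torch

def augment_with_synthetic(x_lab, y_lab, generator, per_class=100):
    """Mix generated samples into a small labelled set (schematic;
    `generator(label, n)` is a hypothetical pre-trained diffusion sampler)."""
    xs, ys = [x_lab], [y_lab]
    for c in y_lab.unique():
        xs.append(generator(c.item(), per_class))
        ys.append(torch.full((per_class,), c.item(), dtype=y_lab.dtype))
    return torch.cat(xs), torch.cat(ys)
```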
arXiv Detail & Related papers (2023-08-08T03:34:04Z)
- A semi-supervised Teacher-Student framework for surgical tool detection and localization [2.41710192205034]
We introduce a semi-supervised learning (SSL) framework for the surgical tool detection paradigm.
In the proposed work, we train a model with labelled data, which initialises the Teacher-Student joint learning.
Our results on the m2cai16-tool-locations dataset indicate the superiority of our approach under different supervised data settings.
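A generic sketch of the Teacher-Student joint learning pattern, reduced from detection to classification for brevity: an EMA teacher pseudo-labels unlabelled frames and the student trains on both. The decay and confidence threshold are assumptions.
```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    """Teacher weights are an exponential moving average of the student's."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1 - decay)

def teacher_student_step(student, teacher, x_lab, y_lab, x_unlab, thr=0.9):
    loss = F.cross_entropy(student(x_lab), y_lab)   # supervised term
    with torch.no_grad():
        probs = teacher(x_unlab).softmax(dim=1)
        conf, pseudo = probs.max(dim=1)
    keep = conf > thr                               # confident pseudo-labels only
    if keep.any():
        loss = loss + F.cross_entropy(student(x_unlab[keep]), pseudo[keep])
    return loss
```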
arXiv Detail & Related papers (2022-08-21T17:21:31Z)
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled ones.
We show that NPC-LV outperforms supervised methods on all three datasets on image classification in the low-data regime.
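NPC-LV classifies by compression-based distance, with code lengths supplied by a latent variable generative model; the sketch below substitutes gzip for those model-derived code lengths to show the non-parametric decision rule:
```python
import gzip

def clen(b: bytes) -> int:
    return len(gzip.compress(b))

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    ca, cb, cab = clen(a), clen(b), clen(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def npc_predict(x, labeled, k=3):
    """k-NN over NCD: no parameters are trained for the classification step."""
    dists = sorted((ncd(x, xi), yi) for xi, yi in labeled)
    votes = [yi for _, yi in dists[:k]]
    return max(set(votes), key=votes.count)
```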
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- ADT-SSL: Adaptive Dual-Threshold for Semi-Supervised Learning [68.53717108812297]
Semi-Supervised Learning (SSL) has advanced classification tasks by training a model jointly on both labelled and unlabelled data.
This paper proposes an Adaptive Dual-Threshold method for Semi-Supervised Learning (ADT-SSL).
Experimental results show that the proposed ADT-SSL achieves state-of-the-art classification accuracy.
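A simplified sketch of the dual-threshold idea: a fixed high threshold keeps very confident pseudo-labels, while a second, adaptively computed threshold admits additional samples at reduced weight. The adaptive rule and the 0.5 weight are illustrative assumptions, not the paper's exact formulation.
```python
import torch.nn.functional as F

def dual_threshold_loss(logits_u, fixed_thr=0.95):
    """Pseudo-label unlabelled logits with two thresholds (simplified)."""
    probs = logits_u.softmax(dim=1)
    conf, pseudo = probs.max(dim=1)
    high = conf >= fixed_thr                       # fixed high threshold
    # Adaptive threshold: mean confidence of the remaining samples (assumed rule).
    adaptive_thr = conf[~high].mean() if (~high).any() else fixed_thr
    low = (conf >= adaptive_thr) & ~high
    loss = 0.0
    if high.any():                                 # full-weight CE on confident samples
        loss = loss + F.cross_entropy(logits_u[high], pseudo[high])
    if low.any():                                  # down-weighted term for the rest
        loss = loss + 0.5 * F.cross_entropy(logits_u[low], pseudo[low])
    return loss
```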
arXiv Detail & Related papers (2022-05-21T11:52:08Z)
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning typically assumes the incoming data are fully labelled, which might not hold in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show that ORDisCo achieves significant performance improvements on various benchmark datasets for semi-supervised continual learning (SSCL).
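A schematic sketch of interdependent classifier/conditional-GAN training in the spirit of ORDisCo; the loss structure and function signatures are assumptions, and replay buffering is omitted:
```python
import torch
import torch.nn.functional as F

def joint_cgan_classifier_step(C, G, D, x_lab, y_lab, x_unlab, z_dim=64):
    """Classifier learns from real labels plus generated replay pairs;
    the discriminator enforces consistency on unlabelled data (schematic)."""
    b = x_lab.size(0)
    z = torch.randn(b, z_dim)
    y_fake = torch.randint(0, int(y_lab.max()) + 1, (b,))
    x_fake = G(z, y_fake)                          # assumed sampler signature
    # Classifier loss: real labelled data + conditional-GAN replay.
    loss_c = F.cross_entropy(C(x_lab), y_lab) + F.cross_entropy(C(x_fake), y_fake)
    # Discriminator loss: unlabelled data as real, generated data as fake.
    d_real, d_fake = D(x_unlab), D(x_fake.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    return loss_c, loss_d
```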
arXiv Detail & Related papers (2021-01-02T09:04:14Z)
- Self-Training with Improved Regularization for Sample-Efficient Chest X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% less labelled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
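The underlying self-training recipe, sketched schematically: fit on the labels, pseudo-label the unlabelled pool, and refit on the union. The `fit` callback, threshold, and round count are placeholders; the paper's improved regularizers are omitted.
```python
import torch

def self_train(model_fn, fit, x_lab, y_lab, x_pool, rounds=3, thr=0.9):
    """Schematic self-training loop: each round pseudo-labels the pool
    and refits a fresh model on the enlarged training set."""
    model = fit(model_fn(), x_lab, y_lab)
    for _ in range(rounds):
        with torch.no_grad():
            probs = model(x_pool).softmax(dim=1)
            conf, pseudo = probs.max(dim=1)
        keep = conf > thr                          # confident pseudo-labels only
        x_aug = torch.cat([x_lab, x_pool[keep]])
        y_aug = torch.cat([y_lab, pseudo[keep]])
        model = fit(model_fn(), x_aug, y_aug)      # retrain from scratch
    return model
```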
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found RNA-seq features to be highly redundant, remaining informative even with subsets larger than 128 features.
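The redundancy finding suggests that simple ranked feature subsets suffice; a scikit-learn sketch of sweeping subset sizes (the variance ranking, model, and sizes are illustrative assumptions):
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def sweep_feature_subsets(X, y, sizes=(64, 128, 256, 512)):
    """Rank RNA-seq features by variance and score growing subsets;
    flat scores past ~128 features would indicate redundancy."""
    order = np.argsort(X.var(axis=0))[::-1]        # most-variable features first
    scores = {}
    for k in sizes:
        Xk = X[:, order[:k]]
        scores[k] = cross_val_score(RandomForestRegressor(n_estimators=50),
                                    Xk, y, cv=3).mean()
    return scores
```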
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.