Active Learning to Classify Macromolecular Structures in situ for Less
Supervision in Cryo-Electron Tomography
- URL: http://arxiv.org/abs/2102.12040v1
- Date: Wed, 24 Feb 2021 03:10:32 GMT
- Authors: Xuefeng Du, Haohan Wang, Zhenxi Zhu, Xiangrui Zeng, Yi-Wei Chang, Jing
Zhang, Eric Xing, Min Xu
- Abstract summary: We propose a framework for querying subtomograms for labelling from a large unlabeled subtomogram pool.
HAL adopts uncertainty sampling to select the subtomograms that have the most uncertain predictions.
HAL introduces a subset sampling strategy to improve the diversity of the query set.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivation: Cryo-Electron Tomography (cryo-ET) is a 3D bioimaging tool that
visualizes the structural and spatial organization of macromolecules at a
near-native state in single cells, which has broad applications in life
science. However, the systematic structural recognition and recovery of
macromolecules captured by cryo-ET are difficult due to high structural
complexity and imaging limits. Deep-learning-based subtomogram classification
methods have played a critical role in such tasks. As supervised approaches,
however, their performance relies on sufficient and laborious annotation of a
large training dataset.
Results: To alleviate this major labeling burden, we propose a Hybrid Active
Learning (HAL) framework for querying subtomograms for labelling from a large
unlabeled subtomogram pool. First, HAL adopts uncertainty sampling to select
the subtomograms that have the most uncertain predictions. Moreover, to
mitigate the sampling bias caused by such a strategy, a discriminator is
introduced to judge whether a given subtomogram is labeled or unlabeled, and
the model then queries the subtomograms that have higher probabilities of
being unlabeled. Additionally, HAL introduces a subset sampling strategy to
improve the diversity of the query set, so that the information overlap is
decreased between the queried batches and the algorithmic efficiency is
improved. Our experiments on subtomogram classification tasks using both
simulated and real data demonstrate that we can achieve comparable testing
performance (on average only a 3% accuracy drop) by using less than 30% of the
labeled subtomograms, which is a very promising result for subtomogram
classification tasks with limited labeling resources.
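The three ingredients named in the abstract (uncertainty sampling, a labeled-vs-unlabeled discriminator, and subset sampling) can be sketched as a single query step. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the scores are combined, so the sketch assumes predictive entropy as the uncertainty measure, a product of normalised entropy and discriminator probability as the ranking score, and a uniformly random candidate subset. The names `predictive_entropy`, `hybrid_query`, `unlabeled_prob`, and `subset_size` are hypothetical.

```python
import numpy as np


def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities (higher = more uncertain)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)


def hybrid_query(class_probs, unlabeled_prob, batch_size, subset_size, rng):
    """Return `batch_size` pool indices to send for labelling.

    class_probs    : (n_pool, n_classes) classifier softmax outputs
    unlabeled_prob : (n_pool,) discriminator probability that an item is unlabeled
    subset_size    : size of the random candidate subset drawn from the pool
    """
    n_pool, n_classes = class_probs.shape
    subset_size = min(subset_size, n_pool)

    # Subset sampling (assumed uniform): rank only a random slice of the pool.
    candidates = rng.choice(n_pool, size=subset_size, replace=False)

    # Assumed combination rule: normalised entropy times discriminator score.
    uncertainty = predictive_entropy(class_probs[candidates]) / np.log(n_classes)
    score = uncertainty * unlabeled_prob[candidates]

    # Highest-scoring candidates first.
    top = np.argsort(-score)[:batch_size]
    return candidates[top]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    class_probs = np.array([[0.5, 0.5], [0.99, 0.01], [0.5, 0.5], [0.9, 0.1]])
    unlabeled_prob = np.array([0.9, 0.9, 0.1, 0.9])
    print(hybrid_query(class_probs, unlabeled_prob, batch_size=2, subset_size=4, rng=rng))
```

In the toy pool, items 0 and 2 are equally uncertain, but the discriminator considers item 2 similar to already-labeled data, so it is ranked below item 3. Drawing a fresh random subset on each round is one simple way to reduce overlap between successive query batches; the paper's actual subset strategy may differ.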
Related papers
- Unraveling the Impact of Heterophilic Structures on Graph Positive-Unlabeled Learning [71.9954600831939]
Positive-Unlabeled (PU) learning is vital in many real-world scenarios, but its application to graph data remains under-explored.
We unveil that a critical challenge for PU learning on graphs lies in edge heterophily, which directly violates the irreducibility assumption for Class-Prior Estimation.
In response to this challenge, we introduce a new method, named Graph PU Learning with Label Propagation Loss (GPL).
(2024-05-30T10:30:44Z)
- MyriadAL: Active Few Shot Learning for Histopathology [10.652626309100889]
We introduce an active few shot learning framework, Myriad Active Learning (MAL).
MAL includes a contrastive-learning encoder, pseudo-label generation, and novel query sample selection in the loop.
Experiments on two public histopathology datasets show that MAL has superior test accuracy, macro F1-score, and label efficiency compared to prior works.
(2023-10-24T20:08:15Z)
- Synthetic Augmentation with Large-scale Unconditional Pre-training [4.162192894410251]
We propose a synthetic augmentation method called HistoDiffusion to reduce the dependency on annotated data.
HistoDiffusion can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training.
We evaluate our proposed method by pre-training on three histopathology datasets and testing on a histopathology dataset of colorectal cancer (CRC) excluded from the pre-training datasets.
(2023-08-08T03:34:04Z)
- Rethinking Semi-Supervised Medical Image Segmentation: A Variance-Reduction Perspective [51.70661197256033]
We propose ARCO, a semi-supervised contrastive learning framework with stratified group theory for medical image segmentation.
We first propose building ARCO through the concept of variance-reduced estimation and show that certain variance-reduction techniques are particularly beneficial in pixel/voxel-level segmentation tasks.
We experimentally validate our approaches on eight benchmarks, i.e., five 2D/3D medical and three semantic segmentation datasets, with different label settings.
(2023-02-03T13:50:25Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
(2022-11-21T18:47:11Z)
- Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning has provided a successful way to sample representation that enables effective discrimination on anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework for contamination-resistant anomaly detection.
(2022-07-24T18:49:26Z)
- Cryo-shift: Reducing domain shift in cryo-electron subtomograms with unsupervised domain adaptation and randomization [17.921052986098946]
Subtomogram classification and recognition constitute a primary step in the systematic recovery of macromolecular structures.
Supervised deep learning methods have been proven to be highly accurate and efficient for subtomogram classification.
We present Cryo-Shift, a fully unsupervised domain adaptation and randomization framework for deep learning-based cross-domain subtomogram classification.
(2021-11-17T13:43:36Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
(2020-12-18T19:03:40Z)
- Few shot domain adaptation for in situ macromolecule structural classification in cryo-electron tomograms [13.51208578647949]
We adapt a few shot domain adaptation method for deep learning based cross-domain subtomogram classification.
Our method achieves significant improvement on cross domain subtomogram classification compared with baseline methods.
(2020-07-30T12:39:21Z)
- Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method improves the performance significantly compared with the supervised method learned from labeled data only.
(2020-07-21T13:27:09Z)
- Exemplar Auditing for Multi-Label Biomedical Text Classification [0.4873362301533824]
We generalize a recently proposed zero-shot sequence labeling method, "supervised labeling via a convolutional decomposition".
The approach yields classification with "introspection", relating the fine-grained features of an inference-time prediction to their nearest neighbors.
Our proposed approach yields both a competitively effective classification model and an interrogation mechanism to aid healthcare workers in understanding the salient features that drive the model's predictions.
(2020-04-07T02:54:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.