Label-Assemble: Leveraging Multiple Datasets with Partial Labels
- URL: http://arxiv.org/abs/2109.12265v4
- Date: Sun, 14 May 2023 14:53:07 GMT
- Title: Label-Assemble: Leveraging Multiple Datasets with Partial Labels
- Authors: Mintong Kang, Bowen Li, Zengle Zhu, Yongyi Lu, Elliot K. Fishman, Alan L. Yuille, Zongwei Zhou
- Abstract summary: "Label-Assemble" aims to unleash the full potential of partial labels from an assembly of public datasets.
We discovered that learning from negative examples facilitates both computer-aided disease diagnosis and detection.
- Score: 68.46767639240564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep learning relies heavily on large labeled datasets, but we
often only have access to several small datasets associated with partial
labels. To address this problem, we propose a new initiative, "Label-Assemble",
that aims to unleash the full potential of partial labels from an assembly of
public datasets. We discovered that learning from negative examples facilitates
both computer-aided disease diagnosis and detection. This discovery will be
particularly crucial in novel disease diagnosis, where positive examples are
hard to collect, yet negative examples are relatively easier to assemble. For
example, assembling existing labels from NIH ChestX-ray14 (available since
2017) significantly improves the accuracy of COVID-19 diagnosis from 96.3% to
99.3%. In addition to diagnosis, assembling labels can also improve disease
detection, e.g., the detection of pancreatic ductal adenocarcinoma (PDAC) can
greatly benefit from leveraging the labels of Cysts and PanNets (two other
types of pancreatic abnormalities), increasing sensitivity from 52.1% to 84.0%
while maintaining a high specificity of 98.0%.
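A minimal sketch of one common way to train on an assembly of partially labeled datasets: compute the loss only over the classes that a given dataset actually annotates, so that known negatives still contribute. This is an illustration of the general idea, not the authors' implementation. The function name `masked_bce`, the class order, and the example probabilities are hypothetical, and treating a ChestX-ray14 image as a COVID-19 negative (because the dataset predates the disease) is an assumption made for the example.

```python
import math
from typing import Optional, Sequence


def masked_bce(probs: Sequence[float],
               labels: Sequence[Optional[int]],
               eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over the classes that carry a label.

    labels[i] is 1 (positive), 0 (negative), or None when the source
    dataset does not annotate class i. Unannotated classes are skipped.
    """
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y is None:
            continue  # no label for this class in this dataset
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0


# Hypothetical class order: [COVID-19, pneumonia, effusion]
probs = [0.9, 0.2, 0.4]

# Image from a COVID-19 dataset: only the COVID-19 label is known.
covid_example = [1, None, None]

# Image from ChestX-ray14: assumed COVID-19 negative for this sketch,
# with its own labels for the other two classes.
nih_example = [0, 0, 1]

print(masked_bce(probs, covid_example))
print(masked_bce(probs, nih_example))
```

With this masking, an image from a dataset that never labeled a given disease neither rewards nor penalizes the prediction for that disease, while an image known to be negative pushes the prediction toward zero.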
Related papers
- Weakly-supervised diagnosis identification from Italian discharge letters [0.0]
We propose a novel weakly-supervised pipeline to recognize diseases from Italian discharge letters.
Our pipeline is based on a fine-tuned version of the Italian Umberto model.
arXiv Detail & Related papers (2024-10-19T09:42:20Z)
- Automating Weak Label Generation for Data Programming with Clinicians in the Loop [5.729255216041754]
We propose an algorithm that queries an expert for labels of a few representative samples of the dataset.
The labels assigned by the expert induce a labeling on the full dataset, thereby generating weak labels to be used in the data programming pipeline.
In our medical time series case study, labeling a subset of 50 to 130 out of 3,265 samples showed 17-28% improvement in accuracy and 13-28% improvement in F1 over the baseline.
arXiv Detail & Related papers (2024-07-10T18:29:22Z)
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling [60.27812493442062]
We show the importance of investigating labeled data quality to improve any pseudo-labeling method.
Specifically, we introduce a novel data characterization and selection framework called DIPS to extend pseudo-labeling.
We demonstrate the applicability and impact of DIPS for various pseudo-labeling methods across an extensive range of real-world datasets.
arXiv Detail & Related papers (2024-06-19T17:58:40Z)
- PatchSorter: A High Throughput Deep Learning Digital Pathology Tool for Object Labeling [0.8290040611295051]
We release an open-source labeling tool, PatchSorter, which integrates deep learning with an intuitive web interface.
We demonstrate a >7x improvement in labels per second over unaided labeling, with minimal impact on labeling accuracy.
arXiv Detail & Related papers (2023-07-13T09:32:42Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for PU learning in this paper.
Motivated by this, we propose to pursue the label distribution consistency between predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma [51.50991881342181]
This challenge includes 10,091 patch-level annotations and over 130 million labeled pixels.
The first-place team achieved an mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919).
arXiv Detail & Related papers (2022-04-13T15:27:05Z)
- MG-NET: Leveraging Pseudo-Imaging for Multi-Modal Metagenome Analysis [5.04905391284093]
We propose MG-Net, a self-supervised representation learning framework.
We show that MG-Net can learn robust representations from unlabeled data.
Experiments show that the learned features outperform current baseline metagenome representations.
arXiv Detail & Related papers (2021-07-21T05:53:01Z)
- Boosting Semi-Supervised Face Recognition with Noise Robustness [54.342992887966616]
This paper presents an effective solution to semi-supervised face recognition that is robust to the label noise introduced by auto-labelling.
We develop a semi-supervised face recognition solution, named Noise Robust Learning-Labelling (NRoLL), which is based on the robust training ability empowered by GN.
arXiv Detail & Related papers (2021-05-10T14:43:11Z)
- Weakly-Supervised Cross-Domain Adaptation for Endoscopic Lesions Segmentation [79.58311369297635]
We propose a new weakly-supervised lesions transfer framework, which can explore transferable domain-invariant knowledge across different datasets.
A Wasserstein quantified transferability framework is developed to highlight wide-range transferable contextual dependencies.
A novel self-supervised pseudo label generator is designed to equally provide confident pseudo pixel labels for both hard-to-transfer and easy-to-transfer target samples.
arXiv Detail & Related papers (2020-12-08T02:26:03Z)
- A Novel Semi-Supervised Data-Driven Method for Chiller Fault Diagnosis with Unlabeled Data [9.357969752339727]
We propose a novel semi-supervised data-driven fault diagnosis method for chiller systems based on the semi-generative adversarial network.
The proposed method improves diagnostic accuracy to 84%, while the supervised baseline methods reach an accuracy of at most 65%.
arXiv Detail & Related papers (2020-10-31T04:57:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.