Optimizing the Procedure of CT Segmentation Labeling
- URL: http://arxiv.org/abs/2303.14089v1
- Date: Fri, 24 Mar 2023 15:52:42 GMT
- Title: Optimizing the Procedure of CT Segmentation Labeling
- Authors: Yaroslav Zharov, Tilo Baumbach, Vincent Heuveline
- Abstract summary: In Computed Tomography, machine learning is often used for automated data processing.
We consider the annotation procedure and its effect on the model performance.
We assume three main virtues of a good dataset collected for a model training to be label quality, diversity, and completeness.
- Score: 1.2891210250935146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In Computed Tomography, machine learning is often used for automated data
processing. However, increasing model complexity is accompanied by increasingly
large volume datasets, which in turn increases the cost of model training.
Unlike most work that mitigates this by advancing model architectures and
training algorithms, we consider the annotation procedure and its effect on the
model performance. We assume three main virtues of a good dataset collected for
a model training to be label quality, diversity, and completeness. We compare
the effects of those virtues on the model performance using open medical CT
datasets and conclude, that quality is more important than diversity early
during labeling; the diversity, in turn, is more important than completeness.
Based on this conclusion and additional experiments, we propose a labeling
procedure for the segmentation of tomographic images to minimize efforts spent
on labeling while maximizing the model performance.
Related papers
- Semi-supervised Medical Image Segmentation Method Based on Cross-pseudo
Labeling Leveraging Strong and Weak Data Augmentation Strategies [2.8246591681333024]
This paper proposes a semi-supervised model, DFCPS, which innovatively incorporates the Fixmatch concept.
Cross-pseudo-supervision is introduced, integrating consistency learning with self-training.
Our model consistently exhibits superior performance across all four subdivisions containing different proportions of unlabeled data.
arXiv Detail & Related papers (2024-02-17T13:07:44Z) - Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z) - Has Your Pretrained Model Improved? A Multi-head Posterior Based
Approach [25.927323251675386]
We leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models.
We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models.
Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.
arXiv Detail & Related papers (2024-01-02T17:08:26Z) - Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning [47.02160072880698]
We introduce a self-evolving mechanism that allows the model itself to actively sample subsets that are equally or even more effective.
The key to our data sampling technique lies in the enhancement of diversity in the chosen subsets.
Extensive experiments across three datasets and benchmarks demonstrate the effectiveness of DiverseEvol.
arXiv Detail & Related papers (2023-11-14T14:10:40Z) - A Simple and Efficient Baseline for Data Attribution on Images [107.12337511216228]
Current state-of-the-art approaches require a large ensemble of as many as 300,000 models to accurately attribute model predictions.
In this work, we focus on a minimalist baseline, utilizing the feature space of a backbone pretrained via self-supervised learning to perform data attribution.
Our method is model-agnostic and scales easily to large datasets.
arXiv Detail & Related papers (2023-11-03T17:29:46Z) - StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized
Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z) - Domain Generalization for Mammographic Image Analysis with Contrastive
Learning [62.25104935889111]
The training of an efficacious deep learning model requires large data with diverse styles and qualities.
A novel contrastive learning is developed to equip the deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z) - Exploring the Effects of Data Augmentation for Drivable Area
Segmentation [0.0]
We focus on investigating the benefits of data augmentation by analyzing pre-existing image datasets.
Our results show that the performance and robustness of existing state of the art (or SOTA) models can be increased dramatically.
arXiv Detail & Related papers (2022-08-06T03:39:37Z) - MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the generalization and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z) - Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
arXiv Detail & Related papers (2020-04-30T17:09:17Z) - Knowledge Distillation for Brain Tumor Segmentation [0.0]
We study the relationship between the performance of the model and the amount of data employed during the training process.
A single model trained with additional data achieves performance close to the ensemble of multiple models and outperforms individual methods.
arXiv Detail & Related papers (2020-02-10T12:44:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.