Optimizing the Procedure of CT Segmentation Labeling
- URL: http://arxiv.org/abs/2303.14089v1
- Date: Fri, 24 Mar 2023 15:52:42 GMT
- Title: Optimizing the Procedure of CT Segmentation Labeling
- Authors: Yaroslav Zharov, Tilo Baumbach, Vincent Heuveline
- Abstract summary: In Computed Tomography, machine learning is often used for automated data processing.
We consider the annotation procedure and its effect on the model performance.
We assume three main virtues of a good dataset collected for model training to be label quality, diversity, and completeness.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In Computed Tomography, machine learning is often used for automated data
processing. However, increasing model complexity is accompanied by increasingly
large volume datasets, which in turn increases the cost of model training.
Unlike most work that mitigates this by advancing model architectures and
training algorithms, we consider the annotation procedure and its effect on the
model performance. We assume three main virtues of a good dataset collected
for model training to be label quality, diversity, and completeness. We
compare the effects of these virtues on model performance using open medical
CT datasets and conclude that quality is more important than diversity early
in labeling; diversity, in turn, is more important than completeness.
Based on this conclusion and additional experiments, we propose a labeling
procedure for the segmentation of tomographic images to minimize efforts spent
on labeling while maximizing the model performance.
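The abstract's conclusion suggests an ordering of annotation effort: quality first, then diversity across volumes, then completeness within volumes. A minimal sketch of how such a budgeted labeling plan might look, assuming hypothetical `volumes` and `plan_labeling` names (the paper's actual procedure is not specified here):

```python
# Hypothetical sketch of the priority rule suggested by the abstract: with a
# fixed annotation budget, label a few slices carefully, spread them across
# many volumes (diversity) before filling in more slices of already-covered
# volumes (completeness). All names are illustrative.

def plan_labeling(volumes, budget):
    """Return an ordered list of (volume_id, slice_id) pairs to annotate.

    volumes: dict mapping volume_id -> number of slices in that volume.
    budget:  total number of slices we can afford to label carefully.
    """
    plan = []
    remaining = {v: list(range(n)) for v, n in volumes.items()}
    # Round-robin across volumes: one slice per volume before taking a
    # second slice from any volume, so early labels cover diverse data.
    while len(plan) < budget and any(remaining.values()):
        for v in sorted(remaining):
            if remaining[v] and len(plan) < budget:
                plan.append((v, remaining[v].pop(0)))
    return plan

plan = plan_labeling({"vol_a": 3, "vol_b": 3, "vol_c": 3}, budget=5)
# Covers all three volumes before revisiting any of them.
```

The round-robin order is one simple way to encode "diversity before completeness"; a real procedure would also control label quality per slice.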
Related papers
- Keypoints-Integrated Instruction-Following Data Generation for Enhanced Human Pose Understanding in Multimodal Models [1.9890559505377343]
We introduce a new method for generating such data by integrating human keypoints with traditional visual features like captions and bounding boxes.
Our approach produces datasets designed for fine-tuning models to excel in human-centric activities.
Experimental results show an overall improvement of 21.18% compared to the original LLaVA-7B model.
arXiv Detail & Related papers (2024-09-14T05:07:57Z)
- PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation [51.509573838103854]
We propose a semi-supervised learning framework, termed Progressive Mean Teachers (PMT), for medical image segmentation.
Our PMT generates high-fidelity pseudo labels by learning robust and diverse features in the training process.
Experimental results on two datasets with different modalities, i.e., CT and MRI, demonstrate that our method outperforms the state-of-the-art medical image segmentation approaches.
arXiv Detail & Related papers (2024-09-08T15:02:25Z)
- Semi-supervised Medical Image Segmentation Method Based on Cross-pseudo Labeling Leveraging Strong and Weak Data Augmentation Strategies [2.8246591681333024]
This paper proposes a semi-supervised model, DFCPS, which innovatively incorporates the Fixmatch concept.
Cross-pseudo-supervision is introduced, integrating consistency learning with self-training.
Our model consistently exhibits superior performance across all four subdivisions containing different proportions of unlabeled data.
arXiv Detail & Related papers (2024-02-17T13:07:44Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning [47.02160072880698]
We introduce a self-evolving mechanism that allows the model itself to actively sample subsets that are equally or even more effective.
The key to our data sampling technique lies in the enhancement of diversity in the chosen subsets.
Extensive experiments across three datasets and benchmarks demonstrate the effectiveness of DiverseEvol.
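DiverseEvol's key idea, as summarized above, is enhancing the diversity of the chosen subsets. One standard way to select a diverse subset, shown here as a hedged illustration rather than the paper's actual algorithm, is greedy farthest-point sampling:

```python
# Illustrative diversity-driven subset selection (not DiverseEvol's actual
# method): greedily pick each next example to maximize its distance to the
# already-chosen subset. Points are scalars here for simplicity.

def farthest_point_sample(points, k):
    """Return indices of k points chosen greedily for diversity."""
    chosen = [0]  # arbitrary seed: start from the first point
    while len(chosen) < k:
        # Select the candidate whose nearest chosen neighbour is farthest.
        best = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: min(abs(points[i] - points[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen

idx = farthest_point_sample([0.0, 0.1, 0.5, 0.9, 1.0], k=3)
# Picks well-spread points instead of near-duplicates.
```

In practice such selection runs in an embedding space rather than on raw scalars, and in a self-evolving setup the model's own representations would define the distances.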
arXiv Detail & Related papers (2023-11-14T14:10:40Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
Training an effective deep learning model requires large datasets with diverse styles and qualities.
A novel contrastive learning is developed to equip the deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z)
- Exploring the Effects of Data Augmentation for Drivable Area Segmentation [0.0]
We focus on investigating the benefits of data augmentation by analyzing pre-existing image datasets.
Our results show that the performance and robustness of existing state-of-the-art (SOTA) models can be increased dramatically.
arXiv Detail & Related papers (2022-08-06T03:39:37Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the generalization and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
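The teacher-then-pseudo-label pipeline described above can be sketched in a few lines; the model and confidence threshold here are stand-ins, not the paper's actual implementation:

```python
# Minimal self-training sketch: a trained teacher assigns pseudo labels to
# unlabeled data, and only confident predictions join the human-annotated
# set for joint student training. All names are illustrative.

def self_train(teacher_predict, labeled, unlabeled, confidence=0.9):
    """Combine human labels with confident teacher pseudo labels."""
    pseudo = []
    for x in unlabeled:
        label, score = teacher_predict(x)
        if score >= confidence:  # discard low-confidence pseudo labels
            pseudo.append((x, label))
    return labeled + pseudo  # joint training set for the student

# Toy teacher: "predicts" the parity of an integer, confident only on
# small inputs, so 42 is filtered out by the threshold.
toy_teacher = lambda x: (x % 2, 0.95 if x < 10 else 0.5)
combined = self_train(toy_teacher, [(1, 1)], [2, 3, 42])
```

Confidence filtering is one common way to keep pseudo-label noise in check; segmentation pipelines typically apply it per pixel rather than per image.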
arXiv Detail & Related papers (2020-04-30T17:09:17Z)
- Knowledge Distillation for Brain Tumor Segmentation [0.0]
We study the relationship between the performance of the model and the amount of data employed during the training process.
A single model trained with additional data achieves performance close to the ensemble of multiple models and outperforms individual methods.
arXiv Detail & Related papers (2020-02-10T12:44:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.