Self-Validated Learning for Particle Separation: A Correctness-Based Self-Training Framework Without Human Labels
- URL: http://arxiv.org/abs/2508.16224v1
- Date: Fri, 22 Aug 2025 08:54:11 GMT
- Title: Self-Validated Learning for Particle Separation: A Correctness-Based Self-Training Framework Without Human Labels
- Authors: Philipp D. Lösel, Aleese Barron, Yulai Zhang, Matthias Fabian, Benjamin Young, Nicolas Francois, Andrew M. Kingston,
- Abstract summary: Non-destructive 3D imaging of large multi-particulate samples is essential for quantifying particle-level properties. We propose self-validated learning, a novel self-training framework for particle instance segmentation. Our approach accurately segments over 97% of the total particle volume and identifies more than 54,000 individual particles in tomographic scans of quartz fragments.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-destructive 3D imaging of large multi-particulate samples is essential for quantifying particle-level properties, such as size, shape, and spatial distribution, across applications in mining, materials science, and geology. However, accurate instance segmentation of particles in tomographic data remains challenging due to high morphological variability and frequent particle contact, which limit the effectiveness of classical methods like watershed algorithms. While supervised deep learning approaches offer improved performance, they rely on extensive annotated datasets that are labor-intensive, error-prone, and difficult to scale. In this work, we propose self-validated learning, a novel self-training framework for particle instance segmentation that eliminates the need for manual annotations. Our method leverages implicit boundary detection and iteratively refines the training set by identifying particles that can be consistently matched across reshuffled scans of the same sample. This self-validation mechanism mitigates the impact of noisy pseudo-labels, enabling robust learning from unlabeled data. After just three iterations, our approach accurately segments over 97% of the total particle volume and identifies more than 54,000 individual particles in tomographic scans of quartz fragments. Importantly, the framework also enables fully autonomous model evaluation without the need for ground truth annotations, as confirmed through comparisons with state-of-the-art instance segmentation techniques. The method is integrated into the Biomedisa image analysis platform (https://github.com/biomedisa/biomedisa/).
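The core self-validation idea in the abstract — segment two reshuffled scans of the same sample, keep only particles that can be consistently matched across both, and use those as pseudo-labels for the next training round — can be sketched in a highly simplified form. This is an illustrative toy, not the paper's implementation: the real method works on 3D tomographic volumes with implicit boundary detection, whereas here particles are mock descriptor vectors and the matching rule and tolerance are invented for demonstration.

```python
def match_particles(scan_a, scan_b, tol=0.05):
    """Greedily match particles (descriptor tuples) across two scans.

    A pair matches if every descriptor component agrees within the
    relative tolerance `tol`. Returns indices of matched particles
    in scan_a. Greedy matching is a simplification; any one-to-one
    assignment method could be substituted.
    """
    matched, used_b = [], set()
    for i, pa in enumerate(scan_a):
        for j, pb in enumerate(scan_b):
            if j in used_b:
                continue
            if all(abs(x - y) <= tol * max(abs(x), abs(y), 1e-9)
                   for x, y in zip(pa, pb)):
                matched.append(i)
                used_b.add(j)
                break
    return matched


def self_validated_iteration(scan_a, scan_b):
    """One refinement step: keep only particles found in both
    reshuffled scans, discarding inconsistent (noisy) detections."""
    keep = match_particles(scan_a, scan_b)
    return [scan_a[i] for i in keep]  # pseudo-labels for retraining


# Mock descriptors (volume, sphericity): two particles reappear in the
# reshuffled scan, one spurious detection does not and is filtered out.
scan_a = [(120.0, 0.81), (95.0, 0.60), (300.0, 0.40)]
scan_b = [(119.0, 0.80), (96.0, 0.61)]
pseudo_labels = self_validated_iteration(scan_a, scan_b)
print(len(pseudo_labels))  # prints 2: the spurious detection is discarded
```

In the full framework this filter-and-retrain loop repeats; per the abstract, three iterations suffice to segment over 97% of the total particle volume.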
Related papers
- ε-Seg: Sparsely Supervised Semantic Segmentation of Microscopy Data [6.617593699054488]
We introduce ε-Seg, a method based on hierarchical variational autoencoders (HVAEs) that employs center-region masking, label contrastive learning (CL), a Gaussian mixture model (GMM) prior, and clustering-free label prediction. Our results show that ε-Seg achieves competitive sparsely supervised segmentation results on complex biological image data.
arXiv Detail & Related papers (2025-10-21T13:41:07Z) - Improved Sub-Visible Particle Classification in Flow Imaging Microscopy via Generative AI-Based Image Synthesis [1.172405562070645]
Sub-visible particle analysis using flow imaging microscopy combined with deep learning has proven effective in identifying particle types. However, the scarcity of available data and the severe imbalance between particle types within datasets remain substantial hurdles. We develop a state-of-the-art diffusion model to address data imbalance by generating high-fidelity images that can augment training datasets.
arXiv Detail & Related papers (2025-08-08T05:15:02Z) - Self-training with dual uncertainty for semi-supervised medical image segmentation [9.538419502275975]
Traditional self-training methods can partially solve the problem of insufficient labeled data by generating pseudo labels for iterative training.
We add sample-level and pixel-level uncertainty to stabilize the training process based on the self-training framework.
Our proposed method achieves better segmentation performance on both datasets under the same settings.
arXiv Detail & Related papers (2023-04-10T07:57:24Z) - Label Cleaning Multiple Instance Learning: Refining Coarse Annotations on Single Whole-Slide Images [83.7047542725469]
Annotating cancerous regions in whole-slide images (WSIs) of pathology samples plays a critical role in clinical diagnosis, biomedical research, and machine learning algorithms development.
We present a method, named Label Cleaning Multiple Instance Learning (LC-MIL), to refine coarse annotations on a single WSI without the need for external training data.
Our experiments on a heterogeneous WSI set with breast cancer lymph node metastasis, liver cancer, and colorectal cancer samples show that LC-MIL significantly refines the coarse annotations, outperforming the state-of-the-art alternatives, even while learning from a single slide.
arXiv Detail & Related papers (2021-09-22T15:06:06Z) - Learning from Partially Overlapping Labels: Image Segmentation under Annotation Shift [68.6874404805223]
We propose several strategies for learning from partially overlapping labels in the context of abdominal organ segmentation.
We find that combining a semi-supervised approach with an adaptive cross entropy loss can successfully exploit heterogeneously annotated data.
arXiv Detail & Related papers (2021-07-13T09:22:24Z) - Synthetic Image Rendering Solves Annotation Problem in Deep Learning Nanoparticle Segmentation [5.927116192179681]
We show that using rendering software makes it possible to generate realistic, synthetic training data to train a state-of-the-art deep neural network.
We obtain a segmentation accuracy comparable to that of manual annotations for toxicologically relevant metal-oxide nanoparticle ensembles.
arXiv Detail & Related papers (2020-11-20T17:05:36Z) - Semi-Automatic Data Annotation guided by Feature Space Projection [117.9296191012968]
We present a semi-automatic data annotation approach based on suitable feature space projection and semi-supervised label estimation.
We validate our method on the popular MNIST dataset and on images of human intestinal parasites with and without fecal impurities.
Our results demonstrate the added-value of visual analytics tools that combine complementary abilities of humans and machines for more effective machine learning.
arXiv Detail & Related papers (2020-07-27T17:03:50Z) - Deep Semi-supervised Knowledge Distillation for Overlapping Cervical Cell Instance Segmentation [54.49894381464853]
We propose to leverage both labeled and unlabeled data for instance segmentation with improved accuracy by knowledge distillation.
We propose a novel Mask-guided Mean Teacher framework with Perturbation-sensitive Sample Mining.
Experiments show that the proposed method significantly outperforms a supervised baseline trained on labeled data only.
arXiv Detail & Related papers (2020-07-21T13:27:09Z) - Weakly Supervised Deep Nuclei Segmentation Using Partial Points Annotation in Histopathology Images [51.893494939675314]
We propose a novel weakly supervised segmentation framework based on partial points annotation.
We show that our method can achieve competitive performance compared to the fully supervised counterpart and the state-of-the-art methods.
arXiv Detail & Related papers (2020-07-10T15:41:29Z) - A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
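The aggregation idea in this abstract — embedding a variable-size input set into a fixed-size representation via an optimal transport plan against a reference — can be illustrated with a minimal NumPy sketch. This is a loose toy, not the paper's method: the Sinkhorn parameters, the random (rather than trained) reference, and the barycentric pooling step are all illustrative assumptions.

```python
import numpy as np


def sinkhorn(cost, eps=0.1, n_iter=50):
    """Entropic-regularized OT plan between uniform marginals
    via Sinkhorn iterations on the Gibbs kernel."""
    n, m = cost.shape
    K = np.exp(-cost / eps)
    a, b = np.ones(n) / n, np.ones(m) / m
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]  # transport plan P, shape (n, m)


def ot_pool(x, reference):
    """Aggregate a set x of shape (n, d) into a fixed-size (m, d)
    embedding, weighting elements by the transport plan against a
    reference of shape (m, d)."""
    cost = ((x[:, None, :] - reference[None, :, :]) ** 2).sum(-1)
    P = sinkhorn(cost)                               # (n, m)
    return (P / P.sum(0, keepdims=True)).T @ x       # (m, d)


rng = np.random.default_rng(0)
x = rng.normal(size=(7, 4))    # variable-size input set
ref = rng.normal(size=(3, 4))  # reference (trainable in the paper; random here)
z = ot_pool(x, ref)
print(z.shape)  # prints (3, 4): fixed-size output regardless of set size
```

Because the output size depends only on the reference, sets of any cardinality map to the same fixed-size representation, which is what makes the embedding usable as a pooling layer.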
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.