Inpainting is All You Need: A Diffusion-based Augmentation Method for Semi-supervised Medical Image Segmentation
- URL: http://arxiv.org/abs/2506.23038v1
- Date: Sat, 28 Jun 2025 23:44:18 GMT
- Title: Inpainting is All You Need: A Diffusion-based Augmentation Method for Semi-supervised Medical Image Segmentation
- Authors: Xinrong Hu, Yiyu Shi
- Abstract summary: AugPaint is a framework that generates image-label pairs from limited labeled data. We conducted evaluations of our data augmentation method on four public medical image segmentation datasets. Results across all datasets demonstrate that AugPaint outperforms state-of-the-art label-efficient methodologies.
- Score: 8.772764547425291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collecting pixel-level labels for medical datasets can be a laborious and expensive process, and enhancing segmentation performance with a scarcity of labeled data is a crucial challenge. This work introduces AugPaint, a data augmentation framework that utilizes inpainting to generate image-label pairs from limited labeled data. AugPaint leverages latent diffusion models, known for their ability to generate high-quality in-domain images with low overhead, and adapts the sampling process for the inpainting task without the need for retraining. Specifically, given an image and its label mask, we crop the foreground-labeled area and condition on it during the reverse denoising process at every noise level. The masked background area is gradually filled in, and every generated image is paired with the original label mask. This approach ensures an accurate match between synthetic images and their label masks, setting it apart from existing dataset generation methods. The generated images serve as valuable supervision for training downstream segmentation models, effectively addressing the challenge of limited annotations. We conducted extensive evaluations of our data augmentation method on four public medical image segmentation datasets, including CT, MRI, and skin imaging. Results across all datasets demonstrate that AugPaint outperforms state-of-the-art label-efficient methodologies, significantly improving segmentation performance.
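Below is a minimal sketch of the mask-conditioned sampling loop the abstract describes, in the spirit of RePaint-style inpainting: the labeled foreground is re-noised to the current noise level and pasted back at every reverse step, so only the background is synthesized. The names `denoise_step` and `alphas_cumprod` are placeholders for a pretrained diffusion model's reverse step and noise schedule, and details such as latent-space encoding and resampling are omitted; this is an illustration under those assumptions, not AugPaint's actual code.

```python
# Minimal sketch of mask-conditioned diffusion inpainting (RePaint-style).
# `denoise_step` and `alphas_cumprod` stand in for a pretrained diffusion
# model's reverse step and noise schedule -- assumptions, not AugPaint's API.
import torch

@torch.no_grad()
def inpaint(x0, mask, denoise_step, alphas_cumprod, T=1000):
    """x0: clean (latent) image; mask: 1 = known foreground, 0 = background."""
    x_t = torch.randn_like(x0)  # start from pure noise
    for t in reversed(range(T)):
        a_bar = alphas_cumprod[t]
        # Re-noise the known (labeled foreground) region to noise level t...
        x_known = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * torch.randn_like(x0)
        # ...take one reverse denoising step on the current sample...
        x_unknown = denoise_step(x_t, t)
        # ...and keep the known region fixed while the background fills in.
        x_t = mask * x_known + (1 - mask) * x_unknown
    return x_t  # synthetic image that still matches the original label mask
```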
Related papers
- Semi-Supervised Biomedical Image Segmentation via Diffusion Models and Teacher-Student Co-Training [7.915123555266876]
Supervised deep learning for semantic segmentation has achieved excellent results in accurately identifying anatomical and pathological structures in medical images.
However, it often requires large annotated training datasets, which limits its scalability in clinical settings.
We introduce a novel semi-supervised teacher-student framework for biomedical image segmentation, inspired by the recent success of generative models.
arXiv Detail & Related papers (2025-04-02T09:41:43Z)
- PathoPainter: Augmenting Histopathology Segmentation via Tumor-aware Inpainting [7.518548705907955]
We propose PathoPainter, which reformulates image-mask pair generation as a tumor inpainting task.
Our approach preserves the background while inpainting the tumor region, ensuring precise alignment between the generated image and its corresponding mask.
Our comprehensive evaluation spans multiple datasets featuring diverse tumor types and various training data scales.
arXiv Detail & Related papers (2025-03-06T17:21:12Z)
- Discriminative Hamiltonian Variational Autoencoder for Accurate Tumor Segmentation in Data-Scarce Regimes [2.8498944632323755]
We propose an end-to-end hybrid architecture for medical image segmentation.
We use Hamiltonian Variational Autoencoders (HVAE) and a discriminative regularization to improve the quality of generated images.
Our architecture operates on a slice-by-slice basis to segment 3D volumes, capitalizing on the richly augmented dataset.
arXiv Detail & Related papers (2024-06-17T15:42:08Z)
- Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation [6.82236459614491]
We propose a novel method for generating pixel-level semantic segmentation labels using the text-to-image generative model Stable Diffusion.
By utilizing the text prompts, cross-attention, and self-attention of SD, we introduce three new techniques: class-prompt appending, class-prompt cross-attention, and self-attention exponentiation.
These techniques enable us to generate segmentation maps corresponding to synthetic images.
arXiv Detail & Related papers (2023-09-25T17:19:26Z)
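The Dataset Diffusion entry above turns cross-attention between class tokens and image locations into pseudo-masks. The toy sketch below shows one plausible reading of that idea, assuming per-class cross-attention maps `attn_maps` of shape (num_classes, H, W) have already been extracted from the diffusion UNet; the sharpening exponent `gamma` only loosely mimics self-attention exponentiation and is an illustrative assumption.

```python
# Toy sketch: deriving a segmentation map from text-to-image cross-attention.
# Assumes `attn_maps` holds the averaged attention between each class token
# and the image, shape (num_classes, H, W); all thresholds are illustrative.
import torch

def attention_to_mask(attn_maps: torch.Tensor, bg_thresh: float = 0.3,
                      gamma: float = 2.0) -> torch.Tensor:
    attn = attn_maps.clamp(min=0) ** gamma                       # sharpen maps
    attn = attn / (attn.amax(dim=(1, 2), keepdim=True) + 1e-8)   # per-class [0,1]
    scores, labels = attn.max(dim=0)      # best class per pixel
    labels = labels + 1                   # reserve 0 for background
    labels[scores < bg_thresh] = 0        # low confidence -> background
    return labels                         # (H, W) pseudo-label map
```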
- DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models [68.21154597227165]
We show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by the off-the-shelf Stable Diffusion model.
Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image.
arXiv Detail & Related papers (2023-03-21T08:43:15Z)
- Self-Supervised Correction Learning for Semi-Supervised Biomedical Image Segmentation [84.58210297703714]
We propose a self-supervised correction learning paradigm for semi-supervised biomedical image segmentation.
We design a dual-task network, including a shared encoder and two independent decoders for segmentation and lesion region inpainting.
Experiments on three medical image segmentation datasets for different tasks demonstrate the outstanding performance of our method.
arXiv Detail & Related papers (2023-01-12T08:19:46Z)
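The dual-task design in the entry above (one shared encoder feeding a segmentation decoder and an inpainting decoder) can be sketched schematically as follows. Layer widths, plain conv blocks, and the two lightweight heads are illustrative assumptions, not the paper's architecture.

```python
# Schematic sketch of a dual-task network: one shared encoder, two independent
# decoders (segmentation + inpainting). All layer sizes are illustrative.
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class DualTaskNet(nn.Module):
    def __init__(self, in_ch=1, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(conv_block(in_ch, 32), nn.MaxPool2d(2),
                                     conv_block(32, 64))
        # Both decoders read the same shared features.
        self.seg_head = nn.Sequential(nn.Upsample(scale_factor=2),
                                      conv_block(64, 32),
                                      nn.Conv2d(32, num_classes, 1))
        self.inpaint_head = nn.Sequential(nn.Upsample(scale_factor=2),
                                          conv_block(64, 32),
                                          nn.Conv2d(32, in_ch, 1))

    def forward(self, x):
        feats = self.encoder(x)
        return self.seg_head(feats), self.inpaint_head(feats)
```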
- High-Quality Entity Segmentation [110.55724145851725]
CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images.
It improves mask prediction by fusing high-resolution image crops, which provide finer-grained image detail, with the full image.
With CropFormer, we achieve a significant AP gain of $1.9$ on the challenging entity segmentation task.
arXiv Detail & Related papers (2022-11-10T18:58:22Z)
- Dual-Perspective Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels [70.36722026729859]
We propose a dual-perspective semantic-aware representation blending (DSRB) that blends multi-granularity category-specific semantic representation across different images.
The proposed DSRB consistently outperforms current state-of-the-art algorithms on all label proportion settings.
arXiv Detail & Related papers (2022-05-26T00:33:44Z)
- Robust Medical Image Classification from Noisy Labeled Data with Global and Local Representation Guided Co-training [73.60883490436956]
We propose a novel collaborative training paradigm with global and local representation learning for robust medical image classification.
We employ a self-ensemble model with a noisy-label filter to efficiently separate clean samples from noisy ones.
We also design a novel global and local representation learning scheme to implicitly regularize the networks to utilize noisy samples.
arXiv Detail & Related papers (2022-05-10T07:50:08Z)
- Mixed Supervision Learning for Whole Slide Image Classification [88.31842052998319]
We propose a mixed supervision learning framework for super high-resolution images.
During the patch training stage, this framework can make use of coarse image-level labels to refine self-supervised learning.
A comprehensive strategy is proposed to suppress pixel-level false positives and false negatives.
arXiv Detail & Related papers (2021-07-02T09:46:06Z)
- Distilling effective supervision for robust medical image segmentation with noisy labels [21.68138582276142]
We propose a novel framework to address segmenting with noisy labels by distilling effective supervision information from both pixel and image levels.
In particular, we explicitly estimate the uncertainty of every pixel as a pixel-wise noise estimate.
We present an image-level robust learning method that accommodates additional information as a complement to pixel-level learning.
arXiv Detail & Related papers (2021-06-21T13:33:38Z)
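One common way to realize the per-pixel uncertainty idea in the entry above is predictive entropy over the softmax output, with a quantile threshold that keeps only the most confident pixels as clean supervision. The paper's exact noise estimator may differ; the sketch below is an illustrative stand-in.

```python
# Sketch of pixel-wise uncertainty via predictive entropy -- an illustrative
# stand-in for the paper's pixel-wise noise estimation, not its exact method.
import torch
import torch.nn.functional as F

def pixel_uncertainty(logits: torch.Tensor) -> torch.Tensor:
    """logits: (B, C, H, W) -> per-pixel entropy (B, H, W); high = uncertain."""
    probs = F.softmax(logits, dim=1)
    return -(probs * torch.log(probs + 1e-8)).sum(dim=1)

def reliable_pixel_mask(logits: torch.Tensor, quantile: float = 0.8):
    """Keep the lowest-entropy pixels as (presumably) clean supervision."""
    u = pixel_uncertainty(logits)
    thresh = torch.quantile(u.flatten(1), quantile, dim=1).view(-1, 1, 1)
    return u < thresh  # boolean mask of pixels treated as clean
```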
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.