Sliding Window FastEdit: A Framework for Lesion Annotation in Whole-body
PET Images
- URL: http://arxiv.org/abs/2311.14482v1
- Date: Fri, 24 Nov 2023 13:45:58 GMT
- Title: Sliding Window FastEdit: A Framework for Lesion Annotation in Whole-body
PET Images
- Authors: Matthias Hadlich, Zdravko Marinov, Moon Kim, Enrico Nasca, Jens
Kleesiek, Rainer Stiefelhagen
- Abstract summary: Deep learning has revolutionized the accurate segmentation of diseases in medical imaging.
This requirement presents a challenge for whole-body Positron Emission Tomography (PET) imaging, where lesions are scattered throughout the body.
We introduce SW-FastEdit - an interactive segmentation framework that accelerates the labeling by utilizing only a few user clicks instead of voxelwise annotations.
Our model outperforms existing non-sliding window interactive models on the AutoPET dataset and generalizes to the previously unseen HECKTOR dataset.
- Score: 24.7560446107659
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has revolutionized the accurate segmentation of diseases in
medical imaging. However, achieving such results requires training with
numerous manual voxel annotations. This requirement presents a challenge for
whole-body Positron Emission Tomography (PET) imaging, where lesions are
scattered throughout the body. To tackle this problem, we introduce SW-FastEdit
- an interactive segmentation framework that accelerates the labeling by
utilizing only a few user clicks instead of voxelwise annotations. While prior
interactive models crop or resize PET volumes due to memory constraints, we use
the complete volume with our sliding window-based interactive scheme. Our model
outperforms existing non-sliding window interactive models on the AutoPET
dataset and generalizes to the previously unseen HECKTOR dataset. A user study
revealed that annotators achieve high-quality predictions with only 10 click
iterations and a low perceived NASA-TLX workload. Our framework is implemented
using MONAI Label and is available at:
https://github.com/matt3o/AutoPET2-Submission/
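The abstract states that SW-FastEdit processes the complete PET volume with a sliding window-based interactive scheme and is implemented with MONAI Label. As a rough illustration of the core idea, the sketch below runs sliding-window inference over an uncropped whole-body volume using MONAI's `SlidingWindowInferer`; the network, patch size, and three-channel input layout (PET intensities plus foreground/background click-guidance channels) are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch: sliding-window inference over an uncropped whole-body PET
# volume, in the spirit of SW-FastEdit (assumed configuration, not the paper's).
import torch
from monai.inferers import SlidingWindowInferer
from monai.networks.nets import UNet

# Hypothetical 3D segmentation network. Interactive models commonly feed user
# clicks as extra guidance channels; here we assume PET + 2 click channels.
net = UNet(
    spatial_dims=3,
    in_channels=3,        # PET + foreground clicks + background clicks (assumed)
    out_channels=2,       # background vs. lesion
    channels=(16, 32, 64, 128),
    strides=(2, 2, 2),
)
net.eval()

# The inferer slides a fixed-size patch over the whole volume and blends the
# per-patch predictions, so the full scan never has to be cropped or resized.
inferer = SlidingWindowInferer(
    roi_size=(128, 128, 128),  # patch size is illustrative, not the paper's value
    sw_batch_size=1,
    overlap=0.25,
    mode="gaussian",
)

# Dummy whole-body volume: (batch, channels, D, H, W).
volume = torch.zeros(1, 3, 320, 160, 160)
with torch.no_grad():
    logits = inferer(inputs=volume, network=net)
prediction = logits.argmax(dim=1)  # voxelwise lesion mask over the full volume
```

In an interactive loop, the click-guidance channels would be updated after each round of user clicks and inference rerun, which is the mechanism the abstract credits for reaching high-quality predictions within about 10 click iterations.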
Related papers
- A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation [22.440065488051047]
A key challenge for the widespread application of learning-based models for robotic perception is to significantly reduce the required amount of annotated training data.
We exploit the groundwork paved by visual foundation models to train two lightweight network heads for semantic segmentation and object boundary detection.
We demonstrate that PASTEL significantly outperforms previous methods for label-efficient segmentation even when using fewer annotations.
arXiv Detail & Related papers (2024-05-29T12:23:29Z)
- Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition [58.79807861739438]
Existing pedestrian attribute recognition (PAR) algorithms are mainly developed based on static images.
We propose to understand human attributes using video frames that can fully use temporal information.
arXiv Detail & Related papers (2024-04-27T14:43:32Z)
- LiteNeXt: A Novel Lightweight ConvMixer-based Model with Self-embedding Representation Parallel for Medical Image Segmentation [2.0901574458380403]
We propose a new lightweight but efficient model, namely LiteNeXt, for medical image segmentation.
LiteNeXt is trained from scratch with a small number of parameters (0.71M) and low computational cost (0.42 GFLOPs).
arXiv Detail & Related papers (2024-04-04T01:59:19Z)
- Multimodal Interactive Lung Lesion Segmentation: A Framework for Annotating PET/CT Images based on Physiological and Anatomical Cues [16.159693927845975]
Deep learning has enabled the accurate segmentation of various diseases in medical imaging.
These performances, however, typically demand large amounts of manual voxel annotations.
We propose a multimodal interactive segmentation framework that mitigates these issues by combining anatomical and physiological cues from PET/CT data.
arXiv Detail & Related papers (2023-01-24T10:50:45Z)
- ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches [20.020356453279685]
Surgical captioning plays an important role in surgical instruction prediction and report generation.
Most captioning models still rely on a computationally heavy object detector or feature extractor to extract regional features.
We design an end-to-end detector and feature extractor-free captioning model by utilizing the patch-based shifted window technique.
arXiv Detail & Related papers (2022-06-30T21:57:33Z)
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, including EndoVis18 Challenge and CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z)
- VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling [88.30109041658618]
A great challenge in video-language (VidL) modeling lies in the disconnection between fixed video representations extracted from image/video understanding models and downstream VidL data.
We present VIOLET, a fully end-to-end VIdeO-LanguagE Transformer, which adopts a video transformer to explicitly model the temporal dynamics of video inputs.
arXiv Detail & Related papers (2021-11-24T18:31:20Z)
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)