Mutually improved endoscopic image synthesis and landmark detection in
unpaired image-to-image translation
- URL: http://arxiv.org/abs/2107.06941v1
- Date: Wed, 14 Jul 2021 19:09:50 GMT
- Title: Mutually improved endoscopic image synthesis and landmark detection in
unpaired image-to-image translation
- Authors: Lalith Sharan, Gabriele Romano, Sven Koehler, Halvar Kelm, Matthias
Karck, Raffaele De Simone and Sandy Engelhardt
- Abstract summary: The CycleGAN framework allows for unsupervised image-to-image translation of unpaired data.
In a scenario of surgical training on a physical surgical simulator, this method can be used to transform endoscopic images of phantoms into images which more closely resemble the intra-operative appearance of the same surgical target structure.
We show that a task defined on sparse landmark labels improves consistency of synthesis by the generator network in both domains.
- Score: 0.9322743017642274
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The CycleGAN framework allows for unsupervised image-to-image translation of
unpaired data. In a scenario of surgical training on a physical surgical
simulator, this method can be used to transform endoscopic images of phantoms
into images which more closely resemble the intra-operative appearance of the
same surgical target structure. This can be viewed as a novel augmented reality
approach, which we coined Hyperrealism in previous work. In this use case, it
is of paramount importance to display objects like needles, sutures or
instruments consistent in both domains while altering the style to a more
tissue-like appearance. Segmentation of these objects would allow for a direct
transfer; however, contouring these partly tiny and thin foreground objects is
cumbersome and potentially inaccurate. Instead, we propose to use landmark
detection at the points where sutures pass into the tissue. This objective is
directly incorporated into a CycleGAN framework by treating the performance of
pre-trained detector models as an additional optimization goal. We show that a
task defined on these sparse landmark labels improves consistency of synthesis
by the generator network in both domains. Comparing a baseline CycleGAN
architecture to our proposed extension (DetCycleGAN), mean precision (PPV)
improved by +61.32, mean sensitivity (TPR) by +37.91, and mean F1 score by
+0.4743. Furthermore, it could be shown that by dataset fusion, generated
intra-operative images can be leveraged as additional training data for the
detection network itself. The data is released within the scope of the AdaptOR
MICCAI Challenge 2021 at https://adaptor2021.github.io/, and code at
https://github.com/Cardio-AI/detcyclegan_pytorch.
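The core idea above, treating a pre-trained landmark detector's performance as an extra optimization goal for the generators, can be sketched as one additional loss term in the generator objective. A minimal NumPy sketch, not the released implementation: the heatmap BCE loss, the function names, and the weight values are all illustrative assumptions.

```python
import numpy as np

def heatmap_bce(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy between predicted and reference
    landmark heatmaps, a common objective for sparse landmark detection."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred)
                           + (1.0 - target) * np.log(1.0 - pred))))

def generator_objective(adv_loss, cycle_loss, det_loss,
                        lambda_cycle=10.0, lambda_det=1.0):
    """Weighted sum of adversarial, cycle-consistency, and detection
    terms; the weights shown are illustrative, not the paper's values."""
    return adv_loss + lambda_cycle * cycle_loss + lambda_det * det_loss
```

During training, `det_loss` would be evaluated by running the frozen, pre-trained detector on the generator's output, so that synthesis which displaces sutures is penalized.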
Related papers
- Cross-Dataset Adaptation for Instrument Classification in Cataract
Surgery Videos [54.1843419649895]
State-of-the-art models, which perform this task well on a particular dataset, perform poorly when tested on another dataset.
We propose a novel end-to-end Unsupervised Domain Adaptation (UDA) method called the Barlow Adaptor.
In addition, we introduce a novel loss called the Barlow Feature Alignment Loss (BFAL) which aligns features across different domains.
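The summary does not give BFAL's exact form; as a hedged sketch in the spirit of Barlow Twins, one can standardize per-dimension features from the two domains, compute their cross-correlation matrix, and push it toward the identity. Every detail below (function name, weighting, normalization) is an assumption, not the paper's formulation.

```python
import numpy as np

def barlow_alignment_loss(z_src, z_tgt, lam=0.005, eps=1e-9):
    """Barlow-Twins-style feature alignment: standardize each feature
    dimension, form the cross-correlation matrix between source- and
    target-domain features, and penalize its distance to the identity."""
    z_src = (z_src - z_src.mean(0)) / (z_src.std(0) + eps)
    z_tgt = (z_tgt - z_tgt.mean(0)) / (z_tgt.std(0) + eps)
    c = z_src.T @ z_tgt / z_src.shape[0]   # (d, d) cross-correlation
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)
    off_diag = np.sum((c - np.diag(np.diag(c))) ** 2)
    return float(on_diag + lam * off_diag)
```

Perfectly aligned domains (identical feature batches) drive the diagonal term to zero, while the off-diagonal term discourages redundant feature dimensions.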
arXiv Detail & Related papers (2023-07-31T18:14:18Z) - Multitask AET with Orthogonal Tangent Regularity for Dark Object
Detection [84.52197307286681]
We propose a novel multitask auto encoding transformation (MAET) model to enhance object detection in a dark environment.
In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We have achieved the state-of-the-art performance using synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z) - Two-Stream Graph Convolutional Network for Intra-oral Scanner Image
Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z) - Controllable cardiac synthesis via disentangled anatomy arithmetic [15.351113774542839]
We propose a framework termed "disentangled anatomy arithmetic".
A generative model learns to combine anatomical factors of different input images with the desired imaging modality.
Our model is used to generate realistic images, pathology labels, and segmentation masks.
arXiv Detail & Related papers (2021-07-04T23:13:33Z) - Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z) - Semantic Segmentation with Generative Models: Semi-Supervised Learning
and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images.
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z) - G-SimCLR : Self-Supervised Contrastive Learning with Guided Projection
via Pseudo Labelling [0.8164433158925593]
In computer vision, it is evident that deep neural networks perform better in a supervised setting with a large amount of labeled data.
In this work, we propose that, with the normalized temperature-scaled cross-entropy (NT-Xent) loss function, it is beneficial to not have images of the same category in the same batch.
We use the latent space representation of a denoising autoencoder trained on the unlabeled dataset and cluster them with k-means to obtain pseudo labels.
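The batching constraint described above, no two images with the same pseudo label in one batch, can be illustrated with a round-robin draw over the k-means clusters. A minimal stdlib sketch; the helper name and scheduling details are assumptions, not the paper's exact scheme.

```python
from collections import defaultdict

def guided_batches(pseudo_labels, batch_size):
    """Build batches of sample indices such that no two samples in a
    batch share a pseudo label: draw at most one index per cluster per
    pass, then split each pass into chunks of at most batch_size."""
    buckets = defaultdict(list)
    for idx, lab in enumerate(pseudo_labels):
        buckets[lab].append(idx)
    batches = []
    while any(buckets.values()):
        # one pass: each non-empty cluster contributes exactly one index
        drawn = [buckets[lab].pop(0) for lab in sorted(buckets) if buckets[lab]]
        for i in range(0, len(drawn), batch_size):
            batches.append(drawn[i:i + batch_size])
    return batches
```

Because each pass contains at most one index per cluster, every emitted batch holds pairwise-distinct pseudo labels, which is the property the NT-Xent loss benefits from here.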
arXiv Detail & Related papers (2020-09-25T02:25:37Z) - Towards Unsupervised Learning for Instrument Segmentation in Robotic
Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation.
Our approach allows to train image segmentation models without the need to acquire expensive annotations.
We test our proposed method on Endovis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
arXiv Detail & Related papers (2020-07-09T01:39:39Z) - Segmentation of Surgical Instruments for Minimally-Invasive
Robot-Assisted Procedures Using Generative Deep Neural Networks [17.571763112459166]
This work shows that semantic segmentation of minimally invasive surgical instruments can be improved with additional synthetically generated training data.
To achieve this, a CycleGAN model is used, which transforms a source dataset to approximate the domain distribution of a target dataset.
This newly generated data with perfect labels is utilized to train a semantic segmentation neural network, U-Net.
arXiv Detail & Related papers (2020-06-05T14:39:41Z) - LC-GAN: Image-to-image Translation Based on Generative Adversarial
Network for Endoscopic Images [22.253074722129053]
We propose an image-to-image translation model, live-cadaver GAN (LC-GAN), based on generative adversarial networks (GANs).
For live image segmentation, we first translate the live images to fake-cadaveric images with LC-GAN and then perform segmentation on the fake-cadaveric images with models trained on the real cadaveric dataset.
Our model achieves better image-to-image translation and leads to improved segmentation performance in the proposed cross-domain segmentation task.
arXiv Detail & Related papers (2020-03-10T19:59:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.