Rethinking Surgical Smoke: A Smoke-Type-Aware Laparoscopic Video Desmoking Method and Dataset
- URL: http://arxiv.org/abs/2512.02780v1
- Date: Tue, 02 Dec 2025 13:55:27 GMT
- Title: Rethinking Surgical Smoke: A Smoke-Type-Aware Laparoscopic Video Desmoking Method and Dataset
- Authors: Qifan Liang, Junlin Li, Zhen Han, Xihao Wang, Zhongyuan Wang, Bin Mei,
- Abstract summary: We propose the Smoke-Type-Aware Laparoscopic Video Desmoking Network (STANet). We introduce two smoke types: Diffusion Smoke and Ambient Smoke. We also construct the first large-scale synthetic video desmoking dataset with smoke type annotations.
- Score: 21.493577935588732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electrocautery or lasers will inevitably generate surgical smoke, which hinders the visual guidance of laparoscopic videos for surgical procedures. The surgical smoke can be classified into different types based on its motion patterns, leading to distinctive spatio-temporal characteristics across smoky laparoscopic videos. However, existing desmoking methods fail to account for such smoke-type-specific distinctions. Therefore, we propose the first Smoke-Type-Aware Laparoscopic Video Desmoking Network (STANet) by introducing two smoke types: Diffusion Smoke and Ambient Smoke. Specifically, a smoke mask segmentation sub-network is designed to jointly conduct smoke mask and smoke type predictions based on attention-weighted mask aggregation, while a smokeless video reconstruction sub-network is proposed to perform tailored desmoking on smoky features guided by the two types of smoke masks. To address the entanglement of the two smoke types, we further embed a coarse-to-fine disentanglement module into the mask segmentation sub-network, which yields more accurate disentangled masks through smoke-type-aware cross attention between non-entangled and entangled regions. In addition, we construct the first large-scale synthetic video desmoking dataset with smoke type annotations. Extensive experiments demonstrate that our method not only outperforms state-of-the-art approaches in quality evaluations, but also exhibits superior generalization across multiple downstream surgical tasks.
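The abstract names attention-weighted mask aggregation but does not say how it is computed. The following is only a minimal sketch of one plausible reading, assuming each candidate mask receives a single scalar attention score and the masks are blended with softmax weights. The helper names `softmax` and `aggregate_masks`, the scalar-score design, and the toy 2 x 2 masks are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_masks(masks, scores):
    """Blend K candidate masks (each an H x W nested list) into one mask.

    Hypothetical helper: the weights are softmax(scores), one scalar per
    candidate mask. The paper may weight masks per pixel or per region.
    """
    if not masks or len(masks) != len(scores):
        raise ValueError("need at least one mask and exactly one score per mask")
    weights = softmax(scores)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [sum(w * m[y][x] for w, m in zip(weights, masks)) for x in range(width)]
        for y in range(height)
    ]


# Two toy candidate masks with equal scores, so each gets weight 0.5.
candidates = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [0.0, 0.0]],
]
blended = aggregate_masks(candidates, scores=[0.0, 0.0])
```

With equal scores, pixels on which both candidates agree keep their value, and pixels on which they disagree land halfway between the two. Raising one score shifts the blend toward that candidate.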
Related papers
- ExposeAnyone: Personalized Audio-to-Expression Diffusion Models Are Robust Zero-Shot Face Forgery Detectors [58.45131932883374]
We propose a fully self-supervised approach to detect deepfakes in videos. Our model computes the identity distances between suspected videos and personalized subjects via diffusion reconstruction errors. Our method is highly robust to corruptions such as blur and compression, highlighting the applicability in real-world face forgery detection.
arXiv Detail & Related papers (2026-01-05T18:59:54Z) - SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection [19.134309978060134]
SmokeBench is a benchmark to evaluate the ability of multimodal large language models (MLLMs) to recognize and localize wildfire smoke in images. We evaluate several MLLMs, including Idefics2, Qwen2.5-VL, InternVL3, Unified-IO 2, Grounding DINO, GPT-4o, and Gemini-2.5 Pro. Smoke volume is strongly correlated with model performance, whereas contrast plays a comparatively minor role.
arXiv Detail & Related papers (2025-12-12T01:47:28Z) - MFGDiffusion: Mask-Guided Smoke Synthesis for Enhanced Forest Fire Detection [6.307649189539342]
Smoke is the first visible indicator of a wildfire. Current inpainting models exhibit limitations in generating high-quality smoke representations. We propose a comprehensive framework for generating forest fire smoke images.
arXiv Detail & Related papers (2025-07-15T12:25:35Z) - LSD3K: A Benchmark for Smoke Removal from Laparoscopic Surgery Images [0.7138611948315257]
Smoke generated by surgical instruments during laparoscopic surgery can obscure the visual field, impairing surgeons' ability to perform operations accurately and safely.
Although laparoscopic image desmoking has attracted researchers' attention in recent years, the lack of publicly available high-quality benchmark datasets remains the main bottleneck hampering progress on this task.
We construct a new high-quality dataset for Laparoscopic Surgery image Desmoking, named LSD3K, consisting of 3,000 paired synthetic non-homogeneous smoke images.
arXiv Detail & Related papers (2024-07-18T03:42:16Z) - Attention-Aware Laparoscopic Image Desmoking Network with Lightness Embedding and Hybrid Guided Embedding [9.909043664967063]
A two-stage network is proposed to estimate the smoke distribution and reconstruct a clear, smoke-free surgical scene.
The proposed method achieves a Peak Signal-to-Noise Ratio 2.79% higher than that of state-of-the-art methods.
arXiv Detail & Related papers (2024-04-11T08:36:36Z) - Self-Supervised Video Desmoking for Laparoscopic Surgery [48.83900673665993]
We introduce self-supervised surgery video desmoking (SelfSVD).
We observe that the frame captured before the activation of high-energy devices is generally clear (named the pre-smoke frame, or PS frame).
We further feed the valuable information from the PS frame into models, where a masking strategy and a regularization term are presented to avoid trivial solutions.
arXiv Detail & Related papers (2024-03-17T12:38:58Z) - DFormer: Diffusion-guided Transformer for Universal Image Segmentation [86.73405604947459]
The proposed DFormer views the universal image segmentation task as a denoising process using a diffusion model.
At inference, our DFormer directly predicts the masks and corresponding categories from a set of randomly-generated masks.
Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on the MS COCO val 2017 set.
arXiv Detail & Related papers (2023-06-06T06:33:32Z) - Benchmarking Joint Face Spoofing and Forgery Detection with Visual and Physiological Cues [81.15465149555864]
We establish the first joint face spoofing and forgery detection benchmark using both visual appearance and physiological rPPG cues.
To enhance the rPPG periodicity discrimination, we design a two-branch physiological network using both the facial spatio-temporal rPPG signal map and its continuous wavelet transformed counterpart as inputs.
arXiv Detail & Related papers (2022-08-10T15:41:48Z) - Shape-Aware Masking for Inpainting in Medical Imaging [49.61617087640379]
Inpainting has been proposed as a successful deep learning technique for unsupervised medical image model discovery.
We introduce a method for generating shape-aware masks for inpainting, which aims at learning the statistical shape prior.
We propose an unsupervised guided masking approach based on an off-the-shelf inpainting model and a superpixel over-segmentation algorithm.
arXiv Detail & Related papers (2022-07-12T18:35:17Z) - STCNet: Spatio-Temporal Cross Network for Industrial Smoke Detection [52.648906951532155]
We propose a novel Spatio-Temporal Cross Network (STCNet) to recognize industrial smoke emissions.
The proposed STCNet involves a spatial pathway to extract texture features and a temporal pathway to capture smoke motion information.
We show that our STCNet improves on the best competitors by 6.2% on the challenging RISE industrial smoke detection dataset.
arXiv Detail & Related papers (2020-11-10T02:28:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.