Related papers: Surgical Triplet Recognition via Diffusion Model

URL: http://arxiv.org/abs/2406.13210v2
Date: Mon, 24 Jun 2024 08:22:40 GMT
Title: Surgical Triplet Recognition via Diffusion Model
Authors: Daochang Liu, Axel Hu, Mubarak Shah, Chang Xu
Abstract summary: Surgical triplet recognition is an essential building block to enable next-generation context-aware operating rooms. We propose DiffTriplet, a new generative framework for surgical triplet recognition employing the diffusion model. Experiments on the CholecT45 and CholecT50 datasets show the superiority of the proposed method in achieving a new state-of-the-art performance for surgical triplet recognition.
Score: 59.50938852117371
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Surgical triplet recognition is an essential building block to enable next-generation context-aware operating rooms. The goal is to identify the combinations of instruments, verbs, and targets presented in surgical video frames. In this paper, we propose DiffTriplet, a new generative framework for surgical triplet recognition employing the diffusion model, which predicts surgical triplets via iterative denoising. To handle the challenge of triplet association, two unique designs are proposed in our diffusion framework, i.e., association learning and association guidance. During training, we optimize the model in the joint space of triplets and individual components to capture the dependencies among them. At inference, we integrate association constraints into each update of the iterative denoising process, which refines the triplet prediction using the information of individual components. Experiments on the CholecT45 and CholecT50 datasets show the superiority of the proposed method in achieving a new state-of-the-art performance for surgical triplet recognition. Our codes will be released.
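The abstract describes two ideas that are easier to see in a toy example: predictions are produced by iterative refinement starting from noise, and each update is nudged by what the individual components (instrument, verb, target) suggest. The following is a minimal sketch of that general pattern only, not the DiffTriplet implementation. The four-class label space `TRIPLETS`, the stand-in `toy_denoiser`, the product rule for combining component scores, and the `step_size` and `guidance` weights are all assumptions made for illustration; none of them come from the paper.

```python
import random

# Hypothetical toy label space: triplet class -> (instrument, verb, target) indices.
TRIPLETS = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]


def implied_triplet_scores(inst, verb, targ):
    """Score each triplet by the product of its component scores (an assumed rule)."""
    return [inst[i] * verb[v] * targ[t] for i, v, t in TRIPLETS]


def guided_refinement(denoise, inst, verb, targ,
                      steps=10, step_size=0.5, guidance=0.3, seed=0):
    """Start from random noise, then repeatedly (1) move towards the denoiser's
    estimate and (2) blend in the triplet scores implied by the components."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in TRIPLETS]
    implied = implied_triplet_scores(inst, verb, targ)
    for _ in range(steps):
        estimate = denoise(x)
        # Denoising update: step part of the way towards the current estimate.
        x = [xi + step_size * (ei - xi) for xi, ei in zip(x, estimate)]
        # Guidance update: pull the triplet scores towards the component evidence.
        x = [(1.0 - guidance) * xi + guidance * gi for xi, gi in zip(x, implied)]
        # Keep the scores in [0, 1].
        x = [min(1.0, max(0.0, xi)) for xi in x]
    return x


def toy_denoiser(x):
    # Stand-in for a trained network: always returns the same clean estimate.
    return [0.2, 0.6, 0.5, 0.1]


scores = guided_refinement(toy_denoiser,
                           inst=[0.9, 0.1], verb=[0.2, 0.8], targ=[0.1, 0.9])
best = max(range(len(TRIPLETS)), key=scores.__getitem__)
print(TRIPLETS[best], [round(s, 3) for s in scores])
```

In this toy setting the component scores favour instrument 0, verb 1, and target 1, so the guidance term reinforces the triplet made of those three components and pulls down triplets whose components are unlikely. A real system would replace `toy_denoiser` with a learned model conditioned on video features.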

Related papers

Landmark-Free Preoperative-to-Intraoperative Registration in Laparoscopic Liver Resection [50.388465935739376]
Liver registration by overlaying preoperative 3D models onto intraoperative 2D frames can assist surgeons in perceiving the spatial anatomy of the liver clearly for a higher surgical success rate. Existing registration methods rely heavily on anatomical landmarks, an approach that encounters two major limitations. We propose a landmark-free preoperative-to-intraoperative registration framework utilizing effective self-supervised learning.
arXiv Detail & Related papers (2025-04-21T14:55:57Z)
fine-CLIP: Enhancing Zero-Shot Fine-Grained Surgical Action Recognition with Vision-Language Models [3.8352069691069084]
We propose fine-CLIP, which learns object-centric features and leverages the hierarchy in triplet formulation. fine-CLIP shows significant improvements in F1 and mAP, enhancing zero-shot recognition of novel surgical triplets.
arXiv Detail & Related papers (2025-03-25T13:57:02Z)
Intraoperative Registration by Cross-Modal Inverse Neural Rendering [61.687068931599846]
We present a novel approach for 3D/2D intraoperative registration during neurosurgery via cross-modal inverse neural rendering. Our approach separates implicit neural representation into two components, handling anatomical structure preoperatively and appearance intraoperatively. We tested our method on retrospective patients' data from clinical cases, showing that our method outperforms state-of-the-art while meeting current clinical standards for registration.
arXiv Detail & Related papers (2024-09-18T13:40:59Z)
LACOSTE: Exploiting stereo and temporal contexts for surgical instrument segmentation [14.152207010509763]
We propose a novel LACOSTE model that exploits Location-Agnostic COntexts in Stereo and TEmporal images for improved surgical instrument segmentation. We extensively validate our approach on three public surgical video datasets.
arXiv Detail & Related papers (2024-09-14T08:17:56Z)
Language-free Compositional Action Generation via Decoupling Refinement [67.50452446686725]
We introduce a novel framework to generate compositional actions without reliance on language auxiliaries. Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement.
arXiv Detail & Related papers (2023-07-07T12:00:38Z)
Diffusion Action Segmentation [63.061058214427085]
We propose a novel framework via denoising diffusion models, which shares the same inherent spirit of such iterative refinement. In this framework, action predictions are iteratively generated from random noise with input video features as conditions.
arXiv Detail & Related papers (2023-03-31T10:53:24Z)
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection [41.66666272822756]
This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool) as the key actors, and the modeling of each tool-activity in the form of an <instrument, verb, target> triplet.
arXiv Detail & Related papers (2023-02-13T11:53:14Z)
Rendezvous in Time: An Attention-based Temporal Fusion approach for Surgical Triplet Recognition [5.033722555649178]
One of the recent advances in surgical AI is the recognition of surgical activities as triplets of (instrument, verb, target). Exploiting the temporal cues from earlier frames would improve the recognition of surgical action triplets from videos. We propose Rendezvous in Time (RiT) - a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling.
arXiv Detail & Related papers (2022-11-30T13:18:07Z)
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos [12.725586100227337]
Action triplet recognition stands out as the only one aiming to provide truly fine-grained and comprehensive information on surgical activities. We introduce our new model, the Rendezvous (RDV), which recognizes triplets directly from surgical videos by leveraging attention at two different levels. Our proposed RDV model significantly improves the triplet prediction mAP by over 9% compared to the state-of-the-art methods on this dataset.
arXiv Detail & Related papers (2021-09-07T17:52:52Z)
Contrastive Triple Extraction with Generative Transformer [72.21467482853232]
We introduce a novel model, contrastive triple extraction with a generative transformer. Specifically, we introduce a single shared transformer module for encoder-decoder-based generation. To generate faithful results, we propose a novel triplet contrastive training object.
arXiv Detail & Related papers (2020-09-14T05:29:24Z)
Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets [9.517537672430006]
We tackle the recognition of fine-grained activities, modeled as action triplets <instrument, verb, target> representing the tool activity. We introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80 in which all frames have been annotated using 128 triplet classes.
arXiv Detail & Related papers (2020-07-10T14:17:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.