Surgical Action Triplet Detection by Mixed Supervised Learning of
Instrument-Tissue Interactions
- URL: http://arxiv.org/abs/2307.09548v1
- Date: Tue, 18 Jul 2023 18:47:48 GMT
- Title: Surgical Action Triplet Detection by Mixed Supervised Learning of
Instrument-Tissue Interactions
- Authors: Saurav Sharma, Chinedu Innocent Nwoye, Didier Mutter, Nicolas Padoy
- Abstract summary: Surgical action triplets describe instrument-tissue interactions as (instrument, verb, target) combinations.
This work focuses on surgical action triplet detection, which is challenging but more precise than the traditional triplet recognition task.
We propose MCIT-IG, a two-stage network whose name stands for Multi-Class Instrument-aware Transformer-Interaction Graph.
- Score: 5.033722555649178
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Surgical action triplets describe instrument-tissue interactions as
(instrument, verb, target) combinations, thereby supporting a detailed analysis
of surgical scene activities and workflow. This work focuses on surgical action
triplet detection, which is challenging but more precise than the traditional
triplet recognition task as it consists of joint (1) localization of surgical
instruments and (2) recognition of the surgical action triplet associated with
every localized instrument. Triplet detection is highly complex due to the lack
of spatial triplet annotation. We analyze how the amount of instrument spatial
annotations affects triplet detection and observe that accurate instrument
localization does not guarantee better triplet detection due to the risk of
erroneous associations with the verbs and targets. To solve the two tasks, we
propose MCIT-IG, a two-stage network whose name stands for Multi-Class
Instrument-aware Transformer-Interaction Graph. The MCIT stage of our network
models per-class embeddings of the targets as additional features to reduce the
risk of misassociating triplets. Furthermore, the IG stage constructs a
bipartite dynamic graph to model the interaction between the instruments and
targets, cast as the verbs. We utilize a mixed-supervised learning strategy
that combines weak target presence labels for MCIT and pseudo triplet labels
for IG to train our network. We observed that complementing minimal instrument
spatial annotations with target embeddings results in better triplet detection.
We evaluate our model on the CholecT50 dataset and show improved performance on
both instrument localization and triplet detection, topping the leaderboard of
the CholecTriplet challenge in MICCAI 2022.
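To make the IG stage concrete, here is a minimal PyTorch sketch of the idea as the abstract describes it: instruments and targets form the two sides of a bipartite graph, and each edge is classified into a verb. The layer sizes, `num_verbs`, and the edge MLP are illustrative placeholders, not the paper's exact design.

```python
import torch
import torch.nn as nn

class BipartiteInteractionGraph(nn.Module):
    """Sketch of an interaction-graph stage: every (instrument, target)
    pair is an edge whose verb class is predicted from the two node
    embeddings. All sizes are placeholders, not the paper's values."""

    def __init__(self, dim=256, num_verbs=10):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, num_verbs),
        )

    def forward(self, inst_emb, tgt_emb):
        # inst_emb: (I, dim) features of localized instruments
        # tgt_emb:  (T, dim) per-class target embeddings from the MCIT stage
        I, T = inst_emb.size(0), tgt_emb.size(0)
        # Build all I*T edges of the bipartite graph by pairing embeddings.
        pairs = torch.cat(
            [inst_emb.unsqueeze(1).expand(I, T, -1),
             tgt_emb.unsqueeze(0).expand(I, T, -1)], dim=-1)
        return self.edge_mlp(pairs)  # (I, T, num_verbs) verb logits per edge

inst = torch.randn(3, 256)  # e.g. 3 detected instruments
tgt = torch.randn(5, 256)   # e.g. 5 candidate target classes
print(BipartiteInteractionGraph()(inst, tgt).shape)  # torch.Size([3, 5, 10])
```

Under the mixed-supervised strategy described above, weak target presence labels would drive the MCIT target embeddings, while pseudo triplet labels would supervise the per-edge verb logits.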
Related papers
- Surgical Triplet Recognition via Diffusion Model [59.50938852117371]
Surgical triplet recognition is an essential building block to enable next-generation context-aware operating rooms.
We propose DiffTriplet, a new generative framework for surgical triplet recognition employing the diffusion model.
Experiments on the CholecT45 and CholecT50 datasets show the superiority of the proposed method in achieving a new state-of-the-art performance for surgical triplet recognition.
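The summary leaves the architecture unspecified; purely as a hypothetical illustration of the general recipe (iteratively refining a noisy triplet-label vector conditioned on visual features), the sketch below collapses the usual diffusion noise schedule into plain iterative refinement. Nothing here is the paper's actual design.

```python
import torch
import torch.nn as nn

class TripletDenoiser(nn.Module):
    """Hypothetical denoiser that refines a noisy triplet-label vector
    conditioned on a video feature; not the paper's architecture."""

    def __init__(self, num_triplets=100, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_triplets + feat_dim + 1, 512),
            nn.ReLU(),
            nn.Linear(512, num_triplets),
        )

    def forward(self, y_t, feat, t):
        t_emb = torch.full_like(y_t[:, :1], float(t))  # crude timestep token
        return self.net(torch.cat([y_t, feat, t_emb], dim=-1))

@torch.no_grad()
def recognize(model, feat, steps=10, num_triplets=100):
    y = torch.randn(feat.size(0), num_triplets)  # start from pure noise
    for t in reversed(range(steps)):
        y = model(y, feat, t)                    # simplified reverse process
    return y.sigmoid()                           # multi-label triplet scores
```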
arXiv Detail & Related papers (2024-06-19T04:43:41Z)
- Cross-Cluster Shifting for Efficient and Effective 3D Object Detection in Autonomous Driving [69.20604395205248]
We present a new 3D point-based detector model, named Shift-SSD, for precise 3D object detection in autonomous driving.
We introduce an intriguing Cross-Cluster Shifting operation to unleash the representation capacity of the point-based detector.
We conduct extensive experiments on the KITTI and nuScenes datasets, and the results demonstrate the state-of-the-art performance of Shift-SSD in both detection accuracy and runtime.
arXiv Detail & Related papers (2024-03-10T10:36:32Z)
- Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection [38.86053346974547]
We propose to explore Self- and Cross-Triplet Correlations for HOI detection.
Specifically, we regard each triplet proposal as a graph where Human and Object represent the nodes and Action indicates the edge.
Also, we try to explore cross-triplet dependencies by jointly considering instance-level, semantic-level, and layout-level relations.
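A toy PyTorch rendering of that graph view of a single triplet proposal, with the dimensions, the fusion MLP, and the layout embedding chosen purely for illustration:

```python
import torch
import torch.nn as nn

class TripletGraphEdge(nn.Module):
    """One proposal as a tiny graph: Human and Object node features plus a
    layout (spatial) embedding are fused to score the Action on the edge.
    Sizes and the fusion design are assumptions, not the paper's."""

    def __init__(self, dim=256, num_actions=29):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, num_actions))

    def forward(self, human, obj, layout):
        # human, obj, layout: (B, dim) node and layout-level features
        return self.fuse(torch.cat([human, obj, layout], dim=-1))
```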
arXiv Detail & Related papers (2024-01-11T05:38:24Z)
- CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection [41.66666272822756]
This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection.
It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool) as the key actors, and the modeling of each tool activity in the form of an <instrument, verb, target> triplet.
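One generic recipe for weakly-supervised box localization from presence-only labels, given here only as an illustration and not as any challenge entry's actual method, is to threshold a class activation map into a box:

```python
import numpy as np

def cam_to_box(cam, thresh=0.5):
    """Turn a class activation map (H, W) for one instrument class into a
    box by thresholding; a generic weak-localization heuristic, offered as
    an assumption rather than any particular method from the challenge."""
    mask = cam >= thresh * cam.max()
    ys, xs = np.where(mask)
    if len(xs) == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```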
arXiv Detail & Related papers (2023-02-13T11:53:14Z)
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
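For reference, the per-class average precision underlying such mAP numbers can be computed as below; mAP is then the mean of per-triplet-class APs. This is the standard estimator, not challenge-specific code.

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class from per-frame confidences and binary labels:
    mean of the precision values measured at each positive example."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels, dtype=float)[order]
    precision = np.cumsum(labels) / (np.arange(len(labels)) + 1)
    n_pos = max(labels.sum(), 1.0)
    return float((precision * labels).sum() / n_pos)

print(average_precision([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # ~0.833
```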
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
- TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation.
TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions.
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z)
- Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery.
We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks.
We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
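As a simple illustration of turning a pairwise spatial relation into a heatmap, one could render a Gaussian at the midpoint of two landmarks; the midpoint choice and sigma are assumptions for illustration, not the paper's actual relation-keypoint derivation.

```python
import numpy as np

def relation_heatmap(p, q, shape, sigma=4.0):
    """Gaussian heatmap centered at the midpoint of landmarks p and q
    (each an (x, y) pair); an assumed, simplified relation encoding."""
    h, w = shape
    cx, cy = (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
```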
arXiv Detail & Related papers (2021-11-08T07:57:30Z)
- Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos [12.725586100227337]
Among surgical workflow analysis tasks, action triplet recognition stands out as the only one aiming to provide truly fine-grained and comprehensive information on surgical activities.
We introduce our new model, the Rendezvous (RDV), which recognizes triplets directly from surgical videos by leveraging attention at two different levels.
Our proposed RDV model significantly improves the triplet prediction mAP by over 9% compared to the state-of-the-art methods on the CholecT50 dataset.
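A hedged sketch of what attention at two levels can look like: spatial attention pools the frame feature map, then cross-attention relates a triplet query to instrument/verb/target component embeddings. Dimensions and wiring are placeholders, not the exact RDV architecture.

```python
import torch
import torch.nn as nn

class TwoLevelAttention(nn.Module):
    """Illustrative two-level attention: (1) spatial attention over the
    frame, (2) cross-attention over component embeddings. Placeholder
    sizes; not the published RDV design."""

    def __init__(self, dim=256):
        super().__init__()
        self.spatial = nn.Conv2d(dim, 1, kernel_size=1)  # level 1: where
        self.cross = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, fmap, components, query):
        # fmap: (B, C, H, W); components: (B, 3, C); query: (B, 1, C)
        attn = self.spatial(fmap).flatten(2).softmax(-1)           # (B, 1, H*W)
        pooled = torch.bmm(attn, fmap.flatten(2).transpose(1, 2))  # (B, 1, C)
        out, _ = self.cross(query + pooled, components, components)
        return out  # (B, 1, C) triplet-aware embedding
```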
arXiv Detail & Related papers (2021-09-07T17:52:52Z)
- Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets [9.517537672430006]
We tackle the recognition of fine-grained activities, modeled as <instrument, verb, target> action triplets representing the tool activity.
We introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80 in which all frames have been annotated using 128 triplet classes.
arXiv Detail & Related papers (2020-07-10T14:17:10Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)