Recognition of Instrument-Tissue Interactions in Endoscopic Videos via
Action Triplets
- URL: http://arxiv.org/abs/2007.05405v1
- Date: Fri, 10 Jul 2020 14:17:10 GMT
- Title: Recognition of Instrument-Tissue Interactions in Endoscopic Videos via
Action Triplets
- Authors: Chinedu Innocent Nwoye, Cristians Gonzalez, Tong Yu, Pietro Mascagni,
Didier Mutter, Jacques Marescaux and Nicolas Padoy
- Abstract summary: We tackle the recognition of fine-grained activities, modeled as action triplets <instrument, verb, target> representing the tool activity.
We introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80 in which all frames have been annotated using 128 triplet classes.
- Score: 9.517537672430006
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recognition of surgical activity is an essential component to develop
context-aware decision support for the operating room. In this work, we tackle
the recognition of fine-grained activities, modeled as action triplets
<instrument, verb, target> representing the tool activity. To this end, we
introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from
the public dataset Cholec80 in which all frames have been annotated using 128
triplet classes. Furthermore, we present an approach to recognize these
triplets directly from the video data. It relies on a module called Class
Activation Guide (CAG), which uses the instrument activation maps to guide the
verb and target recognition. To model the recognition of multiple triplets in
the same frame, we also propose a trainable 3D Interaction Space, which
captures the associations between the triplet components. Finally, we
demonstrate the significance of these contributions via several ablation
studies and comparisons to baselines on CholecT40.
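As a concrete illustration of the 3D Interaction Space idea, here is a minimal PyTorch sketch of one plausible reading: component logits are combined by an outer product into an instrument x verb x target tensor, which a learnable weight tensor then rescores. The module name and class counts below are illustrative assumptions, not the paper's exact specification (only 128 of the candidate combinations are valid triplet classes in CholecT40).

```python
import torch
import torch.nn as nn

class InteractionSpace3D(nn.Module):
    """Hypothetical sketch of a trainable 3D interaction space
    (class counts are assumptions, not the paper's values)."""

    def __init__(self, n_inst=6, n_verb=10, n_target=15):
        super().__init__()
        # One learnable association weight per (instrument, verb, target) cell.
        self.assoc = nn.Parameter(torch.randn(n_inst, n_verb, n_target) * 0.01)

    def forward(self, inst_logits, verb_logits, target_logits):
        # Outer product of the three component predictions:
        # space[b, i, v, t] = inst[b, i] * verb[b, v] * target[b, t]
        space = torch.einsum("bi,bv,bt->bivt",
                             inst_logits, verb_logits, target_logits)
        # Learned weights decide which cells form plausible triplets.
        return space * self.assoc

scores = InteractionSpace3D()(torch.rand(2, 6), torch.rand(2, 10), torch.rand(2, 15))
print(scores.flatten(1).shape)  # (2, 900) candidate triplet scores
```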
Related papers
- Surgical Triplet Recognition via Diffusion Model [59.50938852117371]
Surgical triplet recognition is an essential building block to enable next-generation context-aware operating rooms.
We propose Difft, a new generative framework for surgical triplet recognition that employs a diffusion model.
Experiments on the CholecT45 and CholecT50 datasets show the superiority of the proposed method in achieving a new state-of-the-art performance for surgical triplet recognition.
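As a rough sketch of the generative framing (the general diffusion-for-labels idea, not the Difft architecture), one can train a network to denoise a noised multi-label triplet vector conditioned on frame features. The 100-class count matches CholecT45/50; all other dimensions and the toy noise schedule are assumptions.

```python
import torch
import torch.nn as nn

class LabelDenoiser(nn.Module):
    """Illustrative only: predict clean triplet labels from a noised
    label vector, visual features, and the diffusion timestep."""

    def __init__(self, n_triplets=100, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_triplets + feat_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, n_triplets),
        )

    def forward(self, noisy_labels, visual_feat, t):
        t_emb = t.float().unsqueeze(-1) / 1000.0  # toy timestep embedding
        return self.net(torch.cat([noisy_labels, visual_feat, t_emb], dim=-1))

x0 = torch.randint(0, 2, (4, 100)).float()       # ground-truth triplet labels
t = torch.randint(0, 1000, (4,))
alpha = 1.0 - t.float().unsqueeze(-1) / 1000.0   # toy linear noise schedule
xt = alpha.sqrt() * x0 + (1 - alpha).sqrt() * torch.randn_like(x0)
pred = LabelDenoiser()(xt, torch.randn(4, 512), t)
```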
arXiv Detail & Related papers (2024-06-19T04:43:41Z)
- Surgical Action Triplet Detection by Mixed Supervised Learning of
Instrument-Tissue Interactions [5.033722555649178]
Surgical action triplets describe instrument-tissue interactions as (instrument, verb, target) combinations.
This work focuses on surgical action triplet detection, which is challenging but more precise than the traditional triplet recognition task.
We propose MCIT-IG, a two-stage network, that stands for Multi-Class Instrument-aware Transformer-Interaction Graph.
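The summary leaves the interaction-graph stage abstract; the sketch below is one assumed reading (dimensions, pairing scheme, and target count are guesses, not the MCIT-IG design), scoring every edge between per-instrument features and learned target-class embeddings.

```python
import torch
import torch.nn as nn

class InteractionGraph(nn.Module):
    """Sketch of a bipartite interaction graph (an assumed reading, not
    the MCIT-IG specification): every detected instrument is scored
    against learned target-class embeddings to form edge weights."""

    def __init__(self, dim=256, n_targets=15):
        super().__init__()
        self.target_emb = nn.Embedding(n_targets, dim)
        self.edge_weight = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, inst_feats):  # (num_instruments, dim)
        # Bilinear edge score between each instrument node and target node.
        return inst_feats @ self.edge_weight @ self.target_emb.weight.T

edges = InteractionGraph()(torch.randn(3, 256))  # (3 instruments, 15 targets)
```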
arXiv Detail & Related papers (2023-07-18T18:47:48Z)
- Language-free Compositional Action Generation via Decoupling Refinement [67.50452446686725]
We introduce a novel framework to generate compositional actions without reliance on language auxiliaries.
Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement.
arXiv Detail & Related papers (2023-07-07T12:00:38Z)
- CholecTriplet2022: Show me a tool and tell me the triplet -- an
endoscopic vision challenge for surgical action triplet detection [41.66666272822756]
This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection.
It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool activity in the form of an <instrument, verb, target> triplet.
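For the weakly-supervised localization component, a common generic recipe (an assumption here, not the challenge-mandated method) is to threshold a class activation map and take the bounding box of the activated region:

```python
import numpy as np

def cam_to_box(cam, thresh=0.5):
    """Box from a class activation map: normalize, threshold, bound."""
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    ys, xs = np.where(cam >= thresh)
    if len(xs) == 0:
        return None  # instrument class not activated in this frame
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

box = cam_to_box(np.random.rand(7, 7))  # coarse box in feature-map coordinates
```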
arXiv Detail & Related papers (2023-02-13T11:53:14Z)
- Rendezvous in Time: An Attention-based Temporal Fusion approach for
Surgical Triplet Recognition [5.033722555649178]
One of the recent advances in surgical AI is the recognition of surgical activities as triplets of (instrument, verb, target).
Exploiting the temporal cues from earlier frames would improve the recognition of surgical action triplets from videos.
We propose Rendezvous in Time (RiT) - a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling.
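A minimal sketch of the attention-based temporal fusion idea (dimensions and the single attention block are assumptions, not the exact RiT design): the current frame's features query a short bank of past-frame features.

```python
import torch
import torch.nn as nn

class TemporalAttentionFusion(nn.Module):
    """Hedged sketch of temporal fusion: the current frame attends
    over cached features from earlier frames."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, current, past):  # (B, 1, dim), (B, T, dim)
        fused, _ = self.attn(query=current, key=past, value=past)
        return current + fused         # residual temporal refinement

out = TemporalAttentionFusion()(torch.randn(2, 1, 256), torch.randn(2, 8, 256))
```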
arXiv Detail & Related papers (2022-11-30T13:18:07Z)
- CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition [66.51610049869393]
This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
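The mAP figures are computed per triplet class and then averaged; below is a simplified sketch of that metric family (the challenge's exact protocol, e.g. per-video averaging and missing-class handling, is omitted here).

```python
import numpy as np
from sklearn.metrics import average_precision_score

def triplet_map(y_true, y_score):
    """Mean average precision over triplet classes (simplified):
    average the per-class AP over classes with at least one positive."""
    aps = [average_precision_score(y_true[:, c], y_score[:, c])
           for c in range(y_true.shape[1]) if y_true[:, c].any()]
    return float(np.mean(aps))

y_true = (np.random.rand(200, 100) > 0.95).astype(int)  # multi-label GT
print(triplet_map(y_true, np.random.rand(200, 100)))
```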
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene
Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, including EndoVis18 Challenge and CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
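One loose reading of the intra-/inter-video relation idea (an assumption about the spirit of STswinCL, not its pixel-wise formulation) is a contrastive objective that pulls together embeddings from the same video and pushes apart embeddings from other videos:

```python
import torch
import torch.nn.functional as F

def video_infonce(anchor, positive, negatives, tau=0.1):
    """InfoNCE sketch: positives come from the same video (intra),
    negatives from other videos (inter)."""
    a = F.normalize(anchor, dim=-1)
    pos = (a * F.normalize(positive, dim=-1)).sum(-1, keepdim=True)  # (B, 1)
    neg = a @ F.normalize(negatives, dim=-1).T                       # (B, N)
    logits = torch.cat([pos, neg], dim=1) / tau
    return F.cross_entropy(logits, torch.zeros(len(a), dtype=torch.long))

loss = video_infonce(torch.randn(8, 128), torch.randn(8, 128), torch.randn(32, 128))
```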
arXiv Detail & Related papers (2022-03-29T05:52:23Z)
- Rendezvous: Attention Mechanisms for the Recognition of Surgical Action
Triplets in Endoscopic Videos [12.725586100227337]
Among approaches to surgical activity recognition, action triplet recognition stands out as the only one aiming to provide truly fine-grained and comprehensive information on surgical activities.
We introduce our new model, the Rendezvous (RDV), which recognizes triplets directly from surgical videos by leveraging attention at two different levels.
Our proposed RDV model significantly improves the triplet prediction mAP by over 9% compared to the state-of-the-art methods on this dataset.
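One of RDV's attention levels builds on activation-guided spatial attention, related to the CAG module of the main paper above; here is a minimal sketch (channel sizes and head structure assumed) that pools features under an instrument activation map before a verb head:

```python
import torch
import torch.nn as nn

class ActivationGuidedPooling(nn.Module):
    """Sketch of class-activation-guided attention (details assumed):
    instrument activation maps act as spatial weights when pooling
    features for verb/target recognition."""

    def __init__(self, feat_ch=256, n_verbs=10):
        super().__init__()
        self.verb_head = nn.Linear(feat_ch, n_verbs)

    def forward(self, feats, inst_cam):  # (B, C, H, W), (B, 1, H, W)
        w = torch.softmax(inst_cam.flatten(2), dim=-1)  # spatial weights
        pooled = (feats.flatten(2) * w).sum(-1)         # (B, C)
        return self.verb_head(pooled)

logits = ActivationGuidedPooling()(torch.randn(2, 256, 7, 7), torch.randn(2, 1, 7, 7))
```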
arXiv Detail & Related papers (2021-09-07T17:52:52Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
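A toy sketch of the visual-kinematics fusion idea (layer sizes and the two-node message-passing scheme are assumptions, not the MRG-Net specification; the 76-dim kinematics input mirrors JIGSAWS):

```python
import torch
import torch.nn as nn

class VisKinFusion(nn.Module):
    """Minimal sketch: project both modalities into a shared space,
    exchange one round of messages, then classify the gesture."""

    def __init__(self, vis_dim=512, kin_dim=76, dim=128, n_gestures=10):
        super().__init__()
        self.v = nn.Linear(vis_dim, dim)
        self.k = nn.Linear(kin_dim, dim)
        self.msg = nn.Linear(dim, dim)
        self.cls = nn.Linear(2 * dim, n_gestures)

    def forward(self, vis, kin):
        v, k = torch.relu(self.v(vis)), torch.relu(self.k(kin))
        # One round of message passing between the two modality nodes.
        v2 = v + torch.relu(self.msg(k))
        k2 = k + torch.relu(self.msg(v))
        return self.cls(torch.cat([v2, k2], dim=-1))

gesture_logits = VisKinFusion()(torch.randn(4, 512), torch.randn(4, 76))
```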
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Pose And Joint-Aware Action Recognition [87.4780883700755]
We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder.
Our joint selector module re-weights the joint information to select the most discriminative joints for the task.
We show large improvements over the current state-of-the-art joint-based approaches on JHMDB, HMDB, Charades, AVA action recognition datasets.
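The joint selector can be read as attention over per-joint motion features; a minimal sketch with assumed dimensions:

```python
import torch
import torch.nn as nn

class JointSelector(nn.Module):
    """Sketch of the joint re-weighting idea: score each joint's motion
    feature and form an attention-weighted summary."""

    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, joint_feats):  # (B, num_joints, dim)
        w = torch.softmax(self.score(joint_feats), dim=1)  # per-joint weights
        return (w * joint_feats).sum(dim=1)                # (B, dim)

summary = JointSelector()(torch.randn(2, 15, 128))
```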
arXiv Detail & Related papers (2020-10-16T04:43:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.