Weakly Supervised Temporal Convolutional Networks for Fine-grained
Surgical Activity Recognition
- URL: http://arxiv.org/abs/2302.10834v2
- Date: Tue, 11 Apr 2023 13:55:28 GMT
- Authors: Sanat Ramesh, Diego Dall'Alba, Cristians Gonzalez, Tong Yu, Pietro
Mascagni, Didier Mutter, Jacques Marescaux, Paolo Fiorini, and Nicolas Padoy
- Score: 10.080444283496487
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Automatic recognition of fine-grained surgical activities, called steps, is a
challenging but crucial task for intelligent intra-operative computer
assistance. The development of current vision-based activity recognition
methods relies heavily on a high volume of manually annotated data. This data
is difficult and time-consuming to generate and requires domain-specific
knowledge. In this work, we propose to use coarser and easier-to-annotate
activity labels, namely phases, as weak supervision to learn step recognition
with fewer step annotated videos. We introduce a step-phase dependency loss to
exploit the weak supervision signal. We then employ a Single-Stage Temporal
Convolutional Network (SS-TCN) with a ResNet-50 backbone, trained in an
end-to-end fashion from weakly annotated videos, for temporal activity
segmentation and recognition. We extensively evaluate and show the
effectiveness of the proposed method on a large video dataset consisting of 40
laparoscopic gastric bypass procedures and the public benchmark CATARACTS
containing 50 cataract surgeries.
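The abstract above does not spell out the exact form of the step-phase dependency loss, but its intent can be sketched under one assumption: each fine-grained step belongs to exactly one coarse phase. For a frame carrying only a phase annotation, a natural weak-supervision loss is the negative log of the step posterior's marginal over the steps compatible with that phase. The step and phase names below are invented for illustration, not taken from the paper:

```python
import math

# Hypothetical step-to-phase mapping (names invented for illustration):
# each fine-grained step belongs to exactly one coarse phase.
STEP_TO_PHASE = {
    "dissect_pouch": "pouch_creation",
    "staple_pouch": "pouch_creation",
    "measure_limb": "anastomosis",
    "suture_limb": "anastomosis",
}
STEPS = list(STEP_TO_PHASE)


def phase_marginal_loss(step_probs, phase_label):
    """Weak-supervision loss for a frame annotated only with a phase.

    Sums the step posterior over all steps consistent with the annotated
    phase and penalizes the negative log of that marginal: the model is
    free to choose among compatible steps but is pushed away from steps
    belonging to other phases.
    """
    marginal = sum(
        p for step, p in zip(STEPS, step_probs)
        if STEP_TO_PHASE[step] == phase_label
    )
    # Clamp to avoid log(0) when no mass falls on the annotated phase.
    return -math.log(max(marginal, 1e-12))


# A posterior concentrated on steps of the annotated phase incurs a low
# loss; one concentrated on steps of another phase incurs a high loss.
good = phase_marginal_loss([0.6, 0.35, 0.03, 0.02], "pouch_creation")
bad = phase_marginal_loss([0.05, 0.05, 0.5, 0.4], "pouch_creation")
```

In a training pipeline, this term would be added to a standard cross-entropy loss on the fully step-annotated videos, letting the much larger pool of phase-only videos shape the step classifier.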
Related papers
- Robust Surgical Phase Recognition From Annotation Efficient Supervision [1.1510009152620668]
We propose a robust method for surgical phase recognition that can handle missing phase annotations effectively.
We achieve an accuracy of 85.1% on the MultiBypass140 dataset using only 3 annotated frames per video.
Our work contributes to the advancement of surgical workflow recognition and paves the way for more efficient and reliable surgical phase recognition systems.
arXiv Detail & Related papers (2024-06-26T16:47:31Z)
- Weakly-Supervised Surgical Phase Recognition [19.27227976291303]
In this work we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction.
We validate our method by running experiments with the public Cholec80 dataset of laparoscopic cholecystectomy videos.
arXiv Detail & Related papers (2023-10-26T07:54:47Z)
- NurViD: A Large Expert-Level Video Database for Nursing Procedure Activity Understanding [20.273197899025117]
We propose NurViD, a large video dataset with expert-level annotation for nursing procedure activity understanding.
NurViD consists of over 1.5k videos totaling 144 hours, making it approximately four times longer than the existing largest nursing activity datasets.
arXiv Detail & Related papers (2023-10-20T08:22:56Z)
- GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
- Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
- Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases [57.90226879210227]
FedCy is a federated semi-supervised learning (FSSL) method that combines federated learning and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos.
We demonstrate significant performance gains over state-of-the-art FSSL methods on the task of automatic recognition of surgical phases.
arXiv Detail & Related papers (2022-03-14T17:44:53Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependencies.
We validate our approach on a large surgical video dataset (Cholec80) by performing surgical workflow recognition task.
arXiv Detail & Related papers (2020-04-21T09:21:22Z)
- ZSTAD: Zero-Shot Temporal Activity Detection [107.63759089583382]
We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
arXiv Detail & Related papers (2020-03-12T02:40:36Z)
- Multi-Task Recurrent Neural Network for Surgical Gesture Recognition and Progress Prediction [17.63619129438996]
We propose a multi-task recurrent neural network for simultaneous recognition of surgical gestures and estimation of a novel formulation of surgical task progress.
We demonstrate that recognition performance improves in multi-task frameworks with progress estimation without any additional manual labelling and training.
arXiv Detail & Related papers (2020-03-10T14:28:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.