LRTD: Long-Range Temporal Dependency based Active Learning for Surgical
Workflow Recognition
- URL: http://arxiv.org/abs/2004.09845v2
- Date: Thu, 23 Apr 2020 05:57:47 GMT
- Authors: Xueying Shi, Yueming Jin, Qi Dou, Pheng-Ann Heng
- Abstract summary: We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture the long-range temporal dependency.
We validate our approach on a large surgical video dataset (Cholec80) by performing the surgical workflow recognition task.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic surgical workflow recognition in video is a fundamental
yet challenging problem for developing computer-assisted and robotic-assisted
surgery. Existing deep learning approaches have achieved remarkable performance
on surgical video analysis, but they rely heavily on large-scale labelled
datasets. Unfortunately, annotations are rarely available in abundance, because
producing them requires the domain knowledge of surgeons. In this paper, we
propose a novel active learning method for cost-effective surgical video
analysis. Specifically, we propose a non-local recurrent convolutional network
(NL-RCNet), which introduces a non-local block to capture the long-range
temporal dependency (LRTD) among continuous frames. We then formulate an
intra-clip dependency score to represent the overall dependency within a clip.
By ranking scores among clips in the unlabelled data pool, we select the clips
with weak dependencies for annotation, as these are the most informative ones
for network training. We validate our approach on a large surgical video
dataset (Cholec80) on the surgical workflow recognition task. Our LRTD-based
selection strategy outperforms other state-of-the-art active learning methods.
Using at most 50% of the samples, our approach can exceed the performance of
full-data training.
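The selection strategy described in the abstract (pairwise non-local affinities among a clip's frames, an intra-clip dependency score, then ranking and picking the weak-dependency clips) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the embedded-Gaussian dot-product affinity and the mean off-diagonal attention used as the dependency score are assumptions, and real frame features would come from the NL-RCNet backbone rather than raw arrays.

```python
import numpy as np

def intra_clip_dependency(features, temperature=1.0):
    """Score the overall pairwise (non-local) dependency within one clip.

    features: (T, D) array of per-frame embeddings for a clip of T frames.
    Returns a scalar in (0, 1): the mean off-diagonal attention weight,
    a stand-in for the paper's intra-clip dependency score (exact formula
    assumed, not taken from the paper).
    """
    # Dot-product affinities between every pair of frames (T x T),
    # in the style of an embedded-Gaussian non-local block.
    sim = (features @ features.T) / temperature
    # Row-wise softmax -> how much attention each frame pays to the others.
    sim -= sim.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)
    # Exclude self-attention and average the remaining weights.
    T = features.shape[0]
    off_diag = attn[~np.eye(T, dtype=bool)]
    return float(off_diag.mean())

def select_clips_for_annotation(clip_features, budget):
    """Rank unlabelled clips by dependency score (ascending) and return
    the indices of the `budget` weakest-dependency clips to annotate."""
    scores = [intra_clip_dependency(f) for f in clip_features]
    order = np.argsort(scores)  # weakest dependency first
    return order[:budget].tolist()
```

Selecting weak-dependency clips follows the abstract's intuition: frames that attend little to their neighbours are harder for the temporal model to infer from context, so labelling them should be more informative per annotation.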
Related papers
- Efficient Surgical Tool Recognition via HMM-Stabilized Deep Learning [25.146476653453227]
We propose an HMM-stabilized deep learning method for tool presence detection.
A range of experiments confirm that the proposed approaches achieve better performance with lower training and running costs.
These results suggest that popular deep learning approaches with over-complicated model structures may suffer from inefficient utilization of data.
arXiv Detail & Related papers (2024-04-07T15:27:35Z)
- Correlation-aware active learning for surgery video segmentation [13.327429312047396]
This work proposes a novel AL strategy for surgery video segmentation, COWAL, COrrelation-aWare Active Learning.
Our approach involves projecting images into a latent space that has been fine-tuned using contrastive learning and then selecting a fixed number of representative images from local clusters of video frames.
We demonstrate the effectiveness of this approach on two video datasets of surgical instruments and three real-world video datasets.
arXiv Detail & Related papers (2023-11-15T09:30:52Z)
- VideoSum: A Python Library for Surgical Video Summarization [3.928145224623878]
We propose to summarize surgical videos into storyboards or collages of representative frames to ease visualization, annotation, and processing.
We present videosum, an easy-to-use and open-source Python library to generate storyboards from surgical videos.
arXiv Detail & Related papers (2023-02-15T19:09:34Z)
- Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation.
We extensively evaluate our method on a public robotic surgery dataset EndoVis18 and a public cataract dataset CaDIS.
arXiv Detail & Related papers (2022-07-20T05:42:19Z)
- Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
- Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases [57.90226879210227]
FedCy is a federated semi-supervised learning (FSSL) method that combines FL and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos.
We demonstrate significant performance gains over state-of-the-art FSSL methods on the task of automatic recognition of surgical phases.
arXiv Detail & Related papers (2022-03-14T17:44:53Z)
- Effective semantic segmentation in Cataract Surgery: What matters most? [5.1151054398496685]
Our work proposes neural network design choices that set the state-of-the-art on a challenging public benchmark on cataract surgery, CaDIS.
Our methodology achieves strong performance across three semantic segmentation tasks with increasingly granular surgical tool class sets.
arXiv Detail & Related papers (2021-08-13T08:27:54Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Aggregating Long-Term Context for Learning Laparoscopic and Robot-Assisted Surgical Workflows [40.48632897750319]
We propose a new temporal network structure that leverages task-specific network representation to collect long-term sufficient statistics.
We demonstrate superior results over existing and novel state-of-the-art segmentation techniques on two laparoscopic cholecystectomy datasets.
arXiv Detail & Related papers (2020-09-01T20:29:14Z)
- Confident Coreset for Active Learning in Medical Image Analysis [57.436224561482966]
We propose a novel active learning method, confident coreset, which considers both uncertainty and distribution for effectively selecting informative samples.
By comparative experiments on two medical image analysis tasks, we show that our method outperforms other active learning methods.
arXiv Detail & Related papers (2020-04-05T13:46:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.