Using Human Gaze For Surgical Activity Recognition
- URL: http://arxiv.org/abs/2203.04752v1
- Date: Wed, 9 Mar 2022 14:28:00 GMT
- Title: Using Human Gaze For Surgical Activity Recognition
- Authors: Abdishakour Awale, Duygu Sarikaya
- Abstract summary: We propose to use human gaze with a spatio-temporal attention mechanism for activity recognition in surgical videos.
Our model builds on an I3D-based architecture: it learns spatio-temporal features using 3D convolutions and learns an attention map from human gaze.
- Score: 0.40611352512781856
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatically recognizing surgical activities plays an important role in
providing feedback to surgeons, and is a fundamental step towards
computer-aided surgical systems. Human gaze and visual saliency carry important
information about visual attention, and can be used in computer vision systems.
Although state-of-the-art surgical activity recognition models learn
spatio-temporal features, none of these models make use of human gaze and
visual saliency. In this study, we propose to use human gaze with a
spatio-temporal attention mechanism for activity recognition in surgical
videos. Our model builds on an I3D-based architecture: it learns
spatio-temporal features using 3D convolutions and learns an attention map
from human gaze. We evaluated our model on the Suturing task of JIGSAWS, a
publicly available surgical video understanding dataset. Our evaluations on a
subset of random video segments in this task suggest that our approach
achieves promising results, with an accuracy of 86.2%.
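Since this listing carries no code, the following is a minimal PyTorch sketch of the mechanism the abstract describes: 3D convolutions produce spatio-temporal features, and an attention map derived from human gaze reweights them before classification. The module sizes, tensor shapes, and the multiplicative fusion are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeAttention3D(nn.Module):
    """Toy stand-in for an I3D-style block with gaze-guided attention.

    Assumptions (not from the paper): the gaze map is a per-frame
    saliency heatmap in [0, 1] that gates the spatio-temporal
    features multiplicatively before pooling.
    """

    def __init__(self, in_channels=3, feat_channels=64, num_classes=10):
        super().__init__()
        # A single 3D conv stands in for the full I3D backbone.
        self.backbone = nn.Conv3d(in_channels, feat_channels,
                                  kernel_size=3, padding=1)
        self.classifier = nn.Linear(feat_channels, num_classes)

    def forward(self, video, gaze_map):
        # video:    (B, C, T, H, W) RGB clip
        # gaze_map: (B, 1, T, H, W) gaze-derived saliency in [0, 1]
        feats = F.relu(self.backbone(video))           # (B, F, T, H, W)
        # Resize the gaze map to the feature resolution, then gate features.
        gaze = F.interpolate(gaze_map, size=feats.shape[2:],
                             mode="trilinear", align_corners=False)
        attended = feats * gaze                        # gaze-weighted features
        pooled = attended.mean(dim=(2, 3, 4))          # global spatio-temporal pool
        return self.classifier(pooled)

# Example: a 16-frame 112x112 clip with a matching gaze heatmap.
model = GazeAttention3D()
clip = torch.randn(2, 3, 16, 112, 112)
gaze = torch.rand(2, 1, 16, 112, 112)
print(model(clip, gaze).shape)  # torch.Size([2, 10])
```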
Related papers
- Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [66.74633676595889]
We present a multi-camera capture setup consisting of static and head-mounted cameras.
Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre.
Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z)
- Using Hand Pose Estimation To Automate Open Surgery Training Feedback [0.0]
This research aims to facilitate the use of state-of-the-art computer vision algorithms for the automated training of surgeons.
By estimating 2D hand poses, we model the movement of the practitioner's hands, and their interaction with surgical instruments.
arXiv Detail & Related papers (2022-11-13T21:47:31Z)
- Context-Aware Sequence Alignment using 4D Skeletal Augmentation [67.05537307224525]
Temporal alignment of fine-grained human actions in videos is important for numerous applications in computer vision, robotics, and mixed reality.
We propose a novel context-aware self-supervised learning architecture to align sequences of actions.
Specifically, CASA employs self-attention and cross-attention mechanisms to incorporate the spatial and temporal context of human actions.
arXiv Detail & Related papers (2022-04-26T10:59:29Z)
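The CASA summary above mentions self-attention and cross-attention for aligning action sequences. As a rough sketch of that general pattern (dimensions, head counts, and the alignment readout are assumptions, not the CASA architecture):

```python
import torch
import torch.nn as nn

class SeqAligner(nn.Module):
    """Generic self- plus cross-attention block for sequence alignment.

    An illustrative pattern, not the CASA implementation: the
    cross-attention weights act as a soft frame-to-frame alignment
    matrix between the two sequences.
    """

    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, seq_a, seq_b):
        # seq_a: (B, Ta, D), seq_b: (B, Tb, D) frame (or skeleton) embeddings
        a, _ = self.self_attn(seq_a, seq_a, seq_a)   # temporal context within A
        b, _ = self.self_attn(seq_b, seq_b, seq_b)   # temporal context within B
        # Each frame of A attends over B; weights give the soft alignment.
        aligned, align_weights = self.cross_attn(a, b, b)
        return aligned, align_weights                # weights: (B, Ta, Tb)

aligner = SeqAligner()
A, B = torch.randn(1, 40, 128), torch.randn(1, 55, 128)
_, alignment = aligner(A, B)
print(alignment.shape)  # torch.Size([1, 40, 55])
```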
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
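For reference, the mean average precision (mAP) used to rank CholecTriplet2021 entries averages per-class average precision over the triplet classes. A small sketch using scikit-learn (our tooling choice, not necessarily the challenge's evaluation code):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(y_true, y_score):
    """mAP over classes: mean of per-class average precision.

    y_true:  (N, C) binary ground-truth labels per clip and class
    y_score: (N, C) predicted confidence per clip and class
    Classes with no positive example are skipped, a common convention.
    """
    aps = []
    for c in range(y_true.shape[1]):
        if y_true[:, c].sum() > 0:
            aps.append(average_precision_score(y_true[:, c], y_score[:, c]))
    return float(np.mean(aps))

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=(100, 5))
scores = rng.random((100, 5))
print(f"mAP: {mean_average_precision(labels, scores):.3f}")
```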
- A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its Effect on Image Correspondence [13.25903945009516]
We propose using a temporal generative adversarial network (GAN) to inpaint the hidden anatomy under specularities.
This is achieved using in-vivo data of gastric endoscopy (Hyper-Kvasir) in a fully unsupervised manner.
We also assess the effect of our method in computer vision tasks that underpin 3D reconstruction and camera motion estimation.
arXiv Detail & Related papers (2022-03-31T13:14:00Z)
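The inpainting paper above trains a temporal GAN without ground-truth specularity-free frames. A loose sketch of the kind of generator loss such unsupervised inpainting commonly combines; the masking scheme and weighting here are assumptions, not the paper's formulation:

```python
import torch
import torch.nn.functional as F

def inpainting_generator_loss(pred, target, spec_mask, disc_logits,
                              adv_weight=0.01):
    """Illustrative loss for GAN-based specularity inpainting.

    pred:        (B, C, T, H, W) generator output for a clip
    target:      (B, C, T, H, W) input frames
    spec_mask:   (B, 1, T, H, W) 1 where specular highlights were detected
    disc_logits: discriminator scores for the generated clip
    Reconstruction is supervised only outside the specular regions,
    since the anatomy underneath a highlight is unobserved.
    """
    valid = 1.0 - spec_mask
    recon = F.l1_loss(pred * valid, target * valid)
    # Non-saturating adversarial term: the generator tries to make the
    # discriminator output "real" (label 1) on inpainted clips.
    adv = F.binary_cross_entropy_with_logits(
        disc_logits, torch.ones_like(disc_logits))
    return recon + adv_weight * adv

# Toy usage with random tensors standing in for a clip and its mask.
pred = torch.rand(1, 3, 8, 64, 64)
target = torch.rand(1, 3, 8, 64, 64)
mask = (torch.rand(1, 1, 8, 64, 64) > 0.95).float()
print(inpainting_generator_loss(pred, target, mask, torch.randn(1, 1)))
```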
- Attention Mechanisms in Computer Vision: A Survey [75.6074182122423]
We provide a comprehensive review of various attention mechanisms in computer vision.
We categorize them according to approach, such as channel attention, spatial attention, temporal attention and branch attention.
We suggest future directions for attention mechanism research.
arXiv Detail & Related papers (2021-11-15T09:18:40Z)
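As one concrete instance of the channel attention category the survey describes, here is a squeeze-and-excitation style block, a standard formulation with assumed hyperparameters:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation channel attention (Hu et al., 2018)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (B, C, H, W); "squeeze" to per-channel statistics,
        # "excite" to per-channel gates in [0, 1], then rescale.
        b, c, _, _ = x.shape
        weights = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        return x * weights

x = torch.randn(2, 64, 32, 32)
print(ChannelAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```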
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- Using Computer Vision to Automate Hand Detection and Tracking of Surgeon Movements in Videos of Open Surgery [8.095095522269352]
We leverage advances in computer vision to introduce an automated approach to video analysis of surgical execution.
A state-of-the-art convolutional neural network architecture for object detection was used to detect operating hands in open surgery videos.
Our model's spatial detections of operating hands significantly outperform the detections achieved using pre-existing hand-detection datasets.
arXiv Detail & Related papers (2020-12-13T03:10:09Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
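The MRG-Net summary above describes dynamically integrating visual and kinematics information with a graph network. The following toy sketch conveys only the general idea of cross-modal message passing between per-frame visual and kinematics nodes; apart from the 76-dimensional JIGSAWS kinematics, the dimensions and update rule are assumptions, not the published model:

```python
import torch
import torch.nn as nn

class RelationalFusion(nn.Module):
    """Toy relational fusion of visual and kinematics embeddings.

    Two nodes per time step (visual, kinematics) exchange messages
    through a shared learned map; this mimics the spirit, not the
    specifics, of MRG-Net.
    """

    def __init__(self, vis_dim=512, kin_dim=76, hidden=128, num_gestures=10):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, hidden)
        self.kin_proj = nn.Linear(kin_dim, hidden)
        self.msg = nn.Linear(hidden, hidden)   # cross-node message function
        self.classify = nn.Linear(2 * hidden, num_gestures)

    def forward(self, vis, kin):
        # vis: (B, T, vis_dim) frame features; kin: (B, T, kin_dim) kinematics
        v = torch.relu(self.vis_proj(vis))
        k = torch.relu(self.kin_proj(kin))
        # One round of message passing: each node is updated with a
        # message computed from the other modality's node.
        v2 = v + torch.relu(self.msg(k))
        k2 = k + torch.relu(self.msg(v))
        return self.classify(torch.cat([v2, k2], dim=-1))  # per-frame logits

model = RelationalFusion()
logits = model(torch.randn(1, 30, 512), torch.randn(1, 30, 76))
print(logits.shape)  # torch.Size([1, 30, 10])
```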
- Automatic Operating Room Surgical Activity Recognition for Robot-Assisted Surgery [1.1033115844630357]
We investigate automatic surgical activity recognition in robot-assisted operations.
We collect the first large-scale dataset including 400 full-length multi-perspective videos.
We densely annotate the videos with the 10 most recognized and clinically relevant classes of activities.
arXiv Detail & Related papers (2020-06-29T16:30:31Z)
- Towards Generalizable Surgical Activity Recognition Using Spatial Temporal Graph Convolutional Networks [0.40611352512781856]
We introduce a modality that is robust to scene variation and that is able to infer part information such as orientation and relative spatial relationships.
The proposed modality is based on spatial temporal graph representations of surgical tools in videos, for surgical activity recognition.
arXiv Detail & Related papers (2020-01-11T09:11:32Z)
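A minimal sketch of the spatial-temporal graph convolution such a tool-graph modality implies: a graph convolution over tool parts within each frame, followed by a temporal convolution over frames. The chain graph over four parts and all dimensions are assumptions for illustration:

```python
import torch
import torch.nn as nn

class STGraphConvBlock(nn.Module):
    """One spatial-temporal graph convolution block (ST-GCN style).

    adj is a normalized adjacency over tool parts (e.g., jaws, shaft,
    wrist of a surgical instrument); the specific graph is assumed.
    """

    def __init__(self, in_ch, out_ch, adj):
        super().__init__()
        self.register_buffer("adj", adj)            # (V, V) normalized adjacency
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(9, 1),
                                  padding=(4, 0))   # conv over time per node

    def forward(self, x):
        # x: (B, C, T, V) features per frame T and graph node V
        x = self.spatial(x)
        x = torch.einsum("bctv,vw->bctw", x, self.adj)  # spatial graph conv
        return torch.relu(self.temporal(x))

# 4 tool parts connected in a chain, with self-loops, row-normalized.
V = 4
A = torch.eye(V)
for i in range(V - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A = A / A.sum(dim=1, keepdim=True)

block = STGraphConvBlock(2, 32, A)   # input: 2D keypoint coordinates
out = block(torch.randn(1, 2, 100, V))
print(out.shape)  # torch.Size([1, 32, 100, 4])
```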
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.