NurViD: A Large Expert-Level Video Database for Nursing Procedure
Activity Understanding
- URL: http://arxiv.org/abs/2310.13347v1
- Date: Fri, 20 Oct 2023 08:22:56 GMT
- Title: NurViD: A Large Expert-Level Video Database for Nursing Procedure
Activity Understanding
- Authors: Ming Hu, Lin Wang, Siyuan Yan, Don Ma, Qingli Ren, Peng Xia, Wei Feng,
Peibo Duan, Lie Ju, Zongyuan Ge
- Abstract summary: We propose NurViD, a large video dataset with expert-level annotation for nursing procedure activity understanding.
NurViD consists of over 1.5k videos totaling 144 hours, approximately four times longer than the largest existing nursing activity dataset.
- Score: 20.273197899025117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The application of deep learning to nursing procedure activity understanding
has the potential to greatly enhance the quality and safety of nurse-patient
interactions. By utilizing such techniques, we can facilitate training and
education, improve quality control, and enable operational compliance
monitoring. However, the development of automatic recognition systems in this
field is currently hindered by the scarcity of appropriately labeled datasets.
The existing video datasets pose several limitations: 1) they are too small
in scale to support comprehensive investigations of nursing
activity; 2) they primarily focus on single procedures, lacking expert-level
annotations for various nursing procedures and action steps; and 3) they lack
temporally localized annotations, which prevents the effective localization of
targeted actions within longer video sequences. To mitigate these limitations,
we propose NurViD, a large video dataset with expert-level annotation for
nursing procedure activity understanding. NurViD consists of over 1.5k videos
totaling 144 hours, approximately four times longer than the largest existing
nursing activity dataset. Notably, it encompasses 51 distinct nursing
procedures and 177 action steps, providing a much more comprehensive coverage
compared to existing datasets that primarily focus on limited procedures. To
evaluate the efficacy of current deep learning methods on nursing activity
understanding, we establish three benchmarks on NurViD: procedure recognition
on untrimmed videos, procedure and action recognition on trimmed videos, and
action detection. Our benchmark and code will be available at
https://github.com/minghu0830/NurViD-benchmark.
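To make the benchmark settings concrete, below is a minimal sketch of how the trimmed-clip procedure-recognition task might be evaluated. The toy model, feature shapes, and top-1 metric are illustrative assumptions; the actual data formats and baselines are defined in the repository linked above.

```python
# Hypothetical sketch of the trimmed-clip procedure-recognition benchmark.
# Shapes and the model are assumptions; see the official repo for real formats.
import torch
import torch.nn as nn

NUM_PROCEDURES = 51  # distinct nursing procedures in NurViD
NUM_ACTIONS = 177    # distinct action steps

class ClipClassifier(nn.Module):
    """Toy stand-in for a video backbone: mean-pools frame features, then classifies."""
    def __init__(self, feat_dim=2048, num_classes=NUM_PROCEDURES):
        super().__init__()
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats):          # (batch, time, feat_dim)
        clip_feat = frame_feats.mean(dim=1)  # temporal average pooling
        return self.head(clip_feat)          # (batch, num_classes)

model = ClipClassifier()
feats = torch.randn(8, 16, 2048)            # 8 clips x 16 frames of 2048-d features
labels = torch.randint(0, NUM_PROCEDURES, (8,))
top1 = (model(feats).argmax(dim=1) == labels).float().mean()
print(f"top-1 procedure accuracy: {top1:.3f}")
```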
Related papers
- Shifting to Machine Supervision: Annotation-Efficient Semi and Self-Supervised Learning for Automatic Medical Image Segmentation and Classification [9.67209046726903]
We introduce the S4MI pipeline, a novel approach that leverages advancements in self-supervised and semi-supervised learning.
Our study benchmarks these techniques on three distinct medical imaging datasets to evaluate their effectiveness in classification and segmentation tasks.
Remarkably, the semi-supervised approach demonstrated superior outcomes in segmentation, outperforming fully-supervised methods while using 50% fewer labels across all datasets.
arXiv Detail & Related papers (2023-11-17T04:04:29Z)
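Semi-supervised pipelines such as S4MI typically exploit unlabeled data through some form of pseudo-labeling. The snippet below is a generic confidence-thresholded pseudo-label loss for segmentation, a common ingredient of such methods; whether S4MI uses this exact scheme is an assumption, and the threshold is arbitrary.

```python
# Generic confidence-thresholded pseudo-labeling for segmentation (an assumption,
# not necessarily the S4MI recipe).
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, unlabeled_images, threshold=0.9):
    """Cross-entropy against the model's own high-confidence predictions."""
    with torch.no_grad():
        probs = torch.softmax(model(unlabeled_images), dim=1)
        conf, pseudo = probs.max(dim=1)              # per-pixel confidence, label
    mask = (conf >= threshold).float()               # keep only confident pixels
    loss = F.cross_entropy(model(unlabeled_images), pseudo, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1)

net = torch.nn.Conv2d(3, 4, kernel_size=1)           # toy 4-class segmenter
x = torch.randn(2, 3, 32, 32)                        # a batch of unlabeled images
print(pseudo_label_loss(net, x, threshold=0.5).item())
```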
- Weakly-Supervised Surgical Phase Recognition [19.27227976291303]
In this work we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction.
We validate our method by running experiments with the public Cholec80 dataset of laparoscopic cholecystectomy videos.
arXiv Detail & Related papers (2023-10-26T07:54:47Z)
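The random-walk idea can be illustrated with label propagation over a frame-similarity graph: a few frames receive phase labels from the weak supervision, and the walk diffuses them to every frame. This is a generic sketch, not the paper's exact graph construction or solver.

```python
# Generic random-walk label propagation over a frame-similarity graph.
import numpy as np

def random_walk_phases(frame_feats, seed_labels, num_phases, steps=50, alpha=0.9):
    """frame_feats: (T, D) per-frame features; seed_labels: {frame_index: phase_id}."""
    sim = frame_feats @ frame_feats.T                   # pairwise similarities
    sim = np.exp(sim - sim.max(axis=1, keepdims=True))  # row-wise softmax ...
    P = sim / sim.sum(axis=1, keepdims=True)            # ... = transition matrix

    T = frame_feats.shape[0]
    Y = np.zeros((T, num_phases))
    for t, p in seed_labels.items():                    # one-hot seed frames
        Y[t, p] = 1.0

    F = Y.copy()
    for _ in range(steps):
        F = alpha * (P @ F) + (1 - alpha) * Y           # diffuse, pull to seeds
    return F.argmax(axis=1)                             # per-frame phase ids

feats = np.random.randn(100, 16)
print(random_walk_phases(feats, {0: 0, 50: 1, 99: 2}, num_phases=3)[:10])
```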
- Video object detection for privacy-preserving patient monitoring in intensive care [0.0]
We propose a new method for exploiting information in the temporal succession of video frames.
Our method outperforms a standard YOLOv5 baseline model by +1.7% mAP@.5 while also training over ten times faster on our proprietary dataset.
arXiv Detail & Related papers (2023-06-26T11:52:22Z)
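A simple way to let a single-frame detector use the temporal succession of frames is to stack a short window of consecutive frames along the channel axis before the backbone, as sketched below. Whether this paper uses this particular fusion is an assumption; the sketch only makes the general idea concrete.

```python
# Illustrative (assumed) temporal fusion: stack k consecutive RGB frames along
# channels, then map back to 3 channels for a standard single-frame detector.
import torch
import torch.nn as nn

class TemporalStackStem(nn.Module):
    def __init__(self, k=3):
        super().__init__()
        self.fuse = nn.Conv2d(3 * k, 3, kernel_size=1)  # learn the channel fusion

    def forward(self, frames):                    # (batch, k, 3, H, W)
        b, k, c, h, w = frames.shape
        stacked = frames.reshape(b, k * c, h, w)  # concatenate along channels
        return self.fuse(stacked)                 # (batch, 3, H, W)

stem = TemporalStackStem(k=3)
clip = torch.randn(2, 3, 3, 64, 64)               # 2 windows of 3 RGB frames
print(stem(clip).shape)                           # torch.Size([2, 3, 64, 64])
```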
- Procedure-Aware Pretraining for Instructional Video Understanding [58.214549181779006]
A key challenge in procedure understanding is extracting procedural knowledge from unlabeled videos.
Our main insight is that instructional videos depict sequences of steps that repeat between instances of the same or different tasks.
These shared steps can be organized into a procedural knowledge graph, which can then be used to generate pseudo labels to train a video representation that encodes the procedural knowledge in a more accessible form.
arXiv Detail & Related papers (2023-03-31T17:41:31Z)
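A stripped-down version of that pseudo-labeling mechanism: cluster segment embeddings collected across many videos into shared "step" nodes, then use each segment's node id as its training target. The paper's graph construction is considerably richer; every name below is hypothetical.

```python
# Toy version of graph-node pseudo-labeling for procedural representation learning.
import numpy as np
from sklearn.cluster import KMeans

def pseudo_labels_from_step_nodes(segment_embs, num_nodes=32):
    """segment_embs: (N, D) embeddings of video segments across many videos."""
    km = KMeans(n_clusters=num_nodes, n_init=10).fit(segment_embs)
    return km.labels_  # shared step-node id per segment, usable as pseudo label

embs = np.random.randn(500, 64)  # toy segment embeddings
print(pseudo_labels_from_step_nodes(embs)[:10])
```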
- Weakly Supervised Temporal Convolutional Networks for Fine-grained Surgical Activity Recognition [10.080444283496487]
We propose to use coarser and easier-to-annotate activity labels, namely phases, as weak supervision to learn step recognition.
We employ a Single-Stage Temporal Convolutional Network (SS-TCN) with a ResNet-50 backbone, trained in an end-to-end fashion from weakly annotated videos.
We extensively evaluate and show the effectiveness of the proposed method on a large video dataset consisting of 40 laparoscopic gastric bypass procedures and the public benchmark CATARACTS containing 50 cataract surgeries.
arXiv Detail & Related papers (2023-02-21T17:26:49Z)
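The single-stage TCN at the core of that approach can be sketched in a few lines: a 1x1 input projection followed by a stack of dilated residual 1-D convolutions that produce per-frame logits. The layer count, width, and feature dimension below are illustrative, not the paper's configuration.

```python
# Minimal single-stage temporal convolutional network in the spirit of SS-TCN.
import torch
import torch.nn as nn

class DilatedResidualLayer(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, 3, padding=dilation, dilation=dilation)
        self.out = nn.Conv1d(channels, channels, 1)

    def forward(self, x):                              # (batch, channels, time)
        return x + self.out(torch.relu(self.conv(x)))  # residual connection

class SSTCN(nn.Module):
    def __init__(self, in_dim=2048, channels=64, num_classes=10, num_layers=10):
        super().__init__()
        self.proj = nn.Conv1d(in_dim, channels, 1)
        self.layers = nn.Sequential(
            *[DilatedResidualLayer(channels, 2 ** i) for i in range(num_layers)])
        self.head = nn.Conv1d(channels, num_classes, 1)

    def forward(self, feats):                          # (batch, in_dim, time)
        return self.head(self.layers(self.proj(feats)))  # per-frame logits

model = SSTCN()
print(model(torch.randn(1, 2048, 300)).shape)          # torch.Size([1, 10, 300])
```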
- Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains largely unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
- Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases [57.90226879210227]
FedCy is a federated semi-supervised learning (FSSL) method that combines FL and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos.
We demonstrate significant performance gains over state-of-the-art FSSL methods on the task of automatic recognition of surgical phases.
arXiv Detail & Related papers (2022-03-14T17:44:53Z)
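FedCy builds on the standard federated-averaging step, where a server aggregates client weights in proportion to local data size. The sketch below shows only that generic step; FedCy's self-supervised and pseudo-label training at each client is omitted, so this is scaffolding rather than the method itself.

```python
# Generic FedAvg aggregation step (the federated backbone, not FedCy itself).
import torch

def fed_avg(global_model, client_models, client_sizes):
    """Weighted average of client parameters into the global model."""
    total = sum(client_sizes)
    new_state = {}
    for key in global_model.state_dict():
        new_state[key] = sum(
            m.state_dict()[key] * (n / total)
            for m, n in zip(client_models, client_sizes))
    global_model.load_state_dict(new_state)
    return global_model

clients = [torch.nn.Linear(4, 2) for _ in range(3)]   # toy client models
server = fed_avg(torch.nn.Linear(4, 2), clients, client_sizes=[100, 50, 25])
print(server.weight.shape)                             # torch.Size([2, 4])
```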
- Learning To Recognize Procedural Activities with Distant Supervision [96.58436002052466]
We consider the problem of classifying fine-grained, multi-step activities from long videos spanning up to several minutes.
Our method uses a language model to match noisy, automatically-transcribed speech from the video to step descriptions in the knowledge base.
arXiv Detail & Related papers (2022-01-26T15:06:28Z)
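The matching step can be illustrated as a toy retrieval loop: embed the noisy ASR text and the knowledge-base step descriptions, then assign each speech segment its most similar step. The TF-IDF embedding below stands in for the paper's language model, and the example strings are invented.

```python
# Toy distant-supervision matching: ASR text -> nearest knowledge-base step.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

kb_steps = ["wash hands thoroughly", "prepare sterile equipment", "insert catheter"]
asr_segments = ["and now we wash our hands", "ok inserting the catheter slowly"]

vec = TfidfVectorizer().fit(kb_steps + asr_segments)   # shared vocabulary
sims = cosine_similarity(vec.transform(asr_segments), vec.transform(kb_steps))
for text, row in zip(asr_segments, sims):
    print(f"{text!r} -> {kb_steps[row.argmax()]!r}")
```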
- Intra- and Inter-Action Understanding via Temporal Action Parsing [118.32912239230272]
We construct a new dataset of sports videos with manual annotations of sub-actions, and conduct a study of temporal action parsing on top of it.
Our study shows that a sport activity usually consists of multiple sub-actions and that the awareness of such temporal structures is beneficial to action recognition.
We also investigate a number of temporal parsing methods, and thereon devise an improved method that is capable of mining sub-actions from training data without knowing their labels.
arXiv Detail & Related papers (2020-05-20T17:45:18Z)
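A minimal illustration of mining sub-action structure without labels: flag frames where consecutive features change sharply as candidate sub-action boundaries. The paper's parsing methods are far more sophisticated; this sketch, with an arbitrary z-score threshold, only conveys the intuition.

```python
# Naive unsupervised sub-action boundary mining via feature change-points.
import numpy as np

def mine_sub_action_boundaries(frame_feats, z=1.5):
    """Return frame indices where feature change exceeds mean + z * std."""
    diffs = np.linalg.norm(np.diff(frame_feats, axis=0), axis=1)  # (T-1,)
    return np.where(diffs > diffs.mean() + z * diffs.std())[0] + 1

feats = np.random.randn(120, 32)  # toy per-frame features
print(mine_sub_action_boundaries(feats))
```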
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependency.
We validate our approach on a large surgical video dataset (Cholec80) by performing the surgical workflow recognition task.
arXiv Detail & Related papers (2020-04-21T09:21:22Z)
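The non-local block that NL-RCNet introduces can be sketched as self-attention over the time axis, following the embedded-Gaussian form of the original non-local design; channel sizes here are illustrative.

```python
# Embedded-Gaussian non-local block over the temporal axis (illustrative sizes).
import torch
import torch.nn as nn

class NonLocalBlock1D(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Conv1d(channels, channels // 2, 1)  # query projection
        self.phi = nn.Conv1d(channels, channels // 2, 1)    # key projection
        self.g = nn.Conv1d(channels, channels // 2, 1)      # value projection
        self.out = nn.Conv1d(channels // 2, channels, 1)

    def forward(self, x):                       # (batch, channels, time)
        q = self.theta(x).transpose(1, 2)       # (b, t, c/2)
        k = self.phi(x)                         # (b, c/2, t)
        attn = torch.softmax(q @ k, dim=-1)     # (b, t, t) frame-to-frame weights
        v = self.g(x).transpose(1, 2)           # (b, t, c/2)
        y = (attn @ v).transpose(1, 2)          # aggregate long-range context
        return x + self.out(y)                  # residual connection

block = NonLocalBlock1D(64)
print(block(torch.randn(2, 64, 30)).shape)      # torch.Size([2, 64, 30])
```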