PEg TRAnsfer Workflow recognition challenge report: Does multi-modal
data improve recognition?
- URL: http://arxiv.org/abs/2202.05821v3
- Date: Thu, 27 Apr 2023 13:27:49 GMT
- Title: PEg TRAnsfer Workflow recognition challenge report: Does multi-modal
data improve recognition?
- Authors: Arnaud Huaulmé, Kanako Harada, Quang-Minh Nguyen, Bogyu Park,
Seungbum Hong, Min-Kook Choi, Michael Peven, Yunshuang Li, Yonghao Long, Qi
Dou, Satyadwyoom Kumar, Seenivasan Lalithkumar, Ren Hongliang, Hiroki
Matsuzaki, Yuto Ishikawa, Yuriko Harai, Satoshi Kondo, Mamoru Mitsuishi,
Pierre Jannin
- Abstract summary: "PEg TRAnsfert recognition" (PETRAW) challenge was to develop surgical workflow recognition methods based on one or several modalities.
PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator.
For all teams, the video/kinematic-based methods improved significantly over the uni-modal ones.
- Score: 14.144188912860892
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper presents the design and results of the "PEg TRAnsfer Workflow
recognition" (PETRAW) challenge, whose objective was to develop surgical
workflow recognition methods based on one or several modalities, among video,
kinematic, and segmentation data, in order to study their added value. The
PETRAW challenge provided a data set of 150 peg transfer sequences performed on
a virtual simulator. This data set was composed of videos, kinematics, semantic
segmentation, and workflow annotations which described the sequences at three
different granularity levels: phase, step, and activity. Five tasks were
proposed to the participants: three addressed the recognition of all
granularities with a single modality, while the other two addressed
recognition with a combination of modalities. Average application-dependent
balanced accuracy (AD-Accuracy) was used as the evaluation metric to account
for unbalanced classes and because it is more clinically relevant than a
frame-by-frame score. Seven teams participated in at least one task, and four
of them participated in all tasks. The best results were obtained using the
video and kinematic data together, with an AD-Accuracy between 90% and 93%
for the four teams that participated in all tasks. For every team, the
video/kinematic-based methods improved significantly over the uni-modal ones.
However, the difference in test-time execution between the
video/kinematic-based and the kinematic-only methods has to be taken into
consideration: is it worthwhile to spend 20 to 200 times more computing time
for less than a 3% improvement? The PETRAW data set is publicly available at
www.synapse.org/PETRAW to encourage further research in surgical workflow
recognition.
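To make the evaluation metric concrete, the sketch below shows one plausible reading of a per-granularity balanced accuracy averaged over the three annotation levels. It is illustrative only, not the challenge's official scoring code; the function names and the exact averaging scheme are assumptions.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over frame-wise labels (robust to class imbalance)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

def ad_accuracy(truth_by_level, pred_by_level):
    """Average the per-level balanced accuracy over the PETRAW granularities."""
    levels = ["phase", "step", "activity"]
    return sum(
        balanced_accuracy(truth_by_level[lvl], pred_by_level[lvl]) for lvl in levels
    ) / len(levels)
```

A plain frame-by-frame accuracy would reward always predicting the dominant class; averaging per-class recall instead weights rare, short workflow classes equally with long ones, which is the stated motivation for the metric.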
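And to picture what a multi-modal recognizer might look like, here is a generic late-fusion sketch in PyTorch that concatenates per-frame video embeddings with kinematic features before a shared per-frame classifier. This is a common fusion pattern, not the architecture of any participating team; all module names, feature dimensions, and class counts below are assumptions.

```python
import torch
import torch.nn as nn

class LateFusionRecognizer(nn.Module):
    """Toy video+kinematics fusion: concatenate per-frame features, classify each frame."""
    def __init__(self, video_dim=512, kin_dim=28, hidden=256, n_classes=7):
        super().__init__()
        self.kin_encoder = nn.Sequential(nn.Linear(kin_dim, hidden), nn.ReLU())
        self.head = nn.Linear(video_dim + hidden, n_classes)

    def forward(self, video_feats, kinematics):
        # video_feats: (batch, time, video_dim) from any frozen visual backbone
        # kinematics:  (batch, time, kin_dim) instrument poses/velocities
        fused = torch.cat([video_feats, self.kin_encoder(kinematics)], dim=-1)
        return self.head(fused)  # per-frame class logits

# Example: an 8-second clip at 30 fps
model = LateFusionRecognizer()
logits = model(torch.randn(1, 240, 512), torch.randn(1, 240, 28))  # (1, 240, 7)
```

The cost trade-off raised in the abstract is visible even in this toy: the kinematic branch is a single small linear layer, while the video features require running a visual backbone over every frame at test time.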
Related papers
- Multi-Task Consistency for Active Learning [18.794331424921946]
Inconsistency-based active learning has proven to be effective in selecting informative samples for annotation.
We propose a novel multi-task active learning strategy for two coupled vision tasks: object detection and semantic segmentation.
Our approach achieves 95% of the fully-trained performance using only 67% of the available data.
arXiv Detail & Related papers (2023-06-21T17:34:31Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision
Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks, from OCR to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z) - Multi-dataset Training of Transformers for Robust Action Recognition [75.5695991766902]
We study the task of robust feature representations, aiming to generalize well on multiple datasets for action recognition.
Here, we propose a novel multi-dataset training paradigm, MultiTrain, with the design of two new loss terms, namely informative loss and projection loss.
We verify the effectiveness of our method on five challenging datasets: Kinetics-400, Kinetics-700, Moments-in-Time, ActivityNet, and Something-Something-V2.
arXiv Detail & Related papers (2022-09-26T01:30:43Z) - Uni-Perceiver: Pre-training Unified Architecture for Generic Perception
for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z) - Rendezvous: Attention Mechanisms for the Recognition of Surgical Action
Triplets in Endoscopic Videos [12.725586100227337]
Among frameworks for surgical workflow analysis in endoscopic videos, action triplet recognition stands out as the only one aiming to provide truly fine-grained and comprehensive information on surgical activities.
We introduce our new model, the Rendezvous (RDV), which recognizes triplets directly from surgical videos by leveraging attention at two different levels.
Our proposed RDV model significantly improves the triplet prediction mAP by over 9% compared to the state-of-the-art methods on this dataset.
arXiv Detail & Related papers (2021-09-07T17:52:52Z) - ASCNet: Self-supervised Video Representation Learning with
Appearance-Speed Consistency [62.38914747727636]
We study self-supervised video representation learning, which is a challenging task due to 1) a lack of labels for explicit supervision and 2) unstructured and noisy visual information.
Existing methods mainly use contrastive loss with video clips as the instances and learn visual representation by discriminating instances from each other.
In this paper, we observe that the consistency between positive samples is the key to learn robust video representations.
arXiv Detail & Related papers (2021-06-04T08:44:50Z) - MIcro-Surgical Anastomose Workflow recognition challenge report [12.252332806968756]
"MIcro-Surgical Anastomose recognition on training sessions" (MISAW) challenge provided a data set of 27 sequences of micro-surgical anastomosis on artificial blood vessels.
This data set was composed of videos, kinematics, and workflow annotations described at three different granularity levels: phase, step, and activity.
The best models achieved more than 95% AD-Accuracy for phase recognition, 80% for step recognition, 60% for activity recognition, and 75% for all granularity levels.
arXiv Detail & Related papers (2021-03-24T11:34:09Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal relational graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)