Encoding Surgical Videos as Latent Spatiotemporal Graphs for Object and
Anatomy-Driven Reasoning
- URL: http://arxiv.org/abs/2312.06829v1
- Date: Mon, 11 Dec 2023 20:42:27 GMT
- Title: Encoding Surgical Videos as Latent Spatiotemporal Graphs for Object and
Anatomy-Driven Reasoning
- Authors: Aditya Murali, Deepak Alapatt, Pietro Mascagni, Armine Vardazaryan,
Alain Garcia, Nariaki Okamoto, Didier Mutter, Nicolas Padoy
- Abstract summary: We use latent graphs to represent a surgical video in terms of the constituent anatomical structures and tools over time.
We introduce a novel graph-editing module that incorporates prior knowledge and temporal coherence to correct errors in the graph.
- Score: 2.9724186623561435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, spatiotemporal graphs have emerged as a concise and elegant manner
of representing video clips in an object-centric fashion, and have been shown to be
useful for downstream tasks such as action recognition. In this work, we
investigate the use of latent spatiotemporal graphs to represent a surgical
video in terms of the constituent anatomical structures and tools and their
evolving properties over time. To build the graphs, we first predict frame-wise
graphs using a pre-trained model, then add temporal edges between nodes based
on spatial coherence and visual and semantic similarity. Unlike previous
approaches, we incorporate long-term temporal edges in our graphs to better
model the evolution of the surgical scene and increase robustness to temporary
occlusions. We also introduce a novel graph-editing module that incorporates
prior knowledge and temporal coherence to correct errors in the graph, enabling
improved downstream task performance. Using our graph representations, we
evaluate two downstream tasks, critical view of safety prediction and surgical
phase recognition, obtaining strong results that demonstrate the quality and
flexibility of the learned representations. Code is available at
github.com/CAMMA-public/SurgLatentGraph.
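The graph-construction step described in the abstract adds temporal edges between nodes of per-frame graphs based on spatial coherence and visual and semantic similarity, including long-term edges that skip frames. The Python sketch below illustrates that idea under stated assumptions. It is not the authors' implementation (their code is in the repository linked above). The node schema, the use of box IoU for spatial coherence, cosine similarity of embeddings for visual similarity, a class-match indicator for semantic similarity, and the weights, threshold, and `max_gap` window are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical sketch: the node schema, scoring scheme, weights and threshold
# are illustrative assumptions, not the SurgLatentGraph implementation.


@dataclass
class Node:
    """One tool/anatomy node of a per-frame graph (hypothetical schema)."""
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: int                              # predicted class id (tool or anatomy)
    feat: list[float]                       # visual embedding from the frame-wise model


def iou(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Spatial-coherence proxy: intersection-over-union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u: list[float], v: list[float]) -> float:
    """Visual-similarity proxy: cosine similarity of node embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def temporal_edges(
    frames: list[list[Node]],
    max_gap: int = 3,        # values > 1 allow long-term edges that skip frames
    w_spatial: float = 0.4,  # hypothetical weights
    w_visual: float = 0.4,
    w_semantic: float = 0.2,
    threshold: float = 0.5,  # hypothetical cut-off for keeping an edge
) -> list[tuple[tuple[int, int], tuple[int, int], float]]:
    """Link node i of frame t to node j of frame t + gap (1 <= gap <= max_gap)
    when a weighted similarity score reaches the threshold.

    Returns edges as ((t, i), (t + gap, j), score).
    """
    edges = []
    for t, src_nodes in enumerate(frames):
        for gap in range(1, max_gap + 1):
            if t + gap >= len(frames):
                break  # no frame that far ahead
            for i, a in enumerate(src_nodes):
                for j, b in enumerate(frames[t + gap]):
                    score = (
                        w_spatial * iou(a.box, b.box)
                        + w_visual * cosine(a.feat, b.feat)
                        + w_semantic * (1.0 if a.label == b.label else 0.0)
                    )
                    if score >= threshold:
                        edges.append(((t, i), (t + gap, j), score))
    return edges


# Toy clip: a tool seen in frame 0, occluded in frame 1, visible again in
# frame 2, plus an unrelated structure of another class in frame 2.
example_frames = [
    [Node((10, 10, 50, 50), 1, [1.0, 0.0])],
    [],  # temporary occlusion: no node detected in this frame
    [Node((12, 10, 52, 50), 1, [0.9, 0.1]),
     Node((200, 200, 240, 240), 3, [0.0, 1.0])],
]

if __name__ == "__main__":
    print(temporal_edges(example_frames, max_gap=1))  # → [] (adjacent frames only)
    print(temporal_edges(example_frames, max_gap=2))  # → one edge, ((0, 0), (2, 0), ~0.959)
```

With `max_gap=1` the occluded tool's appearances in frames 0 and 2 stay disconnected; allowing a gap of 2 links them, which mirrors the abstract's motivation for long-term temporal edges (robustness to temporary occlusions). The all-pairs comparison costs O(max_gap · N²) score evaluations per frame for N nodes, which stays cheap for small per-frame graphs.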
Related papers
- SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction [37.86132786212667] (2024-07-29T17:44:34Z)
We introduce an end-to-end framework for the generation and optimization of surgical scene graphs.
Our solution outperforms the SOTA on the CATARACTS dataset by 8% accuracy and 10% F1 score in surgical workflow.
- Graph-Level Embedding for Time-Evolving Graphs [24.194795771873046] (2023-06-01T01:50:37Z)
Graph representation learning (also known as network embedding) has been extensively researched with varying levels of granularity.
We present a novel method for temporal graph-level embedding that addresses this gap.
- Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [92.73584302508907] (2023-03-18T03:53:43Z)
We propose a knowledge graph with dynamic structure and nodes to facilitate medical report generation with contrastive learning.
In detail, the fundamental structure of our graph is pre-constructed from general knowledge.
Each image feature is integrated with its own updated graph before being fed into the decoder module for report generation.
- Spectral Augmentations for Graph Contrastive Learning [50.149996923976836] (2023-02-06T16:26:29Z)
Contrastive learning has emerged as a premier method for learning representations with or without supervision.
Recent studies have shown its utility in graph representation learning for pre-training.
We propose a set of well-motivated graph transformation operations to provide a bank of candidates when constructing augmentations for a graph contrastive objective.
- State of the Art and Potentialities of Graph-level Learning [54.68482109186052] (2023-01-14T09:15:49Z)
Graph-level learning has been applied to many tasks including comparison, regression, classification, and more.
Traditional approaches to learning a set of graphs rely on hand-crafted features, such as substructures.
Deep learning has helped graph-level learning adapt to the growing scale of graphs by extracting features automatically and encoding graphs into low-dimensional representations.
- Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax [4.133378723518227] (2022-09-01T16:15:08Z)
We propose Graph Kernel Infomax, a self-supervised graph kernel learning approach on the graphical representation of EHR.
Unlike state-of-the-art methods, we do not change the graph structure to construct augmented views.
Our approach yields performance on clinical downstream tasks that exceeds the state of the art.
- Self-Supervised Dynamic Graph Representation Learning via Temporal Subgraph Contrast [0.8379286663107846] (2021-12-16T09:35:34Z)
This paper proposes a self-supervised dynamic graph representation learning framework (DySubC).
DySubC defines a temporal subgraph contrastive learning task to simultaneously learn the structural and evolutional features of a dynamic graph.
Experiments on five real-world datasets demonstrate that DySubC performs better than the related baselines.
- Joint Graph Learning and Matching for Semantic Feature Correspondence [69.71998282148762] (2021-09-01T08:24:02Z)
We propose a joint graph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching.
The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k).
It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
- Temporal Contrastive Graph Learning for Video Action Recognition and Retrieval [83.56444443849679] (2021-01-04T08:11:39Z)
This work takes advantage of the temporal dependencies within videos and proposes a novel self-supervised method named Temporal Contrastive Graph Learning (TCGL).
TCGL is rooted in a hybrid graph contrastive learning strategy that jointly regards the inter-snippet and intra-snippet temporal dependencies as self-supervision signals for temporal representation learning.
Experimental results demonstrate the superiority of TCGL over the state-of-the-art methods on large-scale action recognition and video retrieval benchmarks.
- GraphOpt: Learning Optimization Models of Graph Formation [72.75384705298303] (2020-07-07T16:51:39Z)
We propose an end-to-end framework that learns an implicit model of graph structure formation and discovers an underlying optimization mechanism.
The learned objective can serve as an explanation for the observed graph properties, thereby lending itself to transfer across different graphs within a domain.
GraphOpt poses link formation in graphs as a sequential decision-making process and solves it using a maximum entropy inverse reinforcement learning algorithm.
This list is automatically generated from the titles and abstracts of the papers on this site.