Active Annotation of Informative Overlapping Frames in Video Mosaicking
Applications
- URL: http://arxiv.org/abs/2012.15343v1
- Date: Wed, 30 Dec 2020 22:19:19 GMT
- Authors: Loic Peter, Marcel Tella-Amo, Dzhoshkun Ismail Shakir, Jan Deprest,
Sebastien Ourselin, Juan Eugenio Iglesias, Tom Vercauteren
- Abstract summary: We introduce an efficient framework for the active annotation of long-range pairwise correspondences in a sequence.
Our framework suggests pairs of images that are sought to be informative to an oracle agent.
In addition to the efficient construction of a mosaic, our framework provides, as a by-product, ground truth landmark correspondences.
- Score: 3.5544725140884936
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video mosaicking requires the registration of overlapping frames located at
distant timepoints in the sequence to ensure global consistency of the
reconstructed scene. However, fully automated registration of such long-range
pairs is (i) challenging when the registration of images itself is difficult;
and (ii) computationally expensive for long sequences due to the large number
of candidate pairs for registration. In this paper, we introduce an efficient
framework for the active annotation of long-range pairwise correspondences in a
sequence. Our framework suggests pairs of images that are sought to be
informative to an oracle agent (e.g., a human user, or a reliable matching
algorithm) who provides visual correspondences on each suggested pair.
Informative pairs are retrieved according to an iterative strategy based on a
principled annotation reward coupled with two complementary and online
adaptable models of frame overlap. In addition to the efficient construction of
a mosaic, our framework provides, as a by-product, ground truth landmark
correspondences which can be used for evaluation or learning purposes. We
evaluate our approach in both automated and interactive scenarios via
experiments on synthetic sequences, on a publicly available dataset for aerial
imaging and on a clinical dataset for placenta mosaicking during fetal surgery.
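The iterative strategy described in the abstract can be illustrated with a toy sketch. The code below is not the authors' method: the exponential overlap model, the greedy selection rule, the multiplicative update of `scale`, the stopping threshold, and all function and parameter names are assumptions made up for illustration. It only shows the general shape of such a loop: rank long-range candidate pairs by an estimated overlap probability, ask an oracle about the most promising one, and adapt the overlap model online from the answer.

```python
import itertools
import math


def long_range_pairs(n_frames, min_gap=2):
    # Non-consecutive frame pairs; their number grows quadratically with n_frames.
    return [(i, j) for i, j in itertools.combinations(range(n_frames), 2)
            if j - i >= min_gap]


def overlap_probability(pos_i, pos_j, scale):
    # Hypothetical overlap model: closer estimated positions are more likely to overlap.
    return math.exp(-math.dist(pos_i, pos_j) / scale)


def active_annotation(positions, oracle, budget, scale=1.0, min_probability=0.3):
    """Greedy toy loop: query the pair currently believed most likely to overlap."""
    remaining = set(long_range_pairs(len(positions)))
    annotated = {}
    for _ in range(budget):
        if not remaining:
            break

        def score(pair):
            return overlap_probability(positions[pair[0]], positions[pair[1]], scale)

        # Sorting makes tie-breaking deterministic (first pair in lexicographic order).
        best = max(sorted(remaining), key=score)
        if score(best) < min_probability:
            break  # no remaining pair looks promising enough to query
        remaining.remove(best)
        overlaps = oracle(*best)
        annotated[best] = overlaps
        # Assumed online adaptation: widen the model after a hit, narrow it after a miss.
        scale *= 1.1 if overlaps else 0.9
    return annotated


# Toy sequence: six frames whose estimated positions trace a loop.
frames = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]


def oracle(i, j):
    # Stand-in for a human annotator: frames overlap if they are at most 1 unit apart.
    return math.dist(frames[i], frames[j]) <= 1.0


annotated = active_annotation(frames, oracle, budget=10)
print(annotated)
```

In this toy setting only 3 of the 10 long-range candidate pairs are sent to the oracle before the loop stops, which is the kind of saving an informative pair-selection scheme is meant to provide.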
Related papers
- One registration is worth two segmentations [12.163299991979574]
The goal of image registration is to establish spatial correspondence between two or more images.
We propose an alternative but more intuitive correspondence representation: a set of corresponding regions-of-interest (ROI) pairs.
We experimentally show that the proposed SAMReg is capable of segmenting and matching multiple ROI pairs.
arXiv Detail & Related papers (2024-05-17T16:14:32Z)
- Matching in the Wild: Learning Anatomical Embeddings for Multi-Modality Images [28.221419419614183]
Radiotherapists require accurate registration of MR/CT images to effectively use information from both modalities.
Recent learning-based methods have shown promising results in the rigid/affine step.
We propose a new approach called Cross-SAM to enable cross-modality matching.
arXiv Detail & Related papers (2023-07-07T11:49:06Z)
- Self-Supervised Correspondence Estimation via Multiview Registration [88.99287381176094]
Video provides us with the spatio-temporal consistency needed for visual learning.
Recent approaches have utilized this signal to learn correspondence estimation from close-by frame pairs.
We propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences.
arXiv Detail & Related papers (2022-12-06T18:59:02Z)
- Correspondence Matters for Video Referring Expression Comprehension [64.60046797561455]
Video Referring Expression Comprehension (REC) aims to localize the referent objects described in a sentence to visual regions in the video frames.
Existing methods suffer from two problems: 1) inconsistent localization results across video frames; 2) confusion between the referent and contextual objects.
We propose a novel Dual Correspondence Network (dubbed as DCNet) which explicitly enhances the dense associations in both the inter-frame and cross-modal manners.
arXiv Detail & Related papers (2022-07-21T10:31:39Z)
- HighlightMe: Detecting Highlights from Human-Centric Videos [62.265410865423]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z)
- Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning [53.74240452117145]
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
arXiv Detail & Related papers (2021-09-28T05:40:01Z)
- Deep Group-wise Variational Diffeomorphic Image Registration [3.0022455491411653]
We propose to extend current learning-based image registration to allow simultaneous registration of multiple images.
We present a general mathematical framework that enables both registration of multiple images to their viscous geodesic average and registration in which any of the available images can be used as a fixed image.
arXiv Detail & Related papers (2020-10-01T07:37:28Z)
- MvMM-RegNet: A new image registration framework based on multivariate mixture model and neural network estimation [14.36896617430302]
We propose a new image registration framework based on a multivariate mixture model (MvMM), a generative model, and neural network estimation.
A generative model consolidating both appearance and anatomical information is established to derive a novel loss function capable of implementing groupwise registration.
We highlight the versatility of the proposed framework for various applications on multimodal cardiac images.
arXiv Detail & Related papers (2020-06-28T11:19:15Z)
- Learning Deformable Image Registration from Optimization: Perspective, Modules, Bilevel Training and Beyond [62.730497582218284]
We develop a new deep learning based framework to optimize a diffeomorphic model via multi-scale propagation.
We conduct two groups of image registration experiments on 3D volume datasets including image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data.
arXiv Detail & Related papers (2020-04-30T03:23:45Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)