SiamParseNet: Joint Body Parsing and Label Propagation in Infant
Movement Videos
- URL: http://arxiv.org/abs/2007.08646v1
- Date: Thu, 16 Jul 2020 21:14:25 GMT
- Title: SiamParseNet: Joint Body Parsing and Label Propagation in Infant
Movement Videos
- Authors: Haomiao Ni, Yuan Xue, Qian Zhang, Xiaolei Huang
- Abstract summary: General movement assessment (GMA) of infant movement videos (IMVs) is an effective method for the early detection of cerebral palsy (CP) in infants.
We propose SiamParseNet (SPN), a body parsing model that jointly learns single frame body parsing and label propagation between frames in a semi-supervised fashion.
- Score: 12.99371655893686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: General movement assessment (GMA) of infant movement videos (IMVs) is an
effective method for the early detection of cerebral palsy (CP) in infants.
Automated body parsing is a crucial step towards computer-aided GMA, in which
infant body parts are segmented and tracked over time for movement analysis.
However, acquiring fully annotated data for video-based body parsing is
particularly expensive due to the large number of frames in IMVs. In this
paper, we propose a semi-supervised body parsing model, termed SiamParseNet
(SPN), to jointly learn single frame body parsing and label propagation between
frames in a semi-supervised fashion. The Siamese-structured SPN consists of a
shared feature encoder, followed by two separate branches: one for intra-frame
body parts segmentation, and one for inter-frame label propagation. The two
branches are trained jointly, taking pairs of frames from the same videos as
their input. An adaptive training process is proposed that alternates training
modes between using input pairs of only labeled frames and using inputs of both
labeled and unlabeled frames. During testing, we employ a multi-source
inference mechanism, where the final result for a test frame is either obtained
via the segmentation branch or via propagation from a nearby key frame. We
conduct extensive experiments on a partially-labeled IMV dataset where SPN
outperforms all prior arts, demonstrating the effectiveness of our proposed
method.
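The multi-source inference mechanism described above can be sketched in a few lines: for each test frame, the final result comes either from the segmentation branch directly or from label propagation out of a nearby labeled key frame. The following is a minimal illustrative sketch, not the authors' implementation; the function names, the key-frame selection rule, and the distance threshold `max_prop_dist` are all assumptions for illustration.

```python
# Hypothetical sketch of SPN's multi-source inference. The paper states that
# a test frame's result is obtained either via the segmentation branch or via
# propagation from a nearby key frame; the concrete selection rule below
# (nearest key frame within a distance threshold) is an assumption.

def nearest_key_frame(frame_idx, key_frames):
    """Return the labeled key frame index closest to frame_idx."""
    return min(key_frames, key=lambda k: abs(k - frame_idx))

def multi_source_inference(frame_idx, key_frames, segment_fn, propagate_fn,
                           max_prop_dist=5):
    """Choose between segmentation and propagation for one test frame.

    If a labeled key frame lies within max_prop_dist, propagate its labels
    to the test frame; otherwise fall back to per-frame segmentation.
    """
    k = nearest_key_frame(frame_idx, key_frames)
    if abs(k - frame_idx) <= max_prop_dist:
        return propagate_fn(source=k, target=frame_idx)
    return segment_fn(frame_idx)
```

In this sketch, `segment_fn` stands in for the intra-frame segmentation branch and `propagate_fn` for the inter-frame propagation branch; in practice both would run the shared encoder on video frames.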
Related papers
- Dual Prototype Attention for Unsupervised Video Object Segmentation [28.725754274542304]
Unsupervised video object segmentation (VOS) aims to detect and segment the most salient object in videos.
This paper proposes two novel prototype-based attention mechanisms, inter-modality attention (IMA) and inter-frame attention (IFA)
arXiv Detail & Related papers (2022-11-22T06:19:17Z) - Semi-supervised Body Parsing and Pose Estimation for Enhancing Infant
General Movement Assessment [11.33138866472943]
General movement assessment (GMA) of infant movement videos (IMVs) is an effective method for early detection of cerebral palsy (CP) in infants.
We demonstrate in this paper that end-to-end trainable neural networks for image sequence recognition can be applied to achieve good results in GMA.
We propose a semi-supervised model, termed SiamParseNet (SPN), which consists of two branches, one for intra-frame body parts segmentation and another for inter-frame label propagation.
arXiv Detail & Related papers (2022-10-14T18:46:30Z) - Retrieval of surgical phase transitions using reinforcement learning [11.130363429095048]
We introduce a novel reinforcement learning formulation for offline phase transition retrieval.
By construction, our model does not produce spurious and noisy phase transitions, but contiguous phase blocks.
We compare our method against the recent top-performing frame-based approaches TeCNO and Trans-SVNet.
arXiv Detail & Related papers (2022-08-01T14:43:15Z) - Skimming, Locating, then Perusing: A Human-Like Framework for Natural
Language Video Localization [19.46938403691984]
We propose a two-step human-like framework called Skimming-Locating-Perusing.
SLP consists of a Skimming-and-Locating (SL) module and a Bi-directional Perusing (BP) module.
Our SLP is superior to the state-of-the-art methods and localizes more precise segment boundaries.
arXiv Detail & Related papers (2022-07-27T10:59:33Z) - TTVFI: Learning Trajectory-Aware Transformer for Video Frame
Interpolation [50.49396123016185]
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI)
Our method outperforms other state-of-the-art methods in four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z) - Tag-Based Attention Guided Bottom-Up Approach for Video Instance
Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has minimal run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z) - Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework [108.70949305791201]
Part-level Action Parsing (PAP) aims to not only predict the video-level action but also recognize the frame-level fine-grained actions or interactions of body parts for each person in the video.
In particular, our framework first predicts the video-level class of the input video, then localizes the body parts and predicts the part-level action.
Our framework achieves state-of-the-art performance and outperforms existing methods by over 31.10% in ROC score.
arXiv Detail & Related papers (2022-03-09T01:30:57Z) - Contrastive Transformation for Self-supervised Correspondence Learning [120.62547360463923]
We study the self-supervised learning of visual correspondence using unlabeled videos in the wild.
Our method simultaneously considers intra- and inter-video representation associations for reliable correspondence estimation.
Our framework outperforms the recent self-supervised correspondence methods on a range of visual tasks.
arXiv Detail & Related papers (2020-12-09T14:05:06Z) - Learning Motion Flows for Semi-supervised Instrument Segmentation from
Robotic Surgical Video [64.44583693846751]
We study the semi-supervised instrument segmentation from robotic surgical videos with sparse annotations.
By exploiting generated data pairs, our framework can recover and even enhance temporal consistency of training sequences.
Results show that our method outperforms the state-of-the-art semi-supervised methods by a large margin.
arXiv Detail & Related papers (2020-07-06T02:39:32Z) - SF-Net: Single-Frame Supervision for Temporal Action Localization [60.202516362976645]
Single-frame supervision introduces extra temporal action signals while maintaining low annotation overhead.
We propose a unified system called SF-Net to make use of such single-frame supervision.
SF-Net significantly improves upon state-of-the-art weakly-supervised methods in terms of both segment localization and single-frame localization.
arXiv Detail & Related papers (2020-03-15T15:06:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.