Learning to Estimate Hidden Motions with Global Motion Aggregation
- URL: http://arxiv.org/abs/2104.02409v1
- Date: Tue, 6 Apr 2021 10:32:03 GMT
- Title: Learning to Estimate Hidden Motions with Global Motion Aggregation
- Authors: Shihao Jiang, Dylan Campbell, Yao Lu, Hongdong Li, Richard Hartley
- Abstract summary: Occlusions pose a significant challenge to optical flow algorithms that rely on local evidence.
We introduce a global motion aggregation module to find long-range dependencies between pixels in the first image.
We demonstrate that the optical flow estimates in the occluded regions can be significantly improved without damaging the performance in non-occluded regions.
- Score: 71.12650817490318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Occlusions pose a significant challenge to optical flow algorithms that rely
on local evidence. We consider an occluded point to be one that is imaged in
the first frame but not in the next, a slight overloading of the standard
definition since it also includes points that move out-of-frame. Estimating the
motion of these points is extremely difficult, particularly in the two-frame
setting. Previous work relies on CNNs to learn occlusions, without much
success, or requires multiple frames to reason about occlusions using temporal
smoothness. In this paper, we argue that the occlusion problem can be better
solved in the two-frame case by modelling image self-similarities. We introduce
a global motion aggregation module, a transformer-based approach to find
long-range dependencies between pixels in the first image, and perform global
aggregation on the corresponding motion features. We demonstrate that the
optical flow estimates in the occluded regions can be significantly improved
without damaging the performance in non-occluded regions. This approach obtains
new state-of-the-art results on the challenging Sintel dataset, improving the
average end-point error by 13.6% on Sintel Final and 13.7% on Sintel Clean.
At the time of submission, our method ranks first on these benchmarks among all
published and unpublished approaches. Code is available at
https://github.com/zacjiang/GMA .
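The core idea in the abstract can be sketched in code: queries and keys are computed from the first frame's appearance (context) features, so the attention weights capture image self-similarities, and those weights are then used to aggregate the per-pixel motion features globally. The sketch below is a minimal NumPy illustration of that aggregation pattern, not the authors' implementation; the function name, shapes, and the residual connection at the end are illustrative assumptions (see the linked repository for the actual model).

```python
import numpy as np

def global_motion_aggregation(context, motion):
    """Illustrative sketch of attention-based global motion aggregation.

    context: (N, Dc) appearance features of N frame-1 pixels (queries/keys).
    motion:  (N, Dm) per-pixel motion features (values).
    Returns motion features augmented with a globally aggregated term.
    """
    N, Dc = context.shape
    # Self-similarity attention: both queries and keys come from frame 1,
    # so occluded pixels can borrow motion from similar, visible pixels.
    scores = (context @ context.T) / np.sqrt(Dc)   # (N, N)
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # row-wise softmax
    aggregated = attn @ motion                     # (N, Dm) global aggregation
    # Combine the long-range evidence with the local motion features.
    return motion + aggregated
```

Note the asymmetry that distinguishes this from standard self-attention over motion: similarity is measured in appearance space, while the quantity being transported is motion, which is what lets hidden (occluded) pixels inherit plausible motion from visually similar regions.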
Related papers
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z) - Vanishing Point Estimation in Uncalibrated Images with Prior Gravity Direction [82.72686460985297]
We tackle the problem of estimating a Manhattan frame.
We derive two new 2-line solvers, one of which does not suffer from singularities affecting existing solvers.
We also design a new non-minimal method, running on an arbitrary number of lines, to boost the performance in local optimization.
arXiv Detail & Related papers (2023-08-21T13:03:25Z) - Towards Nonlinear-Motion-Aware and Occlusion-Robust Rolling Shutter Correction [54.00007868515432]
Existing methods face challenges in estimating the accurate correction field due to the uniform velocity assumption.
We propose a geometry-based Quadratic Rolling Shutter (QRS) motion solver, which precisely estimates the high-order correction field of individual pixels.
Our method surpasses the state-of-the-art by +4.98, +0.77, and +4.33 in PSNR on the Carla-RS, Fastec-RS, and BS-RSC datasets, respectively.
arXiv Detail & Related papers (2023-03-31T15:09:18Z) - OTPose: Occlusion-Aware Transformer for Pose Estimation in Sparsely-Labeled Videos [21.893572076171527]
We propose a method that leverages an attention mask for occluded joints and encodes temporal dependency between frames using transformers.
We achieve state-of-the-art pose estimation results on the PoseTrack 2017 and PoseTrack 2018 datasets.
arXiv Detail & Related papers (2022-07-20T08:06:06Z) - Temporal Feature Alignment and Mutual Information Maximization for Video-Based Human Pose Estimation [38.571715193347366]
We present a novel hierarchical alignment framework for multi-frame human pose estimation.
We rank No.1 in the Multi-frame Person Pose Estimation Challenge on the PoseTrack 2017 benchmark, and obtain state-of-the-art performance on the Sub-JHMDB and PoseTrack 2018 benchmarks.
arXiv Detail & Related papers (2022-03-29T04:29:16Z) - IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment [58.8330387551499]
We formulate the problem as estimation of point-wise trajectories (i.e., smooth curves).
We propose IDEA-Net, an end-to-end deep learning framework, which disentangles the problem under the assistance of the explicitly learned temporal consistency.
We demonstrate the effectiveness of our method on various point cloud sequences and observe large improvement over state-of-the-art methods both quantitatively and visually.
arXiv Detail & Related papers (2022-03-22T10:14:08Z) - TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z) - Deep Dual Consecutive Network for Human Pose Estimation [44.41818683253614]
We propose a novel multi-frame human pose estimation framework, leveraging abundant temporal cues between video frames to facilitate keypoint detection.
Our method ranks No.1 in the Multi-frame Person Pose Estimation Challenge on the large-scale benchmark datasets PoseTrack 2017 and PoseTrack 2018.
arXiv Detail & Related papers (2021-03-12T13:11:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.