A Temporal Densely Connected Recurrent Network for Event-based Human
Pose Estimation
- URL: http://arxiv.org/abs/2209.07034v3
- Date: Thu, 6 Apr 2023 02:24:10 GMT
- Title: A Temporal Densely Connected Recurrent Network for Event-based Human
Pose Estimation
- Authors: Zhanpeng Shao, Wen Zhou, Wuzhen Wang, Jianyu Yang, Youfu Li
- Abstract summary: Event cameras are emerging bio-inspired vision sensors that report per-pixel brightness changes asynchronously.
This paper proposes a novel densely connected recurrent architecture to address the problem of incomplete information.
With this recurrent architecture, we can explicitly model not only the sequential but also the non-sequential geometric consistency across time steps.
- Score: 24.367222637492787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Event cameras are emerging bio-inspired vision sensors that report
per-pixel brightness changes asynchronously. They offer the notable advantages
of high dynamic range, high-speed response, and a low power budget, which enable
them to best capture local motions in uncontrolled environments. This motivates
us to unlock the potential of event cameras for human pose estimation, which has
rarely been explored with event cameras. Because of the paradigm shift from
conventional frame-based cameras, however, event signals in a time interval
contain very limited information: event cameras capture only the moving body
parts and ignore static ones, so some parts are incomplete or even missing
within the time interval. This paper proposes a novel densely connected
recurrent architecture to address the problem of incomplete information. With
this recurrent architecture, we can explicitly model not only the sequential but
also the non-sequential geometric consistency across time steps to accumulate
information from previous frames and recover the entire human body, achieving
stable and accurate human pose estimation from event data. Moreover, to better
evaluate our model, we collect a large-scale multimodal event-based dataset with
human pose annotations, which is, to the best of our knowledge, by far the most
challenging one. The experimental results on two public datasets and our own
dataset demonstrate the effectiveness and strength of our approach. Code will be
made available online to facilitate future research.
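The incomplete-information problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or representation: the `(timestamp, x, y, polarity)` event layout and the helper names `accumulate_events` and `active_fraction` are assumptions made only for this example. The sketch sums event polarities per pixel over one time interval, so pixels covering static body parts stay at zero.

```python
from typing import List, Tuple

# Hypothetical event layout (an assumption for this sketch):
# (timestamp in seconds, x, y, polarity), with polarity > 0 for a brightness increase.
Event = Tuple[float, int, int, int]


def accumulate_events(events: List[Event], width: int, height: int,
                      t_start: float, t_end: float) -> List[List[int]]:
    """Sum signed polarities per pixel for events with t_start <= t < t_end."""
    frame = [[0] * width for _ in range(height)]
    for t, x, y, p in events:
        if t_start <= t < t_end:
            frame[y][x] += 1 if p > 0 else -1
    return frame


def active_fraction(frame: List[List[int]]) -> float:
    """Fraction of pixels with a non-zero accumulated value in the interval."""
    total = sum(len(row) for row in frame)
    active = sum(1 for row in frame for v in row if v != 0)
    return active / total


if __name__ == "__main__":
    # Two pixels move inside the interval; the event at t=0.20 falls outside it.
    events = [(0.01, 1, 0, 1), (0.02, 1, 0, 1), (0.03, 2, 1, -1), (0.20, 0, 0, 1)]
    frame = accumulate_events(events, width=3, height=2, t_start=0.0, t_end=0.1)
    print(frame)
    print(active_fraction(frame))
```

Pixels that never change brightness within the interval contribute nothing to the frame, which is why a single interval can miss static limbs and why accumulating information across time steps is useful.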
Related papers
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
- Event-based Simultaneous Localization and Mapping: A Comprehensive Survey [52.73728442921428]
Review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks.
Paper categorizes event-based vSLAM methods into four main categories: feature-based, direct, motion-compensation, and deep learning methods.
arXiv Detail & Related papers (2023-04-19T16:21:14Z)
- Event-based Human Pose Tracking by Spiking Spatiotemporal Transformer [20.188995900488717]
We present a dedicated end-to-end sparse deep approach for event-based pose tracking.
This is the first time that 3D human pose tracking is obtained from events only.
Our approach also achieves a significant reduction of 80% in FLOPS.
arXiv Detail & Related papers (2023-03-16T22:56:12Z)
- Mutual Information-Based Temporal Difference Learning for Human Pose
Estimation in Video [16.32910684198013]
We present a novel multi-frame human pose estimation framework, which employs temporal differences across frames to model dynamic contexts.
Specifically, we design multi-stage entangled learning sequences, conditioned on multi-stage differences, to derive informative motion representation sequences.
Our method ranks No. 1 in the Crowd Pose Estimation in Complex Events Challenge on the HiEve benchmark.
arXiv Detail & Related papers (2023-03-15T09:29:03Z)
- Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks [55.81577205593956]
Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously.
Deep learning (DL) has been brought to this emerging field and inspired active research endeavors in mining its potential.
arXiv Detail & Related papers (2023-02-17T14:19:28Z)
- Learning Dynamics via Graph Neural Networks for Human Pose Estimation
and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame.
Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatial-temporal and visual information.
Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
arXiv Detail & Related papers (2021-06-07T16:36:50Z)
- Event-based Motion Segmentation with Spatio-Temporal Graph Cuts [51.17064599766138]
We have developed a method to identify independently moving objects acquired with an event-based camera.
The method performs on par with or better than the state of the art without having to predetermine the number of expected moving objects.
arXiv Detail & Related papers (2020-12-16T04:06:02Z)
- Learning Monocular Dense Depth from Events [53.078665310545745]
Event cameras report brightness changes as a stream of asynchronous events instead of intensity frames.
Recent learning-based approaches have been applied to event-based data for tasks such as monocular depth prediction.
We propose a recurrent architecture to solve this task and show significant improvement over standard feed-forward methods.
arXiv Detail & Related papers (2020-10-16T12:36:23Z)
- Learning to Detect Objects with a 1 Megapixel Event Camera [14.949946376335305]
Event cameras encode visual information with high temporal precision, low data-rate, and high-dynamic range.
Due to the novelty of the field, the performance of event-based systems on many vision tasks is still lower compared to conventional frame-based solutions.
arXiv Detail & Related papers (2020-09-28T16:03:59Z)
- Back to Event Basics: Self-Supervised Learning of Image Reconstruction
for Event Cameras via Photometric Constancy [0.0]
Event cameras are novel vision sensors that sample, in an asynchronous fashion, brightness increments with low latency and high temporal resolution.
We propose a novel, lightweight neural network for optical flow estimation that achieves high speed inference with only a minor drop in performance.
Results across multiple datasets show that the performance of the proposed self-supervised approach is in line with the state-of-the-art.
arXiv Detail & Related papers (2020-09-17T13:30:05Z)
- End-to-end Learning of Object Motion Estimation from Retinal Events for
Event-based Object Tracking [35.95703377642108]
We propose a novel deep neural network to learn and regress a parametric object-level motion/transform model for event-based object tracking.
To achieve this goal, we propose a synchronous Time-Surface with Linear Time Decay representation.
We feed the sequence of TSLTD frames to a novel Retinal Motion Regression Network (RMRNet) to perform an end-to-end 5-DoF object motion regression.
arXiv Detail & Related papers (2020-02-14T08:19:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.