Differentiable Event Stream Simulator for Non-Rigid 3D Tracking
- URL: http://arxiv.org/abs/2104.15139v1
- Date: Fri, 30 Apr 2021 17:58:07 GMT
- Title: Differentiable Event Stream Simulator for Non-Rigid 3D Tracking
- Authors: Jalees Nehvi and Vladislav Golyanik and Franziska Mueller and
Hans-Peter Seidel and Mohamed Elgharib and Christian Theobalt
- Abstract summary: Our differentiable simulator enables non-rigid 3D tracking of deformable objects from event streams.
We show the effectiveness of our approach for various types of non-rigid objects and compare to existing methods for non-rigid 3D tracking.
- Score: 82.56690776283428
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces the first differentiable simulator of event streams,
i.e., streams of asynchronous brightness change signals recorded by event
cameras. Our differentiable simulator enables non-rigid 3D tracking of
deformable objects (such as human hands, isometric surfaces and general
watertight meshes) from event streams by leveraging an analysis-by-synthesis
principle. So far, event-based tracking and reconstruction of non-rigid objects
in 3D, like hands and body, has been either tackled using explicit event
trajectories or large-scale datasets. In contrast, our method does not require
any such processing or data, and can be readily applied to incoming event
streams. We show the effectiveness of our approach for various types of
non-rigid objects and compare to existing methods for non-rigid 3D tracking. In
our experiments, the proposed energy-based formulations outperform competing
RGB-based methods in terms of 3D errors. The source code and the new data are
publicly available.
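The analysis-by-synthesis idea in the abstract can be illustrated with a minimal sketch (an assumption-laden reconstruction, not the authors' implementation): an event camera emits an event at a pixel when the log-brightness change exceeds a contrast threshold C; replacing that hard threshold with a sigmoid of sharpness `beta` makes the simulated event map differentiable with respect to the rendered brightness images, and hence with respect to any deformation parameters that produced them, so tracking can proceed by gradient descent on an event-based energy. The function names and parameters below are hypothetical.

```python
import numpy as np

def smooth_event_map(log_I_prev, log_I_curr, C=0.2, beta=20.0):
    """Differentiable surrogate for event generation.

    A real event camera emits +1/-1 events where the log-brightness
    change exceeds the contrast threshold C. The hard step is replaced
    by a sigmoid of sharpness `beta`, so the output varies smoothly
    with the brightness images (and, by the chain rule, with any scene
    parameters that produced them).
    """
    diff = log_I_curr - log_I_prev
    pos = 1.0 / (1.0 + np.exp(-beta * (diff - C)))   # soft "positive event"
    neg = 1.0 / (1.0 + np.exp(-beta * (-diff - C)))  # soft "negative event"
    return pos - neg  # in (-1, 1), approximating signed event polarity

def event_energy(simulated, observed):
    """L2 energy between simulated and observed event frames."""
    return float(np.mean((simulated - observed) ** 2))
```

In an analysis-by-synthesis loop, `log_I_prev` and `log_I_curr` would be renders of the deformable model at consecutive timestamps, and the energy would be minimized over the model's pose or deformation parameters.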
Related papers
- IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera [7.515256982860307]
IncEventGS is an incremental 3D Gaussian splatting reconstruction algorithm with a single event camera.
We exploit the tracking and mapping paradigm of conventional SLAM pipelines for IncEventGS.
arXiv Detail & Related papers (2024-10-10T16:54:23Z)
- Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors [8.93657924734248]
Event cameras are bio-inspired sensors that output asynchronous and sparse event streams, instead of fixed frames.
We propose a novel event-based 3DGS framework, named Elite-EvGS.
Our key idea is to distill the prior knowledge from the off-the-shelf event-to-video (E2V) models to effectively reconstruct 3D scenes from events.
arXiv Detail & Related papers (2024-09-20T10:47:52Z)
- Inverse Neural Rendering for Explainable Multi-Object Tracking [35.072142773300655]
We recast 3D multi-object tracking from RGB cameras as an Inverse Rendering (IR) problem.
We optimize an image loss over generative latent spaces that inherently disentangle shape and appearance properties.
We validate the generalization and scaling capabilities of our method by learning the generative prior exclusively from synthetic data.
arXiv Detail & Related papers (2024-04-18T17:37:53Z)
- EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams [59.77837807004765]
This paper introduces a new problem, i.e., 3D human motion capture from an egocentric monocular event camera with a fisheye lens.
Event streams have high temporal resolution and provide reliable cues for 3D human motion capture under high-speed human motions and rapidly changing illumination.
Our EE3D demonstrates robustness and superior 3D accuracy compared to existing solutions while supporting real-time 3D pose update rates of 140Hz.
arXiv Detail & Related papers (2024-04-12T17:59:47Z)
- Exploring Event-based Human Pose Estimation with 3D Event Representations [26.34100847541989]
We introduce two 3D event representations: the Rasterized Event Point Cloud (Ras EPC) and the Decoupled Event Voxel (DEV).
The Ras EPC aggregates events within concise temporal slices at identical positions, preserving their 3D attributes along with statistical information, thereby significantly reducing memory and computational demands.
Our methods are tested on the DHP19 public dataset, MMHPSD dataset, and our EV-3DPW dataset, with further qualitative validation via a derived driving scene dataset EV-JAAD and an outdoor collection vehicle.
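The Ras EPC aggregation described in this entry can be sketched as follows (a hypothetical reconstruction from the abstract, not the paper's code): events falling at the same pixel position within a temporal slice are merged into one aggregated point carrying simple statistics, which preserves the 3D (x, y, t) attributes while shrinking the data volume.

```python
import numpy as np

def rasterize_event_point_cloud(events, num_slices, t_min, t_max):
    """Aggregate events at identical (x, y) positions within temporal slices.

    `events` is an (N, 4) array of (x, y, t, p) with polarity p in {-1, +1}.
    Returns a dict mapping (slice, x, y) -> (count, mean_t, net_polarity):
    one aggregated point per occupied cell, far more compact than the raw
    stream yet retaining per-cell statistics.
    """
    slice_len = (t_max - t_min) / num_slices
    cells = {}
    for x, y, t, p in events:
        s = min(int((t - t_min) / slice_len), num_slices - 1)
        key = (s, int(x), int(y))
        count, t_sum, p_sum = cells.get(key, (0, 0.0, 0.0))
        cells[key] = (count + 1, t_sum + t, p_sum + p)
    return {k: (c, t_sum / c, p_sum) for k, (c, t_sum, p_sum) in cells.items()}
```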
arXiv Detail & Related papers (2023-11-08T10:45:09Z) - Decaf: Monocular Deformation Capture for Face and Hand Interactions [77.75726740605748]
This paper introduces the first method that allows tracking human hands interacting with human faces in 3D from single monocular RGB videos.
We model hands as articulated objects inducing non-rigid face deformations during an active interaction.
Our method relies on a new hand-face motion and interaction capture dataset with realistic face deformations acquired with a markerless multi-view camera system.
arXiv Detail & Related papers (2023-09-28T17:59:51Z) - Dual Memory Aggregation Network for Event-Based Object Detection with
Learnable Representation [79.02808071245634]
Event-based cameras are bio-inspired sensors that capture brightness change of every pixel in an asynchronous manner.
Event streams are divided into grids in the x-y-t coordinates for both positive and negative polarity, producing a set of pillars as 3D tensor representation.
Long memory is encoded in the hidden state of adaptive convLSTMs while short memory is modeled by computing spatial-temporal correlation between event pillars.
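The pillar representation in this entry can be sketched as a simple binning step (an illustrative reconstruction, not the authors' code): events are histogrammed into an x-y-t grid, kept separately for positive and negative polarity, yielding a dense tensor that downstream convolutions can consume.

```python
import numpy as np

def events_to_pillars(events, H, W, T, t_min, t_max):
    """Bin events into a (2, T, H, W) tensor: channel 0 for positive
    polarity, channel 1 for negative. Each (t-bin, y, x) cell counts
    the events falling into it, forming a set of x-y-t 'pillars'."""
    grid = np.zeros((2, T, H, W), dtype=np.float32)
    slice_len = (t_max - t_min) / T
    for x, y, t, p in events:
        ti = min(int((t - t_min) / slice_len), T - 1)
        ch = 0 if p > 0 else 1
        grid[ch, ti, int(y), int(x)] += 1.0
    return grid
```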
arXiv Detail & Related papers (2023-03-17T12:12:41Z) - Lifting Monocular Events to 3D Human Poses [22.699272716854967]
This paper presents a novel 3D human pose estimation approach using a single stream of asynchronous events as input.
We propose the first learning-based method for 3D human pose from a single stream of events.
Experiments demonstrate that our method achieves solid accuracy, narrowing the performance gap between standard RGB and event-based vision.
arXiv Detail & Related papers (2021-04-21T16:07:12Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - EventHands: Real-Time Neural 3D Hand Reconstruction from an Event Stream [80.15360180192175]
3D hand pose estimation from monocular videos is a long-standing and challenging problem.
We address it for the first time using a single event camera, i.e., an asynchronous vision sensor reacting on brightness changes.
Our approach has characteristics previously not demonstrated with a single RGB or depth camera.
arXiv Detail & Related papers (2020-12-11T16:45:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.