Multi-view data capture for dynamic object reconstruction using handheld augmented reality mobiles
- URL: http://arxiv.org/abs/2103.07883v1
- Date: Sun, 14 Mar 2021 10:26:50 GMT
- Title: Multi-view data capture for dynamic object reconstruction using handheld augmented reality mobiles
- Authors: M. Bortolon, L. Bazzanella, F. Poiesi
- Abstract summary: We propose a system to capture nearly-synchronous frame streams from multiple and moving handheld mobiles.
Each mobile executes Simultaneous Localisation and Mapping on-board to estimate its pose, and uses a wireless communication channel to send or receive synchronisation triggers.
We show the effectiveness of our system by employing it for 3D skeleton and volumetric reconstructions.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a system to capture nearly-synchronous frame streams from multiple
and moving handheld mobiles that is suitable for dynamic object 3D
reconstruction. Each mobile executes Simultaneous Localisation and Mapping
on-board to estimate its pose, and uses a wireless communication channel to
send or receive synchronisation triggers. Our system can harvest frames and
mobile poses in real time using a decentralised triggering strategy and a
data-relay architecture that can be deployed either at the Edge or in the
Cloud. We show the effectiveness of our system by employing it for 3D skeleton
and volumetric reconstructions. Our triggering strategy achieves equal
performance to that of an NTP-based synchronisation approach, but offers higher
flexibility, as it can be adjusted online based on application needs. We
created a challenging new dataset, namely 4DM, that involves six handheld
augmented reality mobiles recording an actor performing sports actions
outdoors. We validate our system on 4DM, analyse its strengths and limitations,
and compare its modules with alternative ones.
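The decentralised triggering idea in the abstract can be illustrated with a minimal, hypothetical sketch: a trigger timestamp is shared over the wireless channel, and each mobile contributes the buffered frame whose capture time is nearest that trigger. The names (`Frame`, `harvest`, `nearest_frame`) and the per-device clock offsets are illustrative assumptions, not the authors' actual API or measurements.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    timestamp: float  # capture time in seconds (device clock)
    device_id: int

def nearest_frame(frames, trigger_ts):
    """Select the buffered frame whose timestamp is closest to the trigger."""
    return min(frames, key=lambda f: abs(f.timestamp - trigger_ts))

def harvest(trigger_ts, device_buffers):
    """For each device, pick the frame nearest the shared trigger timestamp."""
    return {dev: nearest_frame(buf, trigger_ts)
            for dev, buf in device_buffers.items()}

# Simulated 30 fps frame buffers for three devices whose capture phases
# are offset by 7 ms per device (a stand-in for unsynchronised clocks).
device_buffers = {
    dev: [Frame(t / 30.0 + dev * 0.007, dev) for t in range(30)]
    for dev in range(3)
}

selected = harvest(trigger_ts=0.5, device_buffers=device_buffers)
timestamps = [f.timestamp for f in selected.values()]
spread = max(timestamps) - min(timestamps)
print(f"max inter-device offset: {spread * 1000:.1f} ms")
```

Because each device only needs to answer a trigger with its nearest frame, the strategy can be retuned online (e.g. trigger rate, buffer depth) without the fixed clock-discipline loop an NTP-based scheme requires, which is the flexibility the abstract refers to.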
Related papers
- UCM: Unifying Camera Control and Memory with Time-aware Positional Encoding Warping for World Models [54.564740558030245]
We present UCM, a novel framework that unifies long-term memory and precise camera control via a time-aware positional encoding warping mechanism.
We also introduce a scalable data curation strategy utilizing point-cloud-based rendering to simulate scene revisiting.
arXiv Detail & Related papers (2026-02-26T12:54:46Z)
- Mon3tr: Monocular 3D Telepresence with Pre-built Gaussian Avatars as Amortization [16.68162021163563]
Mon3tr is a novel monocular 3D telepresence framework that integrates 3D Gaussian splatting (3DGS) based parametric human modeling.
A single monocular RGB camera is used to capture body motions and facial expressions in real time to drive the 3DGS-based parametric human model.
Our method achieves a PSNR of > 28 dB for novel poses, an end-to-end latency of 80 ms, and > 1000x bandwidth reduction compared to point-cloud streaming.
arXiv Detail & Related papers (2026-01-12T13:17:41Z)
- UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework [54.337290937468175]
We propose UniMo, an autoregressive model for joint modeling of 2D human videos and 3D human motions within a unified framework.
We show that our method simultaneously generates corresponding videos and motions while performing accurate motion capture.
arXiv Detail & Related papers (2025-12-03T16:03:18Z)
- SAM4D: Segment Anything in Camera and LiDAR Streams [20.769019263142056]
We present SAM4D, a multi-modal and temporal foundation model for promptable segmentation across camera and LiDAR streams.
UMPE is introduced to align camera and LiDAR features in a shared 3D space, enabling seamless cross-modal prompting.
We propose Motion-aware Cross-modal Attention Memory, which leverages ego-motion compensation to enhance temporal consistency.
arXiv Detail & Related papers (2025-06-26T17:59:14Z)
- FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video [52.33896173943054]
Egocentric motion capture with a head-mounted body-facing stereo camera is crucial for VR and AR applications.
Existing methods rely on synthetic pretraining and struggle to generate smooth and accurate predictions in real-world settings.
We propose FRAME, a simple yet effective architecture that combines device pose and camera feeds for state-of-the-art body pose prediction.
arXiv Detail & Related papers (2025-03-29T14:26:06Z)
- Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera [49.82535393220003]
Dyn-HaMR is the first approach to reconstruct 4D global hand motion from monocular videos recorded by dynamic cameras in the wild.
We show that our approach significantly outperforms state-of-the-art methods in terms of 4D global mesh recovery.
This establishes a new benchmark for hand motion reconstruction from monocular video with moving cameras.
arXiv Detail & Related papers (2024-12-17T12:43:10Z)
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
- DISORF: A Distributed Online 3D Reconstruction Framework for Mobile Robots [4.683651138674254]
DISORF is a framework to enable online 3D reconstruction and visualization of scenes captured by resource-constrained mobile robots and edge devices.
We leverage on-device SLAM systems to generate posed keyframes and transmit them to remote servers that can perform high-quality 3D reconstruction and visualization at runtime.
We propose a novel shifted exponential frame sampling method that addresses this challenge for online training.
arXiv Detail & Related papers (2024-03-01T02:19:40Z)
- Federated Multi-View Synthesizing for Metaverse [52.59476179535153]
The metaverse is expected to provide immersive entertainment, education, and business applications.
Virtual reality (VR) transmission over wireless networks is data- and computation-intensive.
We have developed a novel multi-view synthesizing framework that can efficiently provide synthesizing, storage, and communication resources for wireless content delivery in the metaverse.
arXiv Detail & Related papers (2023-12-18T13:51:56Z)
- TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
Video sequences generated by TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z)
- MOVIN: Real-time Motion Capture using a Single LiDAR [7.3228874258537875]
We present MOVIN, a data-driven generative method for real-time motion capture with global tracking.
Our framework accurately predicts the performer's 3D global information and local joint details.
We implement a real-time application to showcase our method in real-world scenarios.
arXiv Detail & Related papers (2023-09-17T16:04:15Z)
- DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields [71.94156412354054]
We propose Dynamic Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields (DynaMoN).
DynaMoN handles dynamic content for initial camera pose estimation and statics-focused ray sampling for fast and accurate novel-view synthesis.
We extensively evaluate our approach on two real-world dynamic datasets, the TUM RGB-D dataset and the BONN RGB-D Dynamic dataset.
arXiv Detail & Related papers (2023-09-16T08:46:59Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- Digital Twin-Based 3D Map Management for Edge-Assisted Mobile Augmented Reality [43.92003852614186]
We propose a digital twin (DT)-based approach to 3D map management for edge-assisted mobile augmented reality (MAR).
First, a DT is created for the MAR device, which emulates 3D map management based on predicting subsequent camera frames.
Second, a model-based reinforcement learning (MBRL) algorithm is developed, utilizing the data collected from both the actual and the emulated data to manage the 3D map.
arXiv Detail & Related papers (2023-05-26T01:38:45Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing [14.67994875448175]
Video snapshot compressive imaging (SCI) utilizes a 2D detector to capture sequential video frames and compress them into a single measurement.
Most existing reconstruction methods are incapable of efficiently capturing long-range spatial and temporal dependencies.
We propose a flexible and robust approach based on the graph neural network (GNN) to efficiently model non-local interactions between pixels in space and time regardless of the distance.
arXiv Detail & Related papers (2022-03-01T12:13:46Z)
- Multi-view data capture using edge-synchronised mobiles [0.17205106391379021]
New-generation network architectures (e.g. 5G) promise lower latency and larger bandwidth connections supported by powerful edge computing.
We propose a novel and scalable data capture architecture that exploits edge resources to synchronise and harvest frame captures.
We empirically show the benefits of our edge computing unit by analysing latencies and show the quality of 3D reconstruction outputs against an alternative and popular centralised solution.
arXiv Detail & Related papers (2020-05-07T07:13:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.