SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human
Portraits for State Estimation, Reconstruction and Synthesis
- URL: http://arxiv.org/abs/2204.10211v1
- Date: Thu, 21 Apr 2022 15:47:38 GMT
- Title: SmartPortraits: Depth Powered Handheld Smartphone Dataset of Human
Portraits for State Estimation, Reconstruction and Synthesis
- Authors: Anastasiia Kornilova, Marsel Faizullin, Konstantin Pakulev, Andrey
Sadkov, Denis Kukushkin, Azat Akhmetyanov, Timur Akhtyamov, Hekmat
Taherinejad, Gonzalo Ferrer
- Abstract summary: We present a dataset of 1000 video sequences of human portraits recorded in real and uncontrolled conditions.
The collected dataset contains 200 people captured in different poses and locations.
The main purpose is to bridge the gap between raw measurements obtained from a smartphone and downstream applications.
- Score: 1.981491298222699
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a dataset of 1000 video sequences of human portraits recorded in
real and uncontrolled conditions by using a handheld smartphone accompanied by
an external high-quality depth camera. The collected dataset contains 200
people captured in different poses and locations, and its main purpose is to
bridge the gap between raw measurements obtained from a smartphone and
downstream applications, such as state estimation, 3D reconstruction, view
synthesis, etc. The sensors employed in data collection are the smartphone's
camera and Inertial Measurement Unit (IMU), and an external Azure Kinect DK
depth camera software synchronized with sub-millisecond precision to the
smartphone system. During the recording, the smartphone flash is used to
provide a periodic secondary source of lighting. An accurate mask of the
foremost person is provided, along with an analysis of its impact on camera
alignment accuracy. For
evaluation purposes, we compare multiple state-of-the-art camera alignment
methods by using a Motion Capture system. We provide a smartphone
visual-inertial benchmark for portrait capturing, where we report results for
multiple methods and motivate further use of the provided trajectories,
available in the dataset, in view synthesis and 3D reconstruction tasks.
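The benchmark compares estimated camera trajectories against motion-capture ground truth. A standard way to score such comparisons is the absolute trajectory error (ATE): rigidly align the estimated trajectory to the reference with the Umeyama method, then take the RMSE of the residuals. The sketch below is a minimal, generic illustration of that metric, not the authors' evaluation code; the function names and synthetic trajectories are assumptions for demonstration.

```python
import numpy as np

def umeyama_align(est, gt):
    """Find rotation R and translation t that best map est onto gt
    (least-squares rigid alignment; est, gt are (N, 3) position arrays)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    E, G = est - mu_e, gt - mu_g
    # Cross-covariance between target (gt) and source (est) points.
    U, _, Vt = np.linalg.svd(G.T @ E / len(est))
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0  # correct an improper rotation (reflection)
    R = U @ D @ Vt
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt):
    """Absolute trajectory error: RMSE of positions after rigid alignment."""
    R, t = umeyama_align(est, gt)
    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```

Because the alignment removes any global rigid offset between the coordinate frames of the estimator and the MoCap system, the remaining error reflects trajectory quality rather than frame convention.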
Related papers
- Valeo Near-Field: a novel dataset for pedestrian intent detection [21.659078060884614]
This paper presents a novel dataset aimed at detecting pedestrians' intentions as they approach an ego-vehicle. The dataset comprises synchronized multi-modal data, including fisheye camera feeds, lidar laser scans, ultrasonic sensor readings, and motion capture-based 3D body poses.
arXiv Detail & Related papers (2025-10-17T14:02:54Z) - Unleashing the Temporal Potential of Stereo Event Cameras for Continuous-Time 3D Object Detection [44.479946706395694]
Event cameras offer a solution by capturing motion continuously. We propose a novel stereo 3D object detection framework that relies solely on event cameras. Experiments show that our method outperforms prior approaches in dynamic environments.
arXiv Detail & Related papers (2025-08-04T10:57:03Z) - Monocular 3D Hand Pose Estimation with Implicit Camera Alignment [9.199465050084296]
Estimating the 3D hand articulation from a single color image is an important problem with applications in Augmented Reality (AR), Virtual Reality (VR), and Human-Computer Interaction (HCI). We propose an optimization pipeline for estimating the 3D hand articulation from 2D keypoint input, which includes a keypoint alignment step and a fingertip loss. We evaluate our approach on the EgoDexter and Dexter+Object benchmarks to showcase that it performs competitively with the state-of-the-art.
arXiv Detail & Related papers (2025-06-10T18:45:22Z) - FRAME: Floor-aligned Representation for Avatar Motion from Egocentric Video [52.33896173943054]
Egocentric motion capture with a head-mounted body-facing stereo camera is crucial for VR and AR applications.
Existing methods rely on synthetic pretraining and struggle to generate smooth and accurate predictions in real-world settings.
We propose FRAME, a simple yet effective architecture that combines device pose and camera feeds for state-of-the-art body pose prediction.
arXiv Detail & Related papers (2025-03-29T14:26:06Z) - EventEgo3D++: 3D Human Motion Capture from a Head-Mounted Event Camera [64.58147600753382]
EventEgo3D++ is a method for 3D human motion capture from a monocular event camera with a fisheye lens.
Event cameras excel in high-speed scenarios and varying illumination due to their high temporal resolution.
Our method supports real-time 3D pose updates at a rate of 140Hz.
arXiv Detail & Related papers (2025-02-11T18:57:05Z) - PIV3CAMS: a multi-camera dataset for multiple computer vision problems and its application to novel view-point synthesis [120.4361056355332]
This thesis introduces Paired Image and Video data from three CAMeraS, namely PIV3CAMS.
The PIV3CAMS dataset consists of 8385 pairs of images and 82 pairs of videos taken from three different cameras.
In addition to the regeneration of a current state-of-the-art algorithm, we investigate several proposed alternative models that integrate depth information geometrically.
arXiv Detail & Related papers (2024-07-26T12:18:29Z) - MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior map representations by enabling faster rendering, scale awareness, and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z) - Multi-Modal Dataset Acquisition for Photometrically Challenging Objects [56.30027922063559]
This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
arXiv Detail & Related papers (2023-08-21T10:38:32Z) - Hand Gestures Recognition in Videos Taken with Lensless Camera [4.49422973940462]
This work proposes a deep learning model named Raw3dNet that recognizes hand gestures directly on raw videos captured by a lensless camera.
In addition to conserving computational resources, the reconstruction-free method provides privacy protection.
arXiv Detail & Related papers (2022-10-15T08:52:49Z) - The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural
Depth Refinement [25.637162990928676]
We show how we can combine dense micro-baseline parallax cues with kilopixel LiDAR depth estimates during viewfinding.
The proposed method brings high-resolution depth estimates to 'point-and-shoot' tabletop photography and requires no additional hardware, artificial hand motion, or user interaction beyond the press of a button.
arXiv Detail & Related papers (2021-11-26T20:24:07Z) - SPEC: Seeing People in the Wild with an Estimated Camera [64.85791231401684]
We introduce SPEC, the first in-the-wild 3D HPS method that estimates the perspective camera from a single image.
We train a neural network to estimate the field of view, camera pitch, and roll of an input image.
We then train a novel network that concatenates the camera calibration to the image features and uses these together to regress 3D body shape and pose.
arXiv Detail & Related papers (2021-10-01T19:05:18Z) - TUM-VIE: The TUM Stereo Visual-Inertial Event Dataset [50.8779574716494]
Event cameras are bio-inspired vision sensors which measure per pixel brightness changes.
They offer numerous benefits over traditional, frame-based cameras, including low latency, high dynamic range, high temporal resolution and low power consumption.
To foster the development of 3D perception and navigation algorithms with event cameras, we present the TUM-VIE dataset.
arXiv Detail & Related papers (2021-08-16T19:53:56Z) - Mesoscopic photogrammetry with an unstabilized phone camera [8.210210271599134]
We present a feature-free photogrammetric computation technique that enables quantitative 3D mesoscopic (mm-scale height variation) imaging.
Our end-to-end, pixel-intensity-based approach jointly registers and stitches all the images by estimating a coaligned height map.
We also propose strategies for reducing time and memory, applicable to other multi-frame registration problems.
arXiv Detail & Related papers (2020-12-11T00:09:18Z) - Event-based Stereo Visual Odometry [42.77238738150496]
We present a solution to the problem of visual odometry from the data acquired by a stereo event-based camera rig.
We seek to maximize the temporal consistency of stereo event-based data while using a simple and efficient representation.
arXiv Detail & Related papers (2020-07-30T15:53:28Z) - A Multi-spectral Dataset for Evaluating Motion Estimation Systems [7.953825491774407]
This paper presents a novel dataset for evaluating the performance of multi-spectral motion estimation systems.
All the sequences are recorded from a handheld multi-spectral device.
The depth images are captured by a Microsoft Kinect2 and can benefit learning cross-modality stereo matching.
arXiv Detail & Related papers (2020-07-01T17:11:02Z) - Multi-View Photometric Stereo: A Robust Solution and Benchmark Dataset
for Spatially Varying Isotropic Materials [65.95928593628128]
We present a method to capture both 3D shape and spatially varying reflectance with a multi-view photometric stereo technique.
Our algorithm is suitable for perspective cameras and nearby point light sources.
arXiv Detail & Related papers (2020-01-18T12:26:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.