3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
- URL: http://arxiv.org/abs/2410.10782v2
- Date: Wed, 12 Mar 2025 01:15:52 GMT
- Title: 3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
- Authors: Eduardo R. Corral-Soto, Yang Liu, Tongtong Cao, Yuan Ren, Liu Bingbing
- Abstract summary: This paper proposes a framework to generate synthetic dynamic 3D cyclist data assets that can be used to generate training data for different tasks. We build a complete synthetic dynamic 3D cyclist (rider pedaling a bicycle) by re-posing a selectable synthetic 3D person. We present both qualitative and quantitative results, comparing our generated cyclists against those from a recent diffusion-based method.
- Score: 10.047701675476986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Autonomous Driving (AD) Perception, cyclists are considered safety-critical scene objects. Commonly used publicly available AD datasets typically contain large numbers of car and vehicle object instances but few cyclist instances, usually with limited appearance and pose diversity. This cyclist training-data scarcity not only limits the generalization of deep-learning perception models for cyclist semantic segmentation, pose estimation, and cyclist crossing-intention prediction, but also hinders research on new cyclist-related tasks such as fine-grained cyclist pose estimation and spatio-temporal analysis under complex interactions between humans and articulated objects. To address this data scarcity problem, in this paper we propose a framework to generate synthetic dynamic 3D cyclist data assets that can be used to generate training data for different tasks. In our framework, we design a methodology for creating a new part-based multi-view articulated synthetic 3D bicycle dataset that we call 3DArticBikes, which we use to train a 3D Gaussian Splatting (3DGS)-based reconstruction and image rendering method. We then propose a parametric bicycle 3DGS composition model to assemble 8-DoF pose-controllable 3D bicycles. Finally, using dynamic information from cyclist videos, we build a complete synthetic dynamic 3D cyclist (a rider pedaling a bicycle) by re-posing a selectable synthetic 3D person, while automatically placing the rider onto one of our new articulated 3D bicycles using a proposed 3D keypoint optimization-based inverse kinematics pose refinement. We present both qualitative and quantitative results in which we compare our generated cyclists against those from a recent Stable Diffusion-based method.
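To make the 8-DoF parametric composition concrete, below is a minimal sketch of part-based articulated bicycle posing. The split into four articulation angles (steering, front/rear wheel spin, crank) plus a global yaw and translation, as well as every function and dictionary name, are illustrative assumptions rather than the paper's actual interface.

```python
import numpy as np

def rot(axis, theta):
    # Rodrigues rotation matrix about a unit axis.
    axis = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def pose_bicycle(parts, pivots, axes, dof):
    # parts  : dict part name -> (N, 3) points (e.g. 3DGS means) in bicycle-frame coords
    # pivots : dict part name -> (3,) pivot of that part's rotation axis
    # axes   : dict part name -> (3,) unit rotation axis
    # dof    : assumed 8-DoF split -- 'steer', 'front_wheel', 'rear_wheel',
    #          'crank' angles (radians), plus global 'yaw' and 'xyz' translation
    posed = {"frame": parts["frame"].copy()}
    for name, angle in [("fork", dof["steer"]), ("front_wheel", dof["front_wheel"]),
                        ("rear_wheel", dof["rear_wheel"]), ("crank", dof["crank"])]:
        R = rot(axes[name], angle)
        posed[name] = (parts[name] - pivots[name]) @ R.T + pivots[name]
    # The front wheel additionally inherits the steering rotation about the fork axis.
    R_s = rot(axes["fork"], dof["steer"])
    posed["front_wheel"] = (posed["front_wheel"] - pivots["fork"]) @ R_s.T + pivots["fork"]
    # Single global rigid placement of the assembled bicycle.
    R_g = rot([0.0, 1.0, 0.0], dof["yaw"])
    return {k: v @ R_g.T + np.asarray(dof["xyz"]) for k, v in posed.items()}
```

Each part is rotated rigidly about its own pivot and axis, and one global transform then places the assembled bicycle in the scene, mirroring the frame/fork/wheels/crank decomposition that part-based bicycle models typically use.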
Related papers
- Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles [81.29018359825872]
This paper consolidates a set of good practices to finetune large pretrained models for a real-world task.
Specifically, we develop several strategies to account for discrepancies between the synthetic data and real driving data.
Our insights lead to effective finetuning that results in a 68.8% reduction in FID for novel view synthesis over prior art.
arXiv Detail & Related papers (2024-12-19T03:39:13Z)
- SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs [34.41011015930057]
SyntheOcc addresses the challenge of how to efficiently encode 3D geometric information as conditional input to a 2D diffusion model.
Our approach innovatively incorporates 3D semantic multi-plane images (MPIs) to provide comprehensive and spatially aligned 3D scene descriptions.
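As a rough, hedged illustration of the MPI-conditioning idea (not SyntheOcc's actual pipeline), the sketch below samples a 3D semantic voxel grid at a stack of fronto-parallel depth planes to produce a camera-aligned conditioning tensor for a 2D diffusion model; the voxel-grid layout and all names are assumptions.

```python
import numpy as np

def semantic_mpi(grid, origin, res, K, cam_T_world, depths, hw, num_classes):
    # grid   : (X, Y, Z) integer semantic label volume (0 = empty, assumed)
    # origin : (3,) world position of the grid corner; res: voxel size in meters
    # Returns a (num_planes, H, W, num_classes) one-hot conditioning tensor.
    H, W = hw
    us, vs = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    rays = np.linalg.inv(K) @ np.stack([us, vs, np.ones_like(us)]).reshape(3, -1)
    world_T_cam = np.linalg.inv(cam_T_world)
    planes = []
    for d in depths:
        pts_cam = rays * d                                   # points on this depth plane
        pts_w = (world_T_cam[:3, :3] @ pts_cam).T + world_T_cam[:3, 3]
        idx = np.floor((pts_w - origin) / res).astype(int)   # voxel indices per pixel
        valid = np.all((idx >= 0) & (idx < grid.shape), axis=1)
        labels = np.zeros(H * W, dtype=int)
        labels[valid] = grid[tuple(idx[valid].T)]
        planes.append(np.eye(num_classes)[labels].reshape(H, W, num_classes))
    return np.stack(planes)
```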
arXiv Detail & Related papers (2024-10-01T02:29:24Z)
- CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis [21.584020544141797]
CycleCrash is a novel dataset consisting of 3,000 dashcam videos with 436,347 frames that capture cyclists in a range of critical situations.
This dataset enables 9 different cyclist collision prediction and classification tasks focusing on potentially hazardous conditions for cyclists.
We propose VidNeXt, a novel method that leverages a ConvNeXt spatial encoder and a non-stationary transformer to capture the temporal dynamics of videos for the tasks defined in our dataset.
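A minimal sketch of that two-stage design, assuming PyTorch/torchvision: a ConvNeXt backbone encodes each frame and a transformer models the frame sequence. A vanilla `nn.TransformerEncoder` stands in here for the paper's non-stationary transformer, and all layer sizes are illustrative.

```python
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny

class VideoCollisionPredictor(nn.Module):
    def __init__(self, num_classes=2, d_model=768, depth=4):
        super().__init__()
        backbone = convnext_tiny(weights=None)
        # Keep the convolutional features and global pooling, drop the classifier.
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.temporal = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), depth)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, video):                      # video: (B, T, 3, H, W)
        B, T = video.shape[:2]
        feats = self.encoder(video.flatten(0, 1))  # (B*T, 768, 1, 1) per-frame features
        feats = feats.flatten(1).view(B, T, -1)    # (B, T, 768) frame sequence
        return self.head(self.temporal(feats).mean(dim=1))
```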
arXiv Detail & Related papers (2024-09-30T04:46:35Z)
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement [20.520938266527438]
We present CORE4D, a novel large-scale 4D human-object-human interaction dataset for collaborative object rearrangement.
With 1K human-object-human motion sequences captured in the real world, we enrich CORE4D by contributing an iterative collaboration retargeting strategy that augments the motions to a variety of novel objects.
Benefiting from extensive motion patterns provided by CORE4D, we benchmark two tasks aiming at generating human-object interaction: human-object motion forecasting and interaction synthesis.
arXiv Detail & Related papers (2024-06-27T17:32:18Z)
- ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions [11.32229757116179]
We introduce the ParaHome system, designed to capture dynamic 3D movements of humans and objects within a common home environment.
By leveraging the ParaHome system, we collect a novel large-scale dataset of human-object interaction.
arXiv Detail & Related papers (2024-01-18T18:59:58Z)
- Pedestrian Environment Model for Automated Driving [54.16257759472116]
We propose an environment model that includes the position of the pedestrians as well as their pose information.
We extract the skeletal information with a neural network human pose estimator from the image.
To obtain the 3D information of the position, we aggregate the data from consecutive frames in conjunction with the vehicle position.
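One standard way to realize this aggregation step is direct linear transform (DLT) triangulation of each skeletal keypoint across consecutive frames, using the per-frame camera pose derived from the vehicle position. The sketch below is a generic illustration under that assumption, not the paper's exact formulation.

```python
import numpy as np

def triangulate_keypoint(uvs, cam_T_world_list, K):
    # uvs              : list of (u, v) pixel detections of one keypoint over frames
    # cam_T_world_list : list of 4x4 camera extrinsics (from the vehicle pose)
    # K                : 3x3 camera intrinsics
    A = []
    for (u, v), T in zip(uvs, cam_T_world_list):
        P = K @ T[:3, :]              # 3x4 projection matrix for this frame
        A.append(u * P[2] - P[0])     # DLT constraint rows from x ~ P X
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                        # null-space solution in homogeneous coords
    return X[:3] / X[3]               # 3D keypoint position in the world frame
```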
arXiv Detail & Related papers (2023-08-17T16:10:58Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- Bent & Broken Bicycles: Leveraging synthetic data for damaged object re-identification [59.80753896200009]
We propose a novel task of damaged object re-identification, which aims at distinguishing changes in visual appearance due to deformations or missing parts from subtle intra-class variations.
We leverage the power of computer-generated imagery to create, in a semi-automatic fashion, high-quality synthetic images of the same bike before and after damage occurs.
As a baseline for this task, we propose TransReI3D, a multi-task, transformer-based deep network unifying damage detection with re-identification.
arXiv Detail & Related papers (2023-04-16T20:23:58Z)
- 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach termed Drive-3DAug, aiming at augmenting the driving scenes on camera in the 3D space.
We first utilize Neural Radiance Field (NeRF) to reconstruct the 3D models of background and foreground objects.
Augmented driving scenes can then be obtained by placing the 3D objects, with adapted location and orientation, in pre-defined valid regions of the backgrounds.
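The placement step can be pictured as a simple rigid transform into a sampled pose, as in this hedged sketch (the `valid_region` bounds and all names are assumed, and the NeRF rendering itself is omitted):

```python
import numpy as np

def place_object(obj_points, valid_region, rng=None):
    # obj_points   : (N, 3) points of a reconstructed foreground object
    # valid_region : dict with 'xy_min', 'xy_max' bounds of the drivable area (assumed)
    if rng is None:
        rng = np.random.default_rng()
    xy = rng.uniform(valid_region["xy_min"], valid_region["xy_max"])  # random location
    yaw = rng.uniform(-np.pi, np.pi)                                  # random heading
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])                  # rotate about up-axis
    return obj_points @ R.T + np.array([xy[0], xy[1], 0.0])
```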
arXiv Detail & Related papers (2023-03-18T05:51:05Z)
- HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving [95.42203932627102]
3D human pose estimation is an emerging technology that can enable autonomous vehicles to perceive and understand the subtle and complex behaviors of pedestrians.
Specifically, we embed LiDAR points into pixel-aligned multi-modal features, which we pass through a sequence of Transformer refinement stages.
Our method efficiently makes use of these complementary signals in a semi-supervised fashion and outperforms existing methods by a large margin.
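A minimal sketch of the pixel-aligned embedding idea, assuming PyTorch: project LiDAR points into the image with known calibration, bilinearly sample a 2D feature map at the projected locations, and concatenate the result with the 3D coordinates. Shapes and names are assumptions, not HUM3DIL's actual code.

```python
import torch
import torch.nn.functional as F

def pixel_aligned_lidar_features(points, feat_map, K, cam_T_lidar):
    # points      : (N, 3) LiDAR xyz, assumed to lie in front of the camera
    # feat_map    : (C, H, W) image feature map; K: (3, 3) camera intrinsics
    # cam_T_lidar : (4, 4) extrinsic calibration from LiDAR to camera frame
    pts_h = torch.cat([points, points.new_ones(len(points), 1)], dim=1)  # (N, 4)
    pts_cam = (cam_T_lidar @ pts_h.T)[:3]                                # (3, N)
    uvw = K @ pts_cam
    uv = uvw[:2] / uvw[2].clamp(min=1e-6)                                # pixel coords
    H, W = feat_map.shape[1:]
    grid = torch.stack([uv[0] / W * 2 - 1, uv[1] / H * 2 - 1], -1)       # to [-1, 1]
    sampled = F.grid_sample(feat_map[None], grid[None, None],            # (1, C, 1, N)
                            align_corners=False)
    img_feats = sampled[0, :, 0].T                                       # (N, C)
    return torch.cat([points, img_feats], dim=1)                         # (N, 3 + C)
```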
arXiv Detail & Related papers (2022-12-15T11:15:14Z)
- 3D Segmentation of Humans in Point Clouds with Synthetic Data [21.518379214837278]
We propose the task of joint 3D human semantic segmentation, instance segmentation and multi-human body-part segmentation.
We propose a framework for generating training data of synthetic humans interacting with real 3D scenes.
We also propose a novel transformer-based model, Human3D, which is the first end-to-end model for segmenting multiple human instances and their body-parts.
arXiv Detail & Related papers (2022-12-01T18:59:21Z)
- Estimating 3D Motion and Forces of Human-Object Interactions from Internet Videos [49.52070710518688]
We introduce a method to reconstruct the 3D motion of a person interacting with an object from a single RGB video.
Our method estimates the 3D poses of the person together with the object pose, the contact positions and the contact forces on the human body.
arXiv Detail & Related papers (2021-11-02T13:40:18Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- Cyclist Trajectory Forecasts by Incorporation of Multi-View Video Information [2.984037222955095]
This article presents a novel approach that incorporates visual cues from video data of a wide-angle stereo camera system, mounted at an urban intersection, into the forecast of cyclist trajectories.
We extract features from image and optical flow sequences using 3D convolutional neural networks (3D-ConvNet) and combine them with features extracted from the cyclist's past trajectory to forecast future cyclist positions.
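A hedged PyTorch sketch of that fusion: two small 3D-ConvNet branches over the image and optical-flow clips, an MLP over the past trajectory, and a regression head for future positions. All layer sizes and the 10-step input / 25-step output horizon are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CyclistForecaster(nn.Module):
    def __init__(self, horizon=25):
        super().__init__()
        def conv3d_stack(in_ch):
            # Small spatio-temporal encoder producing a 64-d clip feature.
            return nn.Sequential(
                nn.Conv3d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.rgb_net = conv3d_stack(3)     # image sequence branch
        self.flow_net = conv3d_stack(2)    # optical-flow sequence branch
        self.traj_net = nn.Sequential(nn.Flatten(), nn.Linear(2 * 10, 64), nn.ReLU())
        self.head = nn.Linear(64 + 64 + 64, horizon * 2)

    def forward(self, rgb, flow, past):
        # rgb: (B, 3, T, H, W), flow: (B, 2, T, H, W), past: (B, 10, 2)
        z = torch.cat([self.rgb_net(rgb), self.flow_net(flow),
                       self.traj_net(past)], dim=1)
        return self.head(z).view(-1, self.head.out_features // 2, 2)  # (B, horizon, 2)
```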
arXiv Detail & Related papers (2021-06-30T11:34:43Z)
- AcinoSet: A 3D Pose Estimation Dataset and Baseline Models for Cheetahs in the Wild [51.35013619649463]
We present an extensive dataset of free-running cheetahs in the wild, called AcinoSet.
The dataset contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files and 7,588 human-annotated frames.
The resulting 3D trajectories, human-checked 3D ground truth, and an interactive tool to inspect the data are also provided.
arXiv Detail & Related papers (2021-03-24T15:54:11Z)
- Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations [73.11883464562895]
We propose a new architecture that facilitates unsupervised, or lightly supervised, learning.
We demonstrate the method by learning 3D human pose and shape from un-paired and un-annotated images.
While we present results for modeling humans, our formulation is general and can be applied to other vision problems.
arXiv Detail & Related papers (2020-01-06T14:54:00Z)