Synthetic Data Are as Good as the Real for Association Knowledge
Learning in Multi-object Tracking
- URL: http://arxiv.org/abs/2106.16100v2
- Date: Fri, 2 Jul 2021 15:36:52 GMT
- Title: Synthetic Data Are as Good as the Real for Association Knowledge
Learning in Multi-object Tracking
- Authors: Yuchi Liu, Zhongdao Wang, Xiangxin Zhou and Liang Zheng
- Abstract summary: In this paper, we study whether 3D synthetic data can replace real-world videos for association training.
Specifically, we introduce a large-scale synthetic data engine named MOTX, where the motion characteristics of cameras and objects are manually configured to be similar to those in real-world datasets.
We show that, compared with real data, association knowledge obtained from synthetic data can achieve very similar performance on real-world test sets without domain adaptation techniques.
- Score: 19.772968520292345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Association, aiming to link bounding boxes of the same identity in a video
sequence, is a central component in multi-object tracking (MOT). To train
association modules, e.g., parametric networks, real video data are usually
used. However, annotating person tracks in consecutive video frames is
expensive, and such real data, due to its inflexibility, offer us limited
opportunities to evaluate system performance w.r.t. changing tracking
scenarios. In this paper, we study whether 3D synthetic data can replace
real-world videos for association training. Specifically, we introduce a
large-scale synthetic data engine named MOTX, where the motion characteristics
of cameras and objects are manually configured to be similar to those in
real-world datasets. We show that compared with real data, association
knowledge obtained from synthetic data can achieve very similar performance on
real-world test sets without domain adaptation techniques. Our intriguing
observation is credited to two factors. First and foremost, 3D engines can well
simulate motion factors such as camera movement, camera view and object
movement, so that the simulated videos can provide association modules with
effective motion features. Second, experimental results show that the
appearance domain gap hardly harms the learning of association knowledge. In
addition, the strong customization ability of MOTX allows us to quantitatively
assess the impact of motion factors on MOT, which brings new insights to the
community.
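The association step the abstract describes, linking bounding boxes of the same identity across frames, can be sketched with a simple greedy IoU matcher. This is an illustrative assumption for exposition only, not the parametric association module the paper trains; the box format and threshold are hypothetical.

```python
# Minimal sketch of frame-to-frame association in MOT: link each
# previous-frame detection to the current-frame detection it overlaps
# most, greedily, so each identity is carried forward one frame.
# Greedy IoU matching is a stand-in assumption, not the paper's method.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def associate(prev_boxes, curr_boxes, thresh=0.3):
    """Greedily match previous-frame boxes to current-frame boxes by IoU."""
    pairs = sorted(
        ((iou(p, c), i, j)
         for i, p in enumerate(prev_boxes)
         for j, c in enumerate(curr_boxes)),
        reverse=True,
    )
    matches, used_p, used_c = [], set(), set()
    for score, i, j in pairs:
        if score < thresh:
            break  # remaining pairs overlap too little to be the same identity
        if i not in used_p and j not in used_c:
            matches.append((i, j))
            used_p.add(i)
            used_c.add(j)
    return matches

# Two pedestrians each move slightly between frames; identities persist.
prev_f = [(10, 10, 50, 90), (100, 12, 140, 95)]
curr_f = [(12, 11, 52, 91), (104, 13, 144, 96)]
print(associate(prev_f, curr_f))  # [(0, 0), (1, 1)]
```

Real trackers replace the IoU score with learned appearance and motion affinities and solve the matching optimally (e.g., with the Hungarian algorithm); the paper's point is that the knowledge behind those affinities can be learned from synthetic video.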
Related papers
- VR-based generation of photorealistic synthetic data for training
hand-object tracking models [0.0]
"blender-hoisynth" is an interactive synthetic data generator based on the Blender software.
It is possible for users to interact with objects via virtual hands using standard Virtual Reality hardware.
We replace large parts of the training data in the well-known DexYCB dataset with hoisynth data and train a state-of-the-art HOI reconstruction model with it.
arXiv Detail & Related papers (2024-01-31T14:32:56Z) - TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
Video sequences generated by our TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z) - ROAM: Robust and Object-Aware Motion Generation Using Neural Pose
Descriptors [73.26004792375556]
This paper shows that robustness and generalisation to novel scene objects in 3D object-aware character synthesis can be achieved by training a motion model with as few as one reference object.
We leverage an implicit feature representation trained on object-only datasets, which encodes an SE(3)-equivariant descriptor field around the object.
We demonstrate substantial improvements in 3D virtual character motion and interaction quality and robustness to scenarios with unseen objects.
arXiv Detail & Related papers (2023-08-24T17:59:51Z) - Realistic Full-Body Tracking from Sparse Observations via Joint-Level
Modeling [13.284947022380404]
We propose a two-stage framework that can obtain accurate and smooth full-body motions with three tracking signals of head and hands only.
Our framework explicitly models the joint-level features in the first stage and utilizes them as temporal tokens for alternating spatial and temporal transformer blocks to capture joint-level correlations in the second stage.
With extensive experiments on the AMASS motion dataset and real-captured data, we show our proposed method can achieve more accurate and smooth motion compared to existing approaches.
arXiv Detail & Related papers (2023-08-17T08:27:55Z) - PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point
Tracking [90.29143475328506]
We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework.
Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion.
We animate deformable characters using real-world motion capture data, build 3D scenes to match the motion capture environments, and render camera viewpoints using trajectories mined via structure-from-motion on real videos.
arXiv Detail & Related papers (2023-07-27T17:58:11Z) - AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z) - Hindsight for Foresight: Unsupervised Structured Dynamics Models from
Physical Interaction [24.72947291987545]
A key challenge for an agent learning to interact with the world is to reason about the physical properties of objects.
We propose a novel approach for modeling the dynamics of a robot's interactions directly from unlabeled 3D point clouds and images.
arXiv Detail & Related papers (2020-08-02T11:04:49Z) - RELATE: Physically Plausible Multi-Object Scene Synthesis Using
Structured Latent Spaces [77.07767833443256]
We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects.
In contrast to state-of-the-art methods in object-centric generative modeling, RELATE also extends naturally to dynamic scenes and generates videos of high visual fidelity.
arXiv Detail & Related papers (2020-07-02T17:27:27Z) - Learning to simulate complex scenes [18.51564016785853]
This paper explores content adaptation in the context of semantic segmentation.
We propose a scalable discretization-and-relaxation (SDR) approach to optimize the attribute values and obtain a training set of similar content to real-world data.
Experiments show our system can generate reasonable and useful scenes, from which we obtain promising real-world segmentation accuracy.
arXiv Detail & Related papers (2020-06-25T17:51:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.