Creating a Large-scale Synthetic Dataset for Human Activity Recognition
- URL: http://arxiv.org/abs/2007.11118v1
- Date: Tue, 21 Jul 2020 22:20:21 GMT
- Title: Creating a Large-scale Synthetic Dataset for Human Activity Recognition
- Authors: Ollie Matthews, Koki Ryu, Tarun Srivastava
- Abstract summary: We use 3D rendering tools to generate a synthetic dataset of videos, and show that a classifier trained on these videos can generalise to real videos.
We fine-tune a pre-trained I3D model on our videos, and find that the model achieves 73% accuracy on the HMDB51 dataset over three classes.
- Score: 0.8250374560598496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating and labelling datasets of videos for use in training Human Activity Recognition models is an arduous task. In this paper, we approach this by using 3D rendering tools to generate a synthetic dataset of videos, and show that a classifier trained on these videos can generalise to real videos. We use five different augmentation techniques to generate the videos, leading to a wide variety of accurately labelled unique videos. We fine-tune a pre-trained I3D model on our videos, and find that the model is able to achieve a high accuracy of 73% on the HMDB51 dataset over three classes. We also find that augmenting the HMDB training set with our dataset provides a 2% improvement in the performance of the classifier. Finally, we discuss possible extensions to the dataset, including virtual try-on and modelling the motion of the people.
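
The fine-tuning setup described in the abstract can be sketched in PyTorch. This is a minimal sketch under stated assumptions, not the authors' code: torchvision's r3d_18 video backbone stands in for the pre-trained I3D model (an actual I3D checkpoint would be fine-tuned the same way), and the three-class head, dataset objects, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, ConcatDataset
from torchvision.models.video import r3d_18

NUM_CLASSES = 3  # the paper evaluates on three HMDB51 classes

# Load a video backbone pre-trained on Kinetics-400 and swap in a 3-class head.
# r3d_18 is a stand-in here; the paper fine-tunes a pre-trained I3D model.
model = r3d_18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def finetune(model, loader, epochs=5, device="cuda"):
    """Supervised fine-tuning over (clip, label) batches; each clip is a
    float tensor of shape (B, C, T, H, W)."""
    model.to(device).train()
    for _ in range(epochs):
        for clips, labels in loader:
            clips, labels = clips.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(clips), labels)
            loss.backward()
            optimizer.step()

# Hypothetical datasets yielding (clip, label) pairs:
#   synthetic_ds  - clips rendered with the 3D pipeline described above
#   hmdb_train_ds - the three chosen HMDB51 training classes
# Fine-tune on synthetic clips only, or concatenate both sets to mirror
# the augmentation experiment (synthetic + HMDB51 training data):
# finetune(model, DataLoader(synthetic_ds, batch_size=8, shuffle=True))
# finetune(model, DataLoader(ConcatDataset([hmdb_train_ds, synthetic_ds]),
#                            batch_size=8, shuffle=True))
```

The commented ConcatDataset call corresponds to the experiment in which the synthetic videos are added to the HMDB training set; the reported 2% gain comes from that combined training, not from the sketch above.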
Related papers
- VideoWorld: Exploring Knowledge Learning from Unlabeled Videos [119.35107657321902]
This work explores whether a deep generative model can learn complex knowledge solely from visual input.
We develop VideoWorld, an auto-regressive video generation model trained on unlabeled video data, and test its knowledge acquisition abilities in video-based Go and robotic control tasks.
arXiv Detail & Related papers (2025-01-16T18:59:10Z)
- OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation [27.516068877910254]
We introduce OpenHumanVid, a large-scale and high-quality human-centric video dataset.
Our findings yield two critical insights: First, the incorporation of a large-scale, high-quality dataset substantially enhances evaluation metrics for generated human videos.
Second, the effective alignment of text with human appearance, human motion, and facial motion is essential for producing high-quality video outputs.
arXiv Detail & Related papers (2024-11-28T07:01:06Z)
- 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing [52.68314936128752]
We propose a new paradigm to automatically generate 3D labeled training data by harnessing the power of pretrained large foundation models.
For each target semantic class, we first generate 2D images of a single object in various structures and appearances via diffusion models and ChatGPT-generated text prompts.
We transform these augmented images into 3D objects and construct virtual scenes by random composition.
arXiv Detail & Related papers (2024-08-25T09:31:22Z)
- Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks [0.0]
We propose to provide instructional datasets specific to the task of each modality within a distinct domain.
We use Video-LLaVA to generate recipes given cooking videos without transcripts.
Our approach to fine-tuning Video-LLaVA leads to gains over the baseline Video-LLaVA by 2% on the YouCook2 dataset.
arXiv Detail & Related papers (2024-06-24T06:39:02Z)
- Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos [2.3247413495885647]
We use 283,582 unique, unlabeled TikTok video clips, categorized into 386 hashtags, to train a domain-specific foundation model for action recognition.
Our model achieves state-of-the-art results: 99.05% on UCF101, 86.08% on HMDB51, 85.51% on Kinetics-400, and 74.27% on Something-Something V2 using the ViT-giant backbone.
arXiv Detail & Related papers (2024-02-14T00:41:10Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking [57.552798046137646]
Video masked autoencoder (VideoMAE) is a scalable and general self-supervised pre-trainer for building video foundation models.
We successfully train a video ViT model with a billion parameters, which achieves a new state-of-the-art performance.
arXiv Detail & Related papers (2023-03-29T14:28:41Z)
- InternVideo: General Video Foundation Models via Generative and Discriminative Learning [52.69422763715118]
We present general video foundation models, InternVideo, for dynamic and complex video-level understanding tasks.
InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives.
InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications.
arXiv Detail & Related papers (2022-12-06T18:09:49Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences arising from its use.