Creating a Large-scale Synthetic Dataset for Human Activity Recognition
- URL: http://arxiv.org/abs/2007.11118v1
- Date: Tue, 21 Jul 2020 22:20:21 GMT
- Title: Creating a Large-scale Synthetic Dataset for Human Activity Recognition
- Authors: Ollie Matthews, Koki Ryu, Tarun Srivastava
- Abstract summary: We use 3D rendering tools to generate a synthetic dataset of videos, and show that a classifier trained on these videos can generalise to real videos.
We fine-tune a pre-trained I3D model on our videos, and find that it achieves 73% accuracy across three classes of the HMDB51 dataset.
- Score: 0.8250374560598496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating and labelling datasets of videos for use in training Human Activity
Recognition models is an arduous task. In this paper, we approach this by using
3D rendering tools to generate a synthetic dataset of videos, and show that a
classifier trained on these videos can generalise to real videos. We use five
different augmentation techniques to generate the videos, leading to a wide
variety of accurately labelled unique videos. We fine-tune a pre-trained I3D
model on our videos, and find that the model is able to achieve a high accuracy
of 73% on the HMDB51 dataset over three classes. We also find that augmenting
the HMDB training set with our dataset provides a 2% improvement in the
performance of the classifier. Finally, we discuss possible extensions to the
dataset, including virtual try-on and modelling the motion of the people.
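As a concrete illustration of this setup, the sketch below shows one way the fine-tuning step could look in PyTorch. It is not the authors' code: the pytorchvideo hub model name, the location of the classification head, and the three-class configuration are assumptions based only on the abstract.
```python
# Minimal fine-tuning sketch (not the authors' code). Assumes a pre-trained
# I3D backbone from the pytorchvideo torch.hub entry point; the hub model name
# and the attribute path of the classification head may differ.
import torch
import torch.nn as nn

model = torch.hub.load("facebookresearch/pytorchvideo", "i3d_r50", pretrained=True)

# Replace the final projection with a 3-class head, matching the three
# HMDB51 classes evaluated in the abstract.
head = model.blocks[-1].proj
model.blocks[-1].proj = nn.Linear(head.in_features, 3)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(clips: torch.Tensor, labels: torch.Tensor) -> float:
    """clips: (B, 3, T, H, W) float tensor of rendered synthetic frames."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(clips), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```
Augmenting the real HMDB51 training clips with the synthetic ones, as in the 2% experiment, would then amount to concatenating the two datasets (e.g. with torch.utils.data.ConcatDataset) before training.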
Related papers
- 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing [52.68314936128752]
We propose a new paradigm to automatically generate 3D labeled training data by harnessing the power of pretrained large foundation models.
For each target semantic class, we first generate 2D images of a single object with varied structure and appearance via diffusion models and ChatGPT-generated text prompts.
We transform these augmented images into 3D objects and construct virtual scenes by random composition.
arXiv Detail & Related papers (2024-08-25T09:31:22Z)
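To make the first step of the 3D-VirtFusion pipeline concrete, here is a hedged sketch of prompt-driven 2D generation with the diffusers library; the checkpoint and prompts are placeholders, and the image-to-3D lifting and scene composition stages are not shown.
```python
# Sketch of prompt-driven 2D object generation only (the image-to-3D lifting
# and random scene composition described in the paper are not shown).
# The checkpoint and prompts below are placeholders, not the paper's settings.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Prompts of the kind an LLM could be asked to produce for a target class.
prompts = [
    "a wooden chair, studio lighting, plain background",
    "a modern plastic chair, side view, plain background",
]

for i, prompt in enumerate(prompts):
    image = pipe(prompt).images[0]   # PIL.Image
    image.save(f"chair_{i:03d}.png")
```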
- Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks [0.0]
We propose to provide instructional datasets specific to the task of each modality within a distinct domain.
We use Video-LLaVA to generate recipes given cooking videos without transcripts.
Our approach to fine-tuning Video-LLaVA yields a 2% gain over the baseline Video-LLaVA on the YouCook2 dataset.
arXiv Detail & Related papers (2024-06-24T06:39:02Z)
- Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features [21.583246378475856]
We introduce an extensive video dataset designed specifically for AI-Generated Video Detection (GenVidDet).
We also present the Dual-Branch 3D Transformer (DuB3D), an innovative and effective method for distinguishing between real and generated videos.
DuB3D distinguishes real from generated video content with 96.77% accuracy and shows strong generalization even to unseen types.
arXiv Detail & Related papers (2024-05-24T08:26:04Z)
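The dual-branch idea in DuB3D (combining appearance with explicit motion cues) can be sketched with small stand-in modules; the code below uses frame differences as a motion proxy and tiny 3D CNN branches, which is an illustration only and not the paper's transformer architecture.
```python
# Simplified dual-branch sketch: one branch sees raw frames, the other sees
# frame differences as a cheap motion proxy. Illustration only; DuB3D itself
# uses a 3D transformer and different motion features.
import torch
import torch.nn as nn

class TinyBranch(nn.Module):
    def __init__(self, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(64, out_dim),
        )

    def forward(self, x):  # x: (B, 3, T, H, W)
        return self.net(x)

class DualBranchDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.appearance = TinyBranch()
        self.motion = TinyBranch()
        self.classifier = nn.Linear(256, 2)  # real vs. generated

    def forward(self, clips):  # clips: (B, 3, T, H, W)
        diffs = clips[:, :, 1:] - clips[:, :, :-1]  # temporal differences
        feats = torch.cat([self.appearance(clips), self.motion(diffs)], dim=1)
        return self.classifier(feats)

logits = DualBranchDetector()(torch.randn(2, 3, 8, 64, 64))  # -> (2, 2)
```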
- Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos [2.3247413495885647]
We use 283,582 unique, unlabeled TikTok video clips, categorized into 386 hashtags, to train a domain-specific foundation model for action recognition.
Our model achieves state-of-the-art results: 99.05% on UCF101, 86.08% on HMDB51, 85.51% on Kinetics-400, and 74.27% on Something-Something V2 using the ViT-giant backbone.
arXiv Detail & Related papers (2024-02-14T00:41:10Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space, and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking [57.552798046137646]
Video masked autoencoder (VideoMAE) is a scalable and general self-supervised pre-trainer for building video foundation models.
We successfully train a billion-parameter video ViT model, which achieves new state-of-the-art performance.
arXiv Detail & Related papers (2023-03-29T14:28:41Z)
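The masking idea behind video masked autoencoders can be illustrated with random tube masking of patch tokens, as sketched below; this single-mask version is a simplification for illustration, not VideoMAE V2's dual-masking scheme.
```python
# Illustration of random tube masking over video patch tokens: the same
# spatial patches are dropped in every frame, and only the visible tokens
# would be passed to the encoder. A simplification of VideoMAE-style masking,
# not the full dual-masking scheme of VideoMAE V2.
import torch

def tube_mask(tokens: torch.Tensor, n_frames: int, mask_ratio: float = 0.9):
    """tokens: (B, n_frames * n_patches, D) patch embeddings, frame-major order."""
    B, N, D = tokens.shape
    n_patches = N // n_frames
    n_keep = int(n_patches * (1 - mask_ratio))
    # Choose which spatial patches to keep, shared across all frames (a "tube").
    keep = torch.rand(B, n_patches).argsort(dim=1)[:, :n_keep]            # (B, n_keep)
    keep = keep.unsqueeze(1) + n_patches * torch.arange(n_frames).view(1, -1, 1)
    keep = keep.reshape(B, -1)                                            # (B, n_frames*n_keep)
    visible = torch.gather(tokens, 1, keep.unsqueeze(-1).expand(-1, -1, D))
    return visible, keep

visible, keep_idx = tube_mask(torch.randn(2, 8 * 196, 768), n_frames=8)
```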
- InternVideo: General Video Foundation Models via Generative and Discriminative Learning [52.69422763715118]
We present general video foundation models, InternVideo, for dynamic and complex video-level understanding tasks.
InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives.
InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications.
arXiv Detail & Related papers (2022-12-06T18:09:49Z)
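The video-language contrastive objective used as one of InternVideo's pretraining tasks is essentially a symmetric InfoNCE loss over paired video and text embeddings; the generic sketch below is not InternVideo's implementation.
```python
# Generic symmetric InfoNCE loss over paired video and text embeddings,
# of the kind used for video-language contrastive pretraining.
# A generic sketch, not InternVideo's implementation.
import torch
import torch.nn.functional as F

def video_text_contrastive_loss(video_emb, text_emb, temperature: float = 0.07):
    """video_emb, text_emb: (B, D) embeddings of matched video/text pairs."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                  # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=v.device)
    # Each video should match its own caption and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = video_text_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```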
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
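The "semantic target" idea can be illustrated by building a classifier from text embeddings of the class names; the sketch below uses CLIP's text encoder as a stand-in, and the paper's exact model, prompts, and transfer procedure may differ.
```python
# Sketch: build a zero-shot-style classifier from text embeddings of class
# names, using CLIP's text encoder as a stand-in for the pre-trained language
# model; the model, prompts, and transfer procedure in the paper may differ.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPTokenizer

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

class_names = ["brushing hair", "cartwheel", "catch"]  # example HMDB51 classes
inputs = tok([f"a video of a person {c}" for c in class_names],
             padding=True, return_tensors="pt")
with torch.no_grad():
    text_emb = F.normalize(clip.get_text_features(**inputs), dim=-1)  # (C, 512)

def classify(video_emb: torch.Tensor) -> torch.Tensor:
    """video_emb: (B, 512) video features assumed projected into the CLIP space."""
    return F.normalize(video_emb, dim=-1) @ text_emb.T  # (B, C) similarity logits
```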
- ViViT: A Video Vision Transformer [75.74690759089529]
We present pure-transformer based models for video classification.
Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers.
We show how we can effectively regularise the model during training and leverage pretrained image models to be able to train on comparatively small datasets.
arXiv Detail & Related papers (2021-03-29T15:27:17Z)
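ViViT's spatio-temporal tokenisation can be sketched as a 3D-convolutional "tubelet" embedding followed by a standard transformer encoder; the dimensions below are illustrative, not the paper's configuration, and positional embeddings are omitted for brevity.
```python
# Sketch of ViViT-style tubelet embedding: non-overlapping 3D patches of the
# video are linearly projected into tokens and fed to a transformer encoder.
# Dimensions are illustrative; positional embeddings are omitted for brevity.
import torch
import torch.nn as nn

class TinyViViT(nn.Module):
    def __init__(self, dim=256, depth=4, heads=4, n_classes=3):
        super().__init__()
        # Each (2 frames x 16 x 16 pixels) tubelet becomes one token.
        self.to_tokens = nn.Conv3d(3, dim, kernel_size=(2, 16, 16), stride=(2, 16, 16))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, video):                       # video: (B, 3, T, H, W)
        tokens = self.to_tokens(video)              # (B, dim, T/2, H/16, W/16)
        tokens = tokens.flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.head(self.encoder(tokens).mean(dim=1))

logits = TinyViViT()(torch.randn(2, 3, 8, 64, 64))  # -> (2, 3)
```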
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.