A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions
- URL: http://arxiv.org/abs/2204.10160v1
- Date: Thu, 21 Apr 2022 15:14:02 GMT
- Title: A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions
- Authors: Fan Yang
- Abstract summary: We use ffmpeg to crop and frame the videos; then use yolov5 to detect humans in each video frame, and then use deep sort to track the ID of each human across frames.
- Score: 4.49302950538123
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Spatio-temporal action detection is an important and challenging problem in
video understanding. However, the application of existing large-scale
spatio-temporal action datasets to specific fields is limited, and there is
currently no public tool for building spatio-temporal action datasets, so it
takes a lot of time and effort for researchers to customize such datasets. We
therefore propose a multi-person video dataset annotation method for
spatio-temporal actions. First, we use ffmpeg to crop the videos and split them
into frames; then we use yolov5 to detect the humans in each video frame, and
deep sort to track the ID of each human across frames. By processing the
detection results of yolov5 and deep sort, we obtain the annotation file of the
spatio-temporal action dataset, completing the work of customizing the
spatio-temporal action dataset.
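The first stage of the pipeline (cutting clips out of longer videos and splitting them into frames) can be reproduced with a short script. The sketch below is a minimal illustration assuming ffmpeg is available on the PATH; the clip start/duration, the output frame rate, and the img_%05d.jpg naming pattern are placeholder choices, since the abstract only states that ffmpeg is used to crop and frame the videos.

```python
import subprocess
from pathlib import Path


def crop_and_extract_frames(video_path: str, out_dir: str,
                            start: str = "00:00:00",
                            duration: str = "00:00:10",
                            fps: int = 30) -> None:
    """Cut a clip out of a longer video and dump it to numbered frames."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # 1) Temporally crop the source video into a short clip
    #    (stream copy, so no re-encoding).
    clip = out / "clip.mp4"
    subprocess.run(
        ["ffmpeg", "-y", "-ss", start, "-t", duration,
         "-i", video_path, "-c", "copy", str(clip)],
        check=True,
    )

    # 2) Decode the clip into individual frames at a fixed rate.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(clip), "-vf", f"fps={fps}",
         str(out / "img_%05d.jpg")],
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical input video and output directory for illustration.
    crop_and_extract_frames("input.mp4", "frames/video_0001")
```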
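The detection and tracking stage can be sketched in the same spirit. The code below is a minimal sketch assuming the ultralytics/yolov5 model from torch.hub and the third-party deep_sort_realtime package stand in for the "yolov5" and "deep sort" components named in the abstract; the CSV layout written at the end (video id, frame index, box coordinates, person id) is likewise an assumed annotation format, not the paper's exact file specification.

```python
import csv

import cv2
import torch
from deep_sort_realtime.deepsort_tracker import DeepSort

# Person detector pulled from torch.hub (assumed variant: yolov5s).
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
model.classes = [0]             # keep only the COCO "person" class
tracker = DeepSort(max_age=30)  # deep sort tracker (assumed parameters)


def annotate_frames(frame_paths, video_id, out_csv):
    """Detect people in each frame, track their IDs across frames, and write
    one (video_id, frame_idx, x1, y1, x2, y2, person_id) row per box."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        for idx, path in enumerate(frame_paths):
            frame = cv2.imread(path)           # BGR frame from disk
            results = model(frame[..., ::-1])  # yolov5 expects RGB input
            # results.xyxy[0] rows: x1, y1, x2, y2, confidence, class
            detections = [
                ([x1, y1, x2 - x1, y2 - y1], conf, "person")
                for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist()
            ]
            # deep sort assigns a persistent track_id to each person.
            tracks = tracker.update_tracks(detections, frame=frame)
            for trk in tracks:
                if not trk.is_confirmed():
                    continue
                x1, y1, x2, y2 = trk.to_ltrb()
                writer.writerow([video_id, idx, round(x1, 1), round(y1, 1),
                                 round(x2, 1), round(y2, 1), trk.track_id])
```

Running this over the frame folders produced by the ffmpeg step yields per-frame person boxes with stable IDs, which is the raw material from which the spatio-temporal annotation files are assembled.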
Related papers
- POPCat: Propagation of particles for complex annotation tasks [7.236620861573004]
We propose a time efficient method called POPCat that exploits the multi-target and temporal features of video data.
The method generates a semi-supervised pipeline for segmentation or box-based video annotation.
The method shows a margin of improvement on recall/mAP50/mAP over the best results.
arXiv Detail & Related papers (2024-06-24T23:43:08Z)
- Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding [59.599378814835205]
Temporal Video Grounding (TVG) aims to localize the temporal boundary of a specific segment in an untrimmed video based on a given language query.
We introduce a novel AMDA method to adaptively adjust the model's scene-related knowledge by incorporating insights from the target data.
arXiv Detail & Related papers (2023-12-21T07:49:27Z)
- Boundary-Denoising for Video Activity Localization [57.9973253014712]
We study the video activity localization problem from a denoising perspective.
Specifically, we propose an encoder-decoder model named DenoiseLoc.
Experiments show that DenoiseLoc advances several video activity understanding tasks.
arXiv Detail & Related papers (2023-04-06T08:48:01Z)
- Video Action Detection: Analysing Limitations and Challenges [70.01260415234127]
We analyze existing datasets on video action detection and discuss their limitations.
We perform a bias study which analyzes a key property differentiating videos from static images: the temporal aspect.
Such extreme experiments show the existence of biases that have crept into existing methods in spite of careful modeling.
arXiv Detail & Related papers (2022-04-17T00:42:14Z)
- FineAction: A Fined Video Dataset for Temporal Action Localization [60.90129329728657]
FineAction is a new large-scale fine-grained video dataset collected from existing video datasets and web videos.
This dataset contains 139K fine-grained action instances densely annotated in almost 17K untrimmed videos spanning 106 action categories.
Experimental results reveal that our FineAction brings new challenges for action localization on fine-grained and multi-label instances with shorter duration.
arXiv Detail & Related papers (2021-05-24T06:06:32Z)
- Activity Graph Transformer for Temporal Action Localization [41.69734359113706]
We introduce Activity Graph Transformer, an end-to-end learnable model for temporal action localization.
In this work, we capture this non-linear temporal structure by reasoning over the videos as non-sequential entities in the form of graphs.
Our results show that our proposed model outperforms the state-of-the-art by a considerable margin.
arXiv Detail & Related papers (2021-01-21T10:42:48Z)
- Spatio-Temporal Action Detection with Multi-Object Interaction [127.85524354900494]
In this paper, we study the spatio-temporal action detection problem with multi-object interaction.
We introduce a new dataset that is spatially annotated with action tubes containing multi-object interactions.
We propose an end-to-end spatio-temporal action detection model that performs both spatial and temporal regression simultaneously.
arXiv Detail & Related papers (2020-04-01T00:54:56Z)
- STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos [17.232631075144592]
Methods for instance segmentation in videos typically follow the tracking-by-detection paradigm.
We propose a novel approach that segments and tracks instances across space and time in a single stage.
Our method achieves state-of-the-art results across multiple datasets and tasks.
arXiv Detail & Related papers (2020-03-18T18:40:52Z)
- ZSTAD: Zero-Shot Temporal Activity Detection [107.63759089583382]
We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
arXiv Detail & Related papers (2020-03-12T02:40:36Z)
- Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences [25.299599341774204]
This paper proposes an approach for the unsupervised learning of actions in untrimmed video sequences based on a joint visual-temporal embedding space.
We show that the proposed approach is able to provide a meaningful visual and temporal embedding out of the visual cues present in contiguous video frames.
arXiv Detail & Related papers (2020-01-29T22:51:06Z)