An Efficient Framework for Few-shot Skeleton-based Temporal Action
Segmentation
- URL: http://arxiv.org/abs/2207.09925v1
- Date: Wed, 20 Jul 2022 14:08:37 GMT
- Title: An Efficient Framework for Few-shot Skeleton-based Temporal Action
Segmentation
- Authors: Leiyang Xu, Qiang Wang, Xiaotian Lin, Lin Yuan
- Abstract summary: Temporal action segmentation (TAS) aims to classify and locate actions in long untrimmed action sequences.
This study proposes an efficient framework for the few-shot skeleton-based TAS, including a data augmentation method and an improved model.
- Score: 6.610414185789651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal action segmentation (TAS) aims to classify and locate actions in
long untrimmed action sequences. With the success of deep learning, many deep
models for action segmentation have emerged. However, few-shot TAS is still a
challenging problem. This study proposes an efficient framework for the
few-shot skeleton-based TAS, including a data augmentation method and an
improved model. The data augmentation approach based on motion interpolation is
presented here to solve the problem of insufficient data, and can increase the
number of samples significantly by synthesizing action sequences. Besides, we
concatenate a Connectionist Temporal Classification (CTC) layer with a network
designed for skeleton-based TAS to obtain an optimized model. Leveraging CTC
can enhance the temporal alignment between prediction and ground truth and
further improve the segment-wise metrics of segmentation results. Extensive
experiments on both public and self-constructed datasets, including two
small-scale datasets and one large-scale dataset, show the effectiveness of the two
proposed methods in improving the performance of the few-shot skeleton-based
TAS task.
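
The abstract describes the augmentation only at a high level: new action sequences are synthesized by motion interpolation to compensate for scarce training data. The snippet below is a minimal sketch of that idea under assumed conventions, where a skeleton sequence is a (frames, joints, coordinates) array and intermediate frames are produced by linear blending of neighbouring frames; the function name, array layout, and interpolation scheme are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def interpolate_skeleton_sequence(seq, factor=2):
    """Synthesize a denser skeleton sequence by linearly interpolating
    between consecutive frames (illustrative sketch only).

    seq: array of shape (T, J, C) -- T frames, J joints, C coordinates.
    factor: number of output frames generated per original interval.
    Returns an array of shape ((T - 1) * factor + 1, J, C).
    """
    T = seq.shape[0]
    out = []
    for t in range(T - 1):
        for k in range(factor):
            alpha = k / factor
            # Linear blend of two neighbouring skeleton frames.
            out.append((1.0 - alpha) * seq[t] + alpha * seq[t + 1])
    out.append(seq[-1])
    return np.stack(out)

# Example: a random 10-frame, 25-joint, 3-D skeleton sequence
# becomes a 19-frame sequence at factor=2.
dummy = np.random.randn(10, 25, 3)
augmented = interpolate_skeleton_sequence(dummy, factor=2)
print(augmented.shape)  # (19, 25, 3)
```

Because per-frame labels follow the interpolated frames, a single annotated sequence can yield several plausibly varied training samples, which is how this kind of augmentation addresses the few-shot setting.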
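Likewise, the abstract states only that a CTC layer is concatenated with a skeleton-based TAS network to improve temporal alignment. Assuming a PyTorch implementation, the sketch below shows how frame-wise log-probabilities could be scored against the ordered sequence of segment labels with the built-in nn.CTCLoss; all tensor shapes, the blank index, and the batch layout are placeholders for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Hypothetical frame-wise logits from a skeleton-based TAS backbone:
# shape (T, N, C) = (time steps, batch, num_classes incl. blank at index 0).
T, N, C = 200, 4, 11
logits = torch.randn(T, N, C)
log_probs = logits.log_softmax(dim=2)

# Target label sequences (ordered segment-level action labels, no blanks),
# padded to the longest sequence in the batch.
targets = torch.randint(low=1, high=C, size=(N, 6), dtype=torch.long)
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(low=3, high=7, size=(N,), dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)

# In a combined objective, this term could be weighted against the usual
# frame-wise cross-entropy used for temporal action segmentation.
print(loss.item())
```

The CTC term penalizes predictions whose segment ordering disagrees with the ground-truth label sequence, which is consistent with the abstract's claim that it improves segment-wise metrics.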
Related papers
- Faster Diffusion Action Segmentation [9.868244939496678]
Temporal Action Segmentation (TAS) is an essential task in video analysis, aiming to segment and classify continuous frames into distinct action segments.
Recent advances in diffusion models have demonstrated substantial success in TAS tasks due to their stable training process and high-quality generation capabilities.
We propose EffiDiffAct, an efficient and high-performance TAS algorithm.
arXiv Detail & Related papers (2024-08-04T13:23:18Z) - End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning [5.587301322663445]
We introduce an end-to-end streaming video temporal action segmentation model with reinforcement learning (SVTAS-RL).
The SVTAS-RL model significantly outperforms existing STAS models and achieves performance competitive with the state-of-the-art TAS model on multiple datasets under the same evaluation criteria.
arXiv Detail & Related papers (2023-09-27T14:30:34Z) - Body Segmentation Using Multi-task Learning [1.0832844764942349]
We present a novel multi-task model for human segmentation/parsing that involves three tasks.
The main idea behind the proposed Segmentation--Pose--DensePose model (or SPD for short) is to learn a better segmentation model by sharing knowledge across different, yet related tasks.
The performance of the model is analysed through rigorous experiments on the LIP and ATR datasets and in comparison to a recent (state-of-the-art) multi-task body-segmentation model.
arXiv Detail & Related papers (2022-12-13T13:06:21Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based
Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of
Semantics and Depth [83.94528876742096]
We tackle the MTL problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called Cross-Channel Attention Module (CCAM).
In a true symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth called AffineMix, and a simple depth augmentation using predicted semantics called ColorAug.
Finally, we validate the performance gain of the proposed method on the Cityscapes dataset, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantic segmentation.
arXiv Detail & Related papers (2022-06-21T17:40:55Z) - Temporal Attention-Augmented Graph Convolutional Network for Efficient
Skeleton-Based Human Action Recognition [97.14064057840089]
Graph convolutional networks (GCNs) have been very successful in modeling non-Euclidean data structures.
Most GCN-based action recognition methods use deep feed-forward networks with high computational complexity to process all skeletons in an action.
We propose a temporal attention module (TAM) for increasing the efficiency in skeleton-based action recognition.
arXiv Detail & Related papers (2020-10-23T08:01:55Z) - MS-TCN++: Multi-Stage Temporal Convolutional Network for Action
Segmentation [87.16030562892537]
We propose a multi-stage architecture for the temporal action segmentation task.
The first stage generates an initial prediction that is refined by the next ones.
Our models achieve state-of-the-art results on three datasets.
arXiv Detail & Related papers (2020-06-16T14:50:47Z) - One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.