PoseRAC: Pose Saliency Transformer for Repetitive Action Counting
- URL: http://arxiv.org/abs/2303.08450v2
- Date: Thu, 16 Mar 2023 01:33:08 GMT
- Title: PoseRAC: Pose Saliency Transformer for Repetitive Action Counting
- Authors: Ziyu Yao, Xuxin Cheng, Yuexian Zou
- Abstract summary: We introduce Pose Saliency Representation, which efficiently represents each action using only two salient poses instead of redundant frames.
We also introduce PoseRAC, which is based on this representation and achieves state-of-the-art performance.
Our lightweight model is highly efficient, requiring only 20 minutes of training on a GPU, and runs inference nearly 10x faster than previous methods.
- Score: 56.34379680390869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a significant contribution to the field of
repetitive action counting through the introduction of a new approach called
Pose Saliency Representation. The proposed method efficiently represents each
action using only two salient poses instead of redundant frames, which
significantly reduces the computational cost while improving performance.
Moreover, we introduce a pose-level method, PoseRAC, which builds on this
representation and achieves state-of-the-art performance on two new versions
of existing datasets, using Pose Saliency Annotation to label salient poses
for training. Our lightweight model is highly efficient, requiring only 20
minutes of training on a GPU, and runs inference nearly 10x faster than
previous methods. In addition, our approach substantially improves over the
previous state of the art, TransRAC, achieving an OBO metric of 0.56 versus
0.29 for TransRAC. The code and new dataset are available at
https://github.com/MiracleDance/PoseRAC for further research and
experimentation, making our proposed approach highly accessible to the
research community.
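A repetition counter built on the two-salient-pose representation can be
illustrated with a short sketch: classify each frame against the two salient
poses, then count one repetition per complete pose-1-to-pose-2 transition.
The function below is a hypothetical Python illustration; the hysteresis
thresholds and the per-frame `pose_scores` input (e.g. classifier scores from
a pose-level model over keypoints) are assumptions, not the paper's exact
pipeline.
```python
def count_repetitions(pose_scores, enter_thresh=0.8, exit_thresh=0.2):
    """Count repetitions from per-frame salient-pose scores.

    pose_scores: list of (score_pose1, score_pose2) tuples in [0, 1],
    one per frame. A repetition is counted on each complete
    pose1 -> pose2 transition, with hysteresis to suppress jitter.
    """
    count = 0
    state = None  # last salient pose confidently observed: None, 1, or 2
    for s1, s2 in pose_scores:
        if s1 >= enter_thresh and s2 <= exit_thresh:
            state = 1                      # entered salient pose 1
        elif s2 >= enter_thresh and s1 <= exit_thresh:
            if state == 1:
                count += 1                 # completed a pose1 -> pose2 cycle
            state = 2
    return count

# Toy usage: scores rise for pose 1, then pose 2, twice -> 2 repetitions.
frames = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9),
          (0.9, 0.1), (0.4, 0.4), (0.1, 0.9)]
print(count_repetitions(frames))  # 2
```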
Related papers
- Transforming Image Super-Resolution: A ConvFormer-based Efficient Approach [58.57026686186709]
We introduce the Convolutional Transformer layer (ConvFormer) and propose a ConvFormer-based Super-Resolution network (CFSR)
CFSR inherits the advantages of both convolution-based and transformer-based approaches.
Experiments demonstrate that CFSR strikes an optimal balance between computational cost and performance.
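The summary does not detail the ConvFormer layer itself; one common way to
blend the two families is a block that mixes tokens with a large-kernel
depthwise convolution (local inductive bias) followed by a transformer-style
feed-forward network. The PyTorch block below is a generic sketch of such a
hybrid, not CFSR's actual layer.
```python
import torch
import torch.nn as nn

class ConvFormerBlock(nn.Module):
    """Generic conv/transformer hybrid: depthwise-conv token mixing
    plus a transformer-style MLP, each with a residual connection."""
    def __init__(self, dim, kernel_size=7, mlp_ratio=2):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(dim)
        # Depthwise convolution mixes spatial tokens locally.
        self.mixer = nn.Conv2d(dim, dim, kernel_size,
                               padding=kernel_size // 2, groups=dim)
        self.norm2 = nn.BatchNorm2d(dim)
        # Pointwise MLP, as in a transformer feed-forward network.
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * mlp_ratio, 1),
            nn.GELU(),
            nn.Conv2d(dim * mlp_ratio, dim, 1),
        )

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        return x + self.mlp(self.norm2(x))

x = torch.randn(1, 64, 32, 32)
print(ConvFormerBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```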
arXiv Detail & Related papers (2024-01-11T03:08:00Z)
- End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames [55.72994484532856]
Temporal action detection (TAD) has seen significant performance improvement with end-to-end training.
Due to the memory bottleneck, only models with limited scales and limited data volumes can afford end-to-end training.
We reduce the memory consumption for end-to-end training, and manage to scale up the TAD backbone to 1 billion parameters and the input video to 1,536 frames.
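The paper's specific memory optimizations are not described in this summary;
a standard generic technique for shrinking activation memory in end-to-end
training is gradient checkpointing, which recomputes activations during the
backward pass instead of storing them. A minimal PyTorch illustration, not
the paper's method:
```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack standing in for a large video backbone.
backbone = nn.Sequential(*[nn.Sequential(nn.Linear(256, 256), nn.ReLU())
                           for _ in range(48)])

x = torch.randn(8, 256, requires_grad=True)
# Split the stack into 4 segments; only segment boundaries keep their
# activations, the rest are recomputed during the backward pass.
y = checkpoint_sequential(backbone, 4, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)  # torch.Size([8, 256])
```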
arXiv Detail & Related papers (2023-11-28T21:31:04Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
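As a rough illustration of attention over a batch of point-to-model residuals
with a per-point state, the sketch below embeds each residual together with
its current state and runs one transformer-encoder step; the dimensions and
the sigmoid inlier weighting are assumptions, not the paper's architecture.
```python
import torch
import torch.nn as nn

class ResidualAttention(nn.Module):
    """One attention step over point-to-model residuals, updating a
    per-point state (a hypothetical reading of the CA-RANSAC idea)."""
    def __init__(self, state_dim=32, heads=4):
        super().__init__()
        self.embed = nn.Linear(1 + state_dim, state_dim)
        self.attn = nn.TransformerEncoderLayer(
            d_model=state_dim, nhead=heads, dim_feedforward=64,
            batch_first=True)
        self.to_weight = nn.Linear(state_dim, 1)

    def forward(self, residuals, state):
        # residuals: (B, N, 1), state: (B, N, state_dim)
        h = self.embed(torch.cat([residuals, state], dim=-1))
        state = self.attn(h)                        # share consensus across points
        weights = torch.sigmoid(self.to_weight(state))  # per-point inlier weight
        return weights, state

B, N, D = 2, 100, 32
layer = ResidualAttention(D)
w, s = layer(torch.rand(B, N, 1), torch.zeros(B, N, D))
print(w.shape, s.shape)  # torch.Size([2, 100, 1]) torch.Size([2, 100, 32])
```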
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- ParaFormer: Parallel Attention Transformer for Efficient Feature Matching [8.552303361149612]
This paper proposes a novel parallel attention model entitled ParaFormer.
It fuses features and keypoint positions through the concept of amplitude and phase, and integrates self- and cross-attention in a parallel manner.
Experiments on various applications, including homography estimation, pose estimation, and image matching, demonstrate that ParaFormer achieves state-of-the-art performance.
The efficient ParaFormer-U variant achieves comparable performance with less than 50% FLOPs of the existing attention-based models.
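A minimal sketch of the parallel (rather than sequential) arrangement of
self- and cross-attention, assuming sum fusion; ParaFormer's amplitude/phase
position encoding and exact fusion scheme are not reproduced here:
```python
import torch
import torch.nn as nn

class ParallelAttention(nn.Module):
    """Self- and cross-attention computed in parallel on the same input
    and fused by summation (a hypothetical sketch of parallel attention)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, a, b):
        # a, b: feature sets from the two images, shape (B, N, dim)
        sa, _ = self.self_attn(a, a, a)     # intra-image context
        ca, _ = self.cross_attn(a, b, b)    # inter-image context
        return self.norm(a + sa + ca)       # fused in parallel, not in series

a, b = torch.randn(1, 128, 64), torch.randn(1, 128, 64)
print(ParallelAttention()(a, b).shape)  # torch.Size([1, 128, 64])
```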
arXiv Detail & Related papers (2023-03-02T03:29:16Z)
- Deep Active Ensemble Sampling For Image Classification [8.31483061185317]
Active learning frameworks aim to reduce the cost of data annotation by actively requesting the labeling for the most informative data points.
Proposed approaches include uncertainty-based techniques, geometric methods, and implicit combinations of the two.
We present an innovative integration of recent progress in both uncertainty-based and geometric frameworks to enable an efficient exploration/exploitation trade-off in sample selection strategy.
Our framework provides two advantages: (1) accurate posterior estimation, and (2) tune-able trade-off between computational overhead and higher accuracy.
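A generic illustration of such an exploration/exploitation trade-off: score
each unlabeled point by a weighted mix of predictive entropy (uncertainty)
and distance to the labeled set (geometry). The weighting and normalization
below are assumptions, not the paper's ensemble-based estimator.
```python
import numpy as np

def hybrid_acquisition(probs, feats, labeled_idx, k=10, alpha=0.5):
    """Select k points by mixing predictive entropy (uncertainty)
    with distance to the nearest labeled example (geometry)."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    # Distance from each candidate to its nearest labeled example.
    d = np.linalg.norm(feats[:, None, :] - feats[labeled_idx][None, :, :],
                       axis=-1).min(axis=1)
    # Normalize both terms to [0, 1] before mixing with weight alpha.
    score = alpha * entropy / entropy.max() + (1 - alpha) * d / d.max()
    score[labeled_idx] = -np.inf          # never re-select labeled points
    return np.argsort(score)[-k:]

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=200)   # fake softmax outputs
feats = rng.normal(size=(200, 16))            # fake embeddings
print(hybrid_acquisition(probs, feats, labeled_idx=np.arange(20)))
```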
arXiv Detail & Related papers (2022-10-11T20:20:20Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only for the source dataset and are unavailable for the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
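The segment-permutation pretext task can be sketched directly: split a
skeleton sequence into segments, shuffle them, and ask a classifier to
recover which permutation was applied. The helper below is a hypothetical
data-side sketch; the paper's segmentation granularity and labeling scheme
may differ.
```python
from itertools import permutations
import numpy as np

def temporal_cubism_task(seq, n_segments=3, rng=None):
    """Pretext-task sample: split a skeleton sequence into segments,
    permute them, and return the permutation index as the label."""
    rng = rng or np.random.default_rng()
    perms = list(permutations(range(n_segments)))   # 6 classes for 3 segments
    label = rng.integers(len(perms))
    segments = np.array_split(seq, n_segments, axis=0)
    shuffled = np.concatenate([segments[i] for i in perms[label]], axis=0)
    return shuffled, label  # a classifier learns to predict `label`

seq = np.random.normal(size=(90, 25, 3))   # 90 frames, 25 joints, xyz
x, y = temporal_cubism_task(seq)
print(x.shape, y)  # (90, 25, 3) and a permutation class in [0, 5]
```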
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Skeleton Split Strategies for Spatial Temporal Graph Convolution Networks [2.132096006921048]
A skeleton representation of the human body has proven effective for action recognition.
A new set of methods to perform the convolution operation upon the skeleton graph is presented.
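The ST-GCN-style scheme that such split strategies build on applies a
separate weight matrix to each partition of a joint's neighbor set and sums
the results. Below is a minimal NumPy sketch with two toy partitions (the
joint itself and its neighbors); the partitions are illustrative assumptions,
not the paper's proposed strategies.
```python
import numpy as np

def partitioned_graph_conv(x, adj_parts, weights):
    """Graph convolution with a split neighbor set: each partition of
    the skeleton graph gets its own normalized adjacency and weight
    matrix, and the partition outputs are summed.

    x: (joints, in_dim), adj_parts: list of (joints, joints) masks,
    weights: list of (in_dim, out_dim) matrices, one per partition."""
    out = 0.0
    for a, w in zip(adj_parts, weights):
        deg = a.sum(axis=1, keepdims=True).clip(min=1)  # avoid div by zero
        out = out + (a / deg) @ x @ w                   # normalized aggregation
    return out

J, C_in, C_out = 5, 3, 8
x = np.random.normal(size=(J, C_in))
identity = np.eye(J)                                     # each joint itself
neighbors = (np.random.rand(J, J) > 0.6).astype(float)   # toy skeleton edges
weights = [np.random.normal(size=(C_in, C_out)) for _ in range(2)]
print(partitioned_graph_conv(x, [identity, neighbors], weights).shape)  # (5, 8)
```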
arXiv Detail & Related papers (2021-08-03T05:57:52Z)
- SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation [81.03485688525133]
We propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).
Specifically, in the training process, we enable SIMPLE to mimic the pose knowledge from the high-performance top-down pipeline.
Besides, SIMPLE formulates human detection and pose estimation as a unified point learning framework, so the two tasks complement each other within a single network.
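The mimicking step resembles knowledge distillation from a top-down teacher
to a bottom-up student. The loss below is a generic distillation-style
sketch; `beta` and the MSE terms are assumptions, not SIMPLE's actual
objective.
```python
import torch
import torch.nn.functional as F

def mimicking_loss(student_heatmaps, teacher_heatmaps, hard_target, beta=0.5):
    """Generic distillation-style objective: the bottom-up student matches
    the top-down teacher's heatmaps while also fitting the ground truth."""
    mimic = F.mse_loss(student_heatmaps, teacher_heatmaps.detach())
    supervised = F.mse_loss(student_heatmaps, hard_target)
    return beta * mimic + (1 - beta) * supervised

student = torch.rand(2, 17, 64, 48, requires_grad=True)  # 17 keypoint heatmaps
teacher = torch.rand(2, 17, 64, 48)
target = torch.rand(2, 17, 64, 48)
loss = mimicking_loss(student, teacher, target)
loss.backward()
print(loss.item())
```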
arXiv Detail & Related papers (2021-04-06T13:12:51Z)