Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2309.01365v3
- Date: Sun, 4 Feb 2024 07:17:28 GMT
- Title: Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation
- Authors: Hanbing Liu, Wangmeng Xiang, Jun-Yan He, Zhi-Qi Cheng, Bin Luo, Yifeng Geng and Xuansong Xie
- Abstract summary: Estimating the 3D pose of humans in video sequences demands both accuracy and a well-structured architecture.
We introduce the Refined Temporal Pyramidal Compression-and-Amplification (RTPCA) transformer.
We demonstrate the effectiveness of RTPCA by achieving state-of-the-art results on Human3.6M, HumanEva-I, and MPI-INF-3DHP benchmarks.
- Score: 26.61672772233569
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the 3D pose of humans in video sequences demands both
accuracy and a well-structured architecture. Building on the success of transformers,
we introduce the Refined Temporal Pyramidal Compression-and-Amplification
(RTPCA) transformer. Exploiting the temporal dimension, RTPCA extends
intra-block temporal modeling via its Temporal Pyramidal
Compression-and-Amplification (TPCA) structure and refines inter-block feature
interaction with a Cross-Layer Refinement (XLR) module. In particular, the TPCA
block exploits a temporal pyramid paradigm, reinforcing key and value
representation capabilities and seamlessly extracting spatial semantics from
motion sequences. We stitch these TPCA blocks with XLR that promotes rich
semantic representation through continuous interaction of queries, keys, and
values. This strategy integrates early-stage information with the current flow,
addressing typical deficits in detail and stability seen in other
transformer-based methods. We demonstrate the effectiveness of RTPCA by
achieving state-of-the-art results on Human3.6M, HumanEva-I, and MPI-INF-3DHP
benchmarks with minimal computational overhead. The source code is available at
https://github.com/hbing-l/RTPCA.
Related papers
- PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction.
We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences.
We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- RhythmFormer: Extracting rPPG Signals Based on Hierarchical Temporal Periodic Transformer [17.751885452773983]
We propose a fully end-to-end transformer-based method for extracting rPPG signals by explicitly leveraging the quasi-periodic nature of rPPG.
A fusion stem is proposed to guide self-attention to rPPG features effectively; it can be easily transferred to existing methods to significantly enhance their performance.
arXiv Detail & Related papers (2024-02-20T07:56:02Z)
- Transformer-based Video Saliency Prediction with High Temporal Dimension Decoding [12.595019348741042]
We propose a transformer-based video saliency prediction approach with high temporal dimension network decoding (THTDNet).
This architecture yields comparable performance to multi-branch and over-complicated models on common benchmarks such as DHF1K, UCF-sports and Hollywood-2.
arXiv Detail & Related papers (2024-01-15T20:09:56Z)
- Spatial-Temporal Transformer based Video Compression Framework [44.723459144708286]
We propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework.
It contains a Relaxed Deformable Transformer (RDT) with Uformer based offsets estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multi-reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression.
Experimental results demonstrate that our method achieves the best result with 13.5% BD-Rate saving over VTM.
arXiv Detail & Related papers (2023-09-21T09:23:13Z)
- Exploring Frequency-Inspired Optimization in Transformer for Efficient Single Image Super-Resolution [32.29219284419944]
We propose a cross-refinement adaptive feature modulation transformer (CRAFT) for efficient single image super-resolution.
We introduce a frequency-guided post-training quantization (PTQ) method aimed at enhancing CRAFT's efficiency.
Our experimental findings showcase CRAFT's superiority over current state-of-the-art methods, both in full-precision and quantization scenarios.
arXiv Detail & Related papers (2023-08-09T15:38:36Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted for their high prediction capacity, though the self-attention mechanism is computationally expensive.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing LTTF methods in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Degradation-Aware Unfolding Half-Shuffle Transformer for Spectral Compressive Imaging [142.11622043078867]
We propose a principled Degradation-Aware Unfolding Framework (DAUF) that estimates parameters from the compressed image and physical mask, and then uses these parameters to control each iteration.
By plugging the Half-Shuffle Transformer (HST) into DAUF, we establish the first Transformer-based deep unfolding method, Degradation-Aware Unfolding Half-Shuffle Transformer (DAUHST), for HSI reconstruction.
arXiv Detail & Related papers (2022-05-20T11:37:44Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered solving vision tasks with transformers; it directly translates the image feature map into the object detection result.
Applied to the recent transformer-based image recognition model ViT, the approach shows a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- TCCT: Tightly-Coupled Convolutional Transformer on Time Series Forecasting [6.393659160890665]
We propose the concept of tightly-coupled convolutional Transformer (TCCT) and three TCCT architectures.
Our experiments on real-world datasets show that our TCCT architectures can greatly improve the performance of existing state-of-the-art Transformer models.
arXiv Detail & Related papers (2021-08-29T08:49:31Z)
- Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block capable of extracting temporal features at multiple resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z)