Related papers: Pose Recognition with Cascade Transformers

Pose Recognition with Cascade Transformers

URL: http://arxiv.org/abs/2104.06976v1
Date: Wed, 14 Apr 2021 17:00:22 GMT
Title: Pose Recognition with Cascade Transformers
Authors: Ke Li, Shijie Wang, Xiang Zhang, Yifan Xu, Weijian Xu, Zhuowen Tu
Abstract summary: We present a regression-based pose recognition method using Transformers. Heatmap-based and regression-based methods achieve higher accuracy but are subject to various designs. In the experiments, we report competitive results for pose recognition when compared with the competing regression-based methods.
Score: 31.7059023190426
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we present a regression-based pose recognition method using cascade Transformers. One way to categorize the existing approaches in this domain is to separate them into 1). heatmap-based and 2). regression-based. In general, heatmap-based methods achieve higher accuracy but are subject to various heuristic designs (not end-to-end mostly), whereas regression-based approaches attain relatively lower accuracy but they have less intermediate non-differentiable steps. Here we utilize the encoder-decoder structure in Transformers to perform regression-based person and keypoint detection that is general-purpose and requires less heuristic design compared with the existing approaches. We demonstrate the keypoint hypothesis (query) refinement process across different self-attention layers to reveal the recursive self-attention mechanism in Transformers. In the experiments, we report competitive results for pose recognition when compared with the competing regression-based methods.

Related papers

Converting Transformers into DGNNs Form [3.7468283401703797]
We introduce a synthetic unitary digraph convolution based on the digraph Fourier transform. The resulting model, which we term Converter, effectively converts a Transformer into a Directed Graph Neural Network form. We have tested Converter on Long-Range Arena benchmark, long document classification, and DNA sequence-based taxonomy classification.
arXiv Detail & Related papers (2025-02-01T22:44:46Z)
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? [69.4145579827826]
We show a fast flow on the regression loss despite the gradient non-ity algorithms for our convergence landscape. This is the first theoretical analysis for multi-layer Transformer in this setting.
arXiv Detail & Related papers (2024-10-10T18:29:05Z)
Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms as low-rank computation have impressive performance for learning Transformer-based adaption. We analyze how magnitude-based models affect generalization while improving adaption. We conclude that proper magnitude-based has a slight on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z)
How to get the most out of Twinned Regression Methods [0.0]
Twinned regression methods are designed to solve the dual problem to the original regression problem. A solution to the original regression problem can be obtained by ensembling predicted differences between the targets of an unknown data point and multiple known anchor data points.
arXiv Detail & Related papers (2023-01-03T22:37:44Z)
Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations. We show how trained Transformers become mesa-optimizers i.e. learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z)
Poseur: Direct Human Pose Regression with Transformers [119.79232258661995]
We propose a direct, regression-based approach to 2D human pose estimation from single images. Our framework is end-to-end differentiable, and naturally learns to exploit the dependencies between keypoints. Ours is the first regression-based approach to perform favorably compared to the best heatmap-based pose estimation methods.
arXiv Detail & Related papers (2022-01-19T04:31:57Z)
Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression [81.05772887221333]
We study the dense keypoint regression framework that is previously inferior to the keypoint detection and grouping framework. We present a simple yet effective approach, named disentangled keypoint regression (DEKR) We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods.
arXiv Detail & Related papers (2021-04-06T05:54:46Z)
TFPose: Direct Human Pose Estimation with Transformers [83.03424247905869]
We formulate the pose estimation task into a sequence prediction problem that can effectively be solved by transformers. Our framework is simple and direct, bypassing the drawbacks of the heatmap-based pose estimation. Experiments on the MS-COCO and MPII datasets demonstrate that our method can significantly improve the state-of-the-art of regression-based pose estimation.
arXiv Detail & Related papers (2021-03-29T04:18:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.