Test-Time Personalization with a Transformer for Human Pose Estimation
- URL: http://arxiv.org/abs/2107.02133v1
- Date: Mon, 5 Jul 2021 16:48:34 GMT
- Title: Test-Time Personalization with a Transformer for Human Pose Estimation
- Authors: Miao Hao, Yizhuo Li, Zonglin Di, Nitesh B. Gundavarapu, Xiaolong Wang
- Abstract summary: We adapt our pose estimator during test time to exploit person-specific information.
We show significant improvements in pose estimation with our self-supervised personalization.
- Score: 10.776892578762721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose to personalize a human pose estimator given a set of test images
of a person without using any manual annotations. While there is a significant
advancement in human pose estimation, it is still very challenging for a model
to generalize to different unknown environments and unseen persons. Instead of
using a fixed model for every test case, we adapt our pose estimator during
test time to exploit person-specific information. We first train our model on
diverse data with both supervised and self-supervised pose estimation
objectives jointly. We use a Transformer model to build a transformation
between the self-supervised keypoints and the supervised keypoints. During test
time, we personalize and adapt our model by fine-tuning with the
self-supervised objective. The pose is then improved by transforming the
updated self-supervised keypoints. We experiment with multiple datasets and
show significant improvements in pose estimation with our self-supervised
personalization.
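As a rough illustration of the recipe in the abstract, the sketch below implements a test-time personalization loop in PyTorch: a shared encoder and a self-supervised branch are fine-tuned on the unlabeled test images of one person, and a small Transformer then maps the updated self-supervised keypoints to supervised keypoints. All module names, tensor shapes, and the reconstruction-style self-supervised loss are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the test-time personalization loop described above.
# All module names, tensor shapes, and the reconstruction-style
# self-supervised loss are illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn

class PersonalizablePoseNet(nn.Module):
    def __init__(self, num_keypoints=17, feat_dim=64):
        super().__init__()
        # Shared backbone used by both objectives.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8),
        )
        # Self-supervised branch: predicts keypoints without labels by
        # reconstructing the input image from them.
        self.ssl_head = nn.Linear(feat_dim * 8 * 8, num_keypoints * 2)
        self.decoder = nn.Linear(num_keypoints * 2, 3 * 32 * 32)
        # Transformer that maps self-supervised keypoints to the
        # supervised keypoint definition (trained jointly on labeled data).
        layer = nn.TransformerEncoderLayer(d_model=2, nhead=1, batch_first=True)
        self.kp_transformer = nn.TransformerEncoder(layer, num_layers=1)

    def ssl_forward(self, img):
        feats = self.encoder(img).flatten(1)
        kps = self.ssl_head(feats).view(img.size(0), -1, 2)
        recon = self.decoder(kps.flatten(1)).view(img.size(0), 3, 32, 32)
        return kps, recon

    def pose_from_ssl(self, kps):
        # Transform self-supervised keypoints into supervised-style poses.
        return self.kp_transformer(kps)

def personalize(model, test_images, steps=10, lr=1e-4):
    """Adapt to one person's unlabeled test images with the
    self-supervised objective only, then read out supervised poses."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        _, recon = model.ssl_forward(test_images)
        loss = nn.functional.mse_loss(recon, test_images)  # SSL objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        kps, _ = model.ssl_forward(test_images)
        return model.pose_from_ssl(kps)

imgs = torch.rand(4, 3, 32, 32)  # a small set of test images of one person
poses = personalize(PersonalizablePoseNet(), imgs)
print(poses.shape)  # torch.Size([4, 17, 2])
```

Because only the self-supervised branch provides the test-time gradient, no annotations of the test person are needed; the Transformer mapping, trained jointly on labeled data, converts the adapted keypoints back into the supervised keypoint definition.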
Related papers
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z) - PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions [57.871692507044344]
Pose estimation aims to accurately identify anatomical keypoints in humans and animals using monocular images.
Current models are typically trained and tested on clean data, potentially overlooking the corruption during real-world deployment.
We introduce PoseBench, a benchmark designed to evaluate the robustness of pose estimation models against real-world corruption.
arXiv Detail & Related papers (2024-06-20T14:40:17Z) - Personalized Pose Forecasting [28.46838162184121]
We reformulate the human motion forecasting problem and present a model-agnostic personalization method.
Motion forecasting personalization can be performed efficiently online by utilizing a low-parametric time-series analysis model.
arXiv Detail & Related papers (2023-12-06T14:43:38Z) - YOLOPose V2: Understanding and Improving Transformer-based 6D Pose
Estimation [36.067414358144816]
YOLOPose is a Transformer-based multi-object 6D pose estimation method.
We employ a learnable orientation estimation module to predict the orientation from the keypoints.
Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2023-07-21T12:53:54Z) - Meta-Auxiliary Learning for Adaptive Human Pose Prediction [26.877194503491072]
Predicting high-fidelity future human poses is decisive for intelligent robots to interact with humans.
Deep end-to-end learning approaches, which typically train a generic pre-trained model on external datasets and then directly apply it to all test samples, remain non-optimal.
We propose a novel test-time adaptation framework that leverages two self-supervised auxiliary tasks to help the primary forecasting network adapt to the test sequence.
arXiv Detail & Related papers (2023-04-13T11:17:09Z) - YOLOPose: Transformer-based Multi-Object 6D Pose Estimation using
Keypoint Regression [44.282841879849244]
We propose YOLOPose, a Transformer-based multi-object 6D pose estimation method based on keypoint regression.
Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2022-05-05T09:51:39Z) - ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation [76.35955924137986]
We show that a plain vision transformer with MAE pretraining can obtain superior performance after finetuning on human pose estimation datasets.
Our biggest ViTPose model, based on the ViTAE-G backbone with 1 billion parameters, obtains the best result of 80.9 mAP on the MS COCO test-dev set.
arXiv Detail & Related papers (2022-04-26T17:55:04Z) - FixMyPose: Pose Correctional Captioning and Retrieval [67.20888060019028]
We introduce a new captioning dataset named FixMyPose to address automated pose correction systems.
We collect descriptions of correcting a "current" pose to look like a "target" pose.
To avoid ML biases, we maintain a balance across characters with diverse demographics.
arXiv Detail & Related papers (2021-04-04T21:45:44Z) - End-to-End Trainable Multi-Instance Pose Estimation with Transformers [68.93512627479197]
We propose a new end-to-end trainable approach for multi-instance pose estimation by combining a convolutional neural network with a transformer.
Inspired by recent work on end-to-end trainable object detection with transformers, we use a transformer encoder-decoder architecture together with a bipartite matching scheme to directly regress the pose of all individuals in a given image.
Our model, called POse Estimation Transformer (POET), is trained using a novel set-based global loss that consists of a keypoint loss, a keypoint visibility loss, a center loss, and a class loss; a minimal sketch of the matching step behind such a set-based loss follows this list.
arXiv Detail & Related papers (2021-03-22T18:19:22Z) - Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image
Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.