FixMyPose: Pose Correctional Captioning and Retrieval
- URL: http://arxiv.org/abs/2104.01703v1
- Date: Sun, 4 Apr 2021 21:45:44 GMT
- Title: FixMyPose: Pose Correctional Captioning and Retrieval
- Authors: Hyounghun Kim, Abhay Zala, Graham Burri, Mohit Bansal
- Abstract summary: We introduce a new captioning dataset named FixMyPose to address automated pose correction systems.
We collect descriptions of correcting a "current" pose to look like a "target" pose.
To avoid ML biases, we maintain a balance across characters with diverse demographics.
- Score: 67.20888060019028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interest in physical therapy and individual exercises such as yoga/dance has
increased alongside the well-being trend. However, such exercises are hard to
follow without expert guidance (which is impossible to scale for personalized
feedback to every trainee remotely). Thus, automated pose correction systems
are required more than ever, and we introduce a new captioning dataset named
FixMyPose to address this need. We collect descriptions of correcting a
"current" pose to look like a "target" pose (in both English and Hindi). The
collected descriptions have interesting linguistic properties such as
egocentric relations to environment objects, analogous references, etc.,
requiring an understanding of spatial relations and commonsense knowledge about
postures. Further, to avoid ML biases, we maintain a balance across characters
with diverse demographics, who perform a variety of movements in several
interior environments (e.g., homes, offices). From our dataset, we introduce
the pose-correctional-captioning task and its reverse target-pose-retrieval
task. During the correctional-captioning task, models must generate
descriptions of how to move from the current to target pose image, whereas in
the retrieval task, models should select the correct target pose given the
initial pose and correctional description. We present strong cross-attention
baseline models (uni/multimodal, RL, multilingual) and also show that our
baselines are competitive with other models when evaluated on other
image-difference datasets. We also propose new task-specific metrics
(object-match, body-part-match, direction-match) and conduct human evaluation
for more reliable evaluation, and we demonstrate a large human-model
performance gap suggesting room for promising future work. To verify the
sim-to-real transfer of our FixMyPose dataset, we collect a set of real images
and show promising performance on these images.
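To make the target-pose-retrieval task concrete, below is a minimal cross-attention scorer sketch in PyTorch. The module layout, feature shapes, and the GRU text encoder are illustrative assumptions, not the paper's released baseline; it simply scores each candidate target pose by attending from a summary of the correctional description over the region features of the current and candidate images.

```python
# Hypothetical sketch of a cross-attention retrieval baseline (not the authors' code).
import torch
import torch.nn as nn

class CrossAttentionRetriever(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.text_enc = nn.GRU(input_size=dim, hidden_size=dim, batch_first=True)
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, current_feat, candidate_feats, text_emb):
        # current_feat:    (B, R, D)    region features of the current-pose image
        # candidate_feats: (B, K, R, D) region features of K candidate target poses
        # text_emb:        (B, T, D)    embedded correctional description
        _, h = self.text_enc(text_emb)            # (1, B, D) summary of the description
        query = h.transpose(0, 1)                 # (B, 1, D)
        B, K, R, D = candidate_feats.shape
        scores = []
        for k in range(K):
            # attend from the description over current + candidate regions
            ctx = torch.cat([current_feat, candidate_feats[:, k]], dim=1)  # (B, 2R, D)
            attended, _ = self.attn(query, ctx, ctx)                       # (B, 1, D)
            scores.append(self.score(attended).squeeze(-1))                # (B, 1)
        return torch.cat(scores, dim=1)           # (B, K): higher = better match
```

A model like this would be trained with a cross-entropy loss over the K candidates, taking the argmax at test time as the retrieved target pose.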
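The task-specific metrics can be read as keyword-overlap checks between the reference and generated captions. The sketch below is a hypothetical re-implementation under that assumption; the keyword vocabularies and the F1-style scoring are illustrative, not the authors' exact definitions.

```python
# Hypothetical sketch of object-match, body-part-match, and direction-match
# as keyword-overlap F1 scores; vocabularies below are illustrative examples.

BODY_PARTS = {"arm", "arms", "leg", "legs", "hand", "hands", "foot", "feet",
              "knee", "knees", "elbow", "elbows", "hip", "hips", "torso", "head"}
DIRECTIONS = {"left", "right", "up", "down", "forward", "backward", "toward", "away"}
OBJECTS = {"chair", "table", "window", "door", "couch", "desk", "wall"}

def _keywords(caption: str, vocab: set) -> set:
    """Collect vocabulary words appearing in a lower-cased, tokenized caption."""
    tokens = caption.lower().replace(",", " ").replace(".", " ").split()
    return {tok for tok in tokens if tok in vocab}

def match_f1(reference: str, candidate: str, vocab: set) -> float:
    """F1 overlap between vocabulary keywords of reference and candidate captions."""
    ref, cand = _keywords(reference, vocab), _keywords(candidate, vocab)
    if not ref and not cand:
        return 1.0                      # nothing to match in either caption
    if not ref or not cand:
        return 0.0
    overlap = len(ref & cand)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(cand), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    ref = "Move your left arm up and step toward the chair."
    hyp = "Raise your left arm and move toward the chair."
    print("body-part-match:", match_f1(ref, hyp, BODY_PARTS))
    print("direction-match:", match_f1(ref, hyp, DIRECTIONS))
    print("object-match:   ", match_f1(ref, hyp, OBJECTS))
```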
Related papers
- Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation [32.190055780969466]
Stable-Pose is a novel adapter model that introduces a coarse-to-fine attention masking strategy into a vision Transformer.
We leverage the query-key self-attention mechanism of ViTs to explore the interconnections among different anatomical parts in human pose skeletons.
Stable-Pose achieved an AP score of 57.1 on the LAION-Human dataset, an improvement of around 13% over the established ControlNet technique.
arXiv Detail & Related papers (2024-06-04T16:54:28Z) - A Spatial-Temporal Transformer based Framework For Human Pose Assessment
And Correction in Education Scenarios [6.146739983645156]
The framework comprises skeletal tracking, pose estimation, posture assessment, and posture correction modules.
We create a pose correction method to provide corrective feedback in the form of visual aids.
Results show that our model can effectively measure and comment on the quality of students' actions.
arXiv Detail & Related papers (2023-11-01T09:53:38Z) - PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for
Human Pose Estimation [40.50255017107963]
We propose Pose Transformation (PoseTrans) to create new training samples that have diverse poses.
We also propose a Pose Clustering Module (PCM) to measure pose rarity and select the "rarest" poses to help balance the long-tailed distribution.
Our method is efficient and simple to implement, which can be easily integrated into the training pipeline of existing pose estimation models.
arXiv Detail & Related papers (2022-08-16T14:03:01Z) - Hallucinating Pose-Compatible Scenes [55.064949607528405]
We present a large-scale generative adversarial network for pose-conditioned scene generation.
We curate a massive meta-dataset containing over 19 million frames of humans in everyday environments.
We leverage our trained model for various applications: hallucinating pose-compatible scene(s) with or without humans, visualizing incompatible scenes and poses, placing a person from one generated image into another scene, and animating pose.
arXiv Detail & Related papers (2021-12-13T18:59:26Z) - Progressive and Aligned Pose Attention Transfer for Person Image
Generation [59.87492938953545]
This paper proposes a new generative adversarial network for pose transfer, i.e., transferring the pose of a given person to a target pose.
We use two types of blocks, namely the Pose-Attentional Transfer Block (PATB) and the Aligned Pose-Attentional Transfer Block (APATB).
We verify the efficacy of the model on the Market-1501 and DeepFashion datasets, using quantitative and qualitative measures.
arXiv Detail & Related papers (2021-03-22T07:24:57Z) - Adversarial Transfer of Pose Estimation Regression [11.117357750374035]
We develop a deep adaptation network for learning scene-invariant image representations and use adversarial learning to generate representations for model transfer.
We evaluate our network on two public datasets, Cambridge Landmarks and 7Scene, demonstrating its superiority over several baselines and comparing it to state-of-the-art methods.
arXiv Detail & Related papers (2020-06-20T21:16:37Z) - Yoga-82: A New Dataset for Fine-grained Classification of Human Poses [46.319423568714505]
We present a dataset, Yoga-82, for large-scale yoga pose recognition with 82 classes.
Yoga-82 consists of complex poses where fine annotations may not be possible.
The dataset contains a three-level hierarchy including body positions, variations in body positions, and the actual pose names.
arXiv Detail & Related papers (2020-04-22T01:43:44Z) - Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image
Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z) - Deformation-aware Unpaired Image Translation for Pose Estimation on
Laboratory Animals [56.65062746564091]
We aim to capture the pose of neuroscience model organisms, without using any manual supervision, to study how neural circuits orchestrate behaviour.
Our key contribution is the explicit and independent modeling of appearance, shape and poses in an unpaired image translation framework.
We demonstrate improved pose estimation accuracy on Drosophila melanogaster (fruit fly), Caenorhabditis elegans (worm), and Danio rerio (zebrafish).
arXiv Detail & Related papers (2020-01-23T15:34:11Z) - Improving Few-shot Learning by Spatially-aware Matching and
CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)