Related papers: ChatPose: Chatting about 3D Human Pose

URL: http://arxiv.org/abs/2311.18836v2
Date: Tue, 23 Apr 2024 17:53:48 GMT
Title: ChatPose: Chatting about 3D Human Pose
Authors: Yao Feng, Jing Lin, Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Michael J. Black
Abstract summary: ChatPose is a framework to understand and reason about 3D human poses from images or textual descriptions. Our work is motivated by the human ability to intuitively understand postures from a single image or a brief description.
Score: 47.70287492050979
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce ChatPose, a framework employing Large Language Models (LLMs) to understand and reason about 3D human poses from images or textual descriptions. Our work is motivated by the human ability to intuitively understand postures from a single image or a brief description, a process that intertwines image interpretation, world knowledge, and an understanding of body language. Traditional human pose estimation and generation methods often operate in isolation, lacking semantic understanding and reasoning abilities. ChatPose addresses these limitations by embedding SMPL poses as distinct signal tokens within a multimodal LLM, enabling the direct generation of 3D body poses from both textual and visual inputs. Leveraging the powerful capabilities of multimodal LLMs, ChatPose unifies classical 3D human pose and generation tasks while offering user interactions. Additionally, ChatPose empowers LLMs to apply their extensive world knowledge in reasoning about human poses, leading to two advanced tasks: speculative pose generation and reasoning about pose estimation. These tasks involve reasoning about humans to generate 3D poses from subtle text queries, possibly accompanied by images. We establish benchmarks for these tasks, moving beyond traditional 3D pose generation and estimation methods. Our results show that ChatPose outperforms existing multimodal LLMs and task-specific methods on these newly proposed tasks. Furthermore, ChatPose's ability to understand and generate 3D human poses based on complex reasoning opens new directions in human pose analysis.

Related papers

CoT-Pose: Chain-of-Thought Reasoning for 3D Pose Generation from Abstract Prompts [1.0742675209112622]
We introduce a novel framework that incorporates CoT reasoning into the pose generation process. We propose a data synthesis pipeline that automatically generates triplets of abstract prompts, detailed prompts, and corresponding 3D poses. Experimental results demonstrate that our reasoning-enhanced model, CoT-Pose, can effectively generate plausible and semantically aligned poses.
arXiv Detail & Related papers (2025-08-11T01:43:41Z)
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing [79.68232381605661]
We present UniPose, a framework to comprehend, generate, and edit human poses across various modalities. Specifically, we apply a pose tokenizer to convert 3D poses into discrete pose tokens, enabling seamless integration into the LLM within a unified vocabulary. Benefiting from a unified learning strategy, UniPose effectively transfers knowledge across different pose-relevant tasks, adapts to unseen tasks, and exhibits extended capabilities.
arXiv Detail & Related papers (2024-11-25T08:06:30Z)
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation [60.5897687447003]
AvatarGO is a novel framework designed to generate realistic 4D HOI scenes from textual inputs. Our framework not only generates coherent compositional motions, but also exhibits greater robustness in handling issues. As the first attempt to synthesize 4D avatars with object interactions, we hope AvatarGO could open new doors for human-centric 4D content creation.
arXiv Detail & Related papers (2024-10-09T17:58:56Z)
PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation [38.958695275774616]
We introduce a new transformer-based model, trained in a retrieval fashion, which can take as input any combination of the aforementioned modalities. We showcase the potential of such an embroidered pose representation for (1) SMPL regression from image with optional text cue; and (2) on the task of fine-grained instruction generation.
arXiv Detail & Related papers (2024-09-10T14:09:39Z)
Diverse 3D Human Pose Generation in Scenes based on Decoupled Structure [2.9895817635228017]
We present a novel method for generating diverse 3D human poses in scenes with semantic control. Our approach consists of three stages: pose generation, contact generation, and placing the human into the scene. The experimental results on the PROX dataset demonstrate that our method produces more physically plausible interactions.
arXiv Detail & Related papers (2024-06-09T08:33:10Z)
ChatHuman: Chatting about 3D Humans with Tools [57.29285473727107]
ChatHuman is a language-driven system that integrates the capabilities of specialized methods into a unified framework. ChatHuman functions as an assistant proficient in utilizing, analyzing, and interacting with tools specific to 3D human tasks.
arXiv Detail & Related papers (2024-05-07T17:59:31Z)
Pose Priors from Language Models [74.61186408764559]
Language is often used to describe physical interaction, yet most 3D human pose estimation methods overlook this rich source of information. We bridge this gap by leveraging large multimodal models (LMMs) as priors for reconstructing contact poses.
arXiv Detail & Related papers (2024-05-06T17:59:36Z)
MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling [59.74064212110042]
MPM can handle multiple tasks including 3D human pose estimation, 3D pose estimation from occluded 2D pose, and 3D pose completion in a single framework. We conduct extensive experiments and ablation studies on several widely used human pose datasets and achieve state-of-the-art performance on MPI-INF-3DHP.
arXiv Detail & Related papers (2023-06-29T10:30:00Z)
PoseScript: Linking 3D Human Poses and Natural Language [38.85620213438554]
We introduce the PoseScript dataset, which pairs more than six thousand 3D human poses with rich human-annotated descriptions. To increase the size of the dataset to a scale that is compatible with data-hungry learning algorithms, we have proposed an elaborate captioning process. This process extracts low-level pose information, known as "posecodes", using a set of simple but generic rules on the 3D keypoints. With automatic annotations, the amount of available data significantly scales up (100k), making it possible to effectively pretrain deep models for finetuning on human captions.
arXiv Detail & Related papers (2022-10-21T08:18:49Z)
Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons. The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z)
Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement [63.853412753242615]
Learning a good 3D human pose representation is important for human pose related tasks. We propose a novel Siamese denoising autoencoder to learn a 3D pose representation. Our approach achieves state-of-the-art performance on two inherently different tasks.
arXiv Detail & Related papers (2020-07-14T14:25:22Z)
Adversarial Synthesis of Human Pose from Text [18.02001711736337]
This work focuses on synthesizing human poses from human-level text descriptions. We propose a model that is based on a conditional generative adversarial network. We show through qualitative and quantitative results that the model is capable of synthesizing plausible poses matching the given text.
arXiv Detail & Related papers (2020-05-01T12:32:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.