IntegralAction: Pose-driven Feature Integration for Robust Human Action
Recognition in Videos
- URL: http://arxiv.org/abs/2007.06317v2
- Date: Thu, 15 Apr 2021 07:29:32 GMT
- Title: IntegralAction: Pose-driven Feature Integration for Robust Human Action
Recognition in Videos
- Authors: Gyeongsik Moon, Heeseung Kwon, Kyoung Mu Lee, Minsu Cho
- Abstract summary: We learn pose-driven feature integration that dynamically combines appearance and pose streams by observing pose features on the fly.
We show that the proposed IntegralAction achieves highly robust performance across in-context and out-of-context action video datasets.
- Score: 94.06960017351574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most current action recognition methods heavily rely on appearance
information by taking an RGB sequence of entire image regions as input. While
being effective in exploiting contextual information around humans, e.g., human
appearance and scene category, they are easily fooled by out-of-context action
videos where the contexts do not exactly match with target actions. In
contrast, pose-based methods, which take a sequence of human skeletons only as
input, suffer from inaccurate pose estimation or ambiguity of human pose per
se. Integrating these two approaches has turned out to be non-trivial; training
a model with both appearance and pose ends up with a strong bias towards
appearance and does not generalize well to unseen videos. To address this
problem, we propose to learn pose-driven feature integration that dynamically
combines appearance and pose streams by observing pose features on the fly. The
main idea is to let the pose stream decide how much and which appearance
information is used in integration based on whether the given pose information
is reliable or not. We show that the proposed IntegralAction achieves highly
robust performance across in-context and out-of-context action video datasets.
Code is available at https://github.com/mks0601/IntegralAction_RELEASE.
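The integration idea described above — letting the pose stream decide how much appearance information to use — can be sketched as a simple gating module. The names, parametrization, and fusion rule below are illustrative assumptions, not the paper's actual architecture: a gate in [0, 1] is computed from the pose features alone and scales the appearance features before fusion.

```python
import numpy as np

def sigmoid(x):
    # Numerically standard logistic function, maps reals to (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def pose_driven_integration(appearance_feat, pose_feat, W_gate, b_gate):
    """Fuse appearance and pose features with a pose-driven gate.

    The gate is computed only from the pose stream, so unreliable pose
    features can learn to suppress (or admit) appearance information.
    W_gate and b_gate are hypothetical learned parameters.
    """
    # Per-channel gate in (0, 1), conditioned solely on pose features.
    gate = sigmoid(pose_feat @ W_gate + b_gate)
    # Scale appearance by the gate, then fuse additively with pose.
    return gate * appearance_feat + pose_feat

rng = np.random.default_rng(0)
C = 8  # illustrative feature dimension
app = rng.standard_normal(C)
pose = rng.standard_normal(C)
W = rng.standard_normal((C, C)) * 0.1
b = np.zeros(C)
fused = pose_driven_integration(app, pose, W, b)
print(fused.shape)  # (8,)
```

In practice such a gate would be trained end-to-end with both streams; the key design choice mirrored here is that the gate's input excludes the appearance features, preventing the appearance bias the abstract describes.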
Related papers
- VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation [79.99551055245071]
We propose VividPose, an end-to-end pipeline that ensures superior temporal stability.
An identity-aware appearance controller integrates additional facial information without compromising other appearance details.
A geometry-aware pose controller utilizes both dense rendering maps from SMPL-X and sparse skeleton maps.
VividPose exhibits superior generalization capabilities on our proposed in-the-wild dataset.
arXiv Detail & Related papers (2024-05-28T13:18:32Z)
- Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers [1.8047694351309207]
We introduce two strategies for learning pose-aware representations in Vision Transformers (ViT).
The first method, called Pose-aware Attention Block (PAAB), is a plug-and-play ViT block that performs localized attention on pose regions within videos.
The second method, dubbed Pose-Aware Auxiliary Task (PAAT), presents an auxiliary pose prediction task optimized jointly with the primary ViT task.
arXiv Detail & Related papers (2023-06-15T17:58:39Z)
- PoseVocab: Learning Joint-structured Pose Embeddings for Human Avatar Modeling [30.93155530590843]
We present PoseVocab, a novel pose encoding method that can encode high-fidelity human details.
Given multi-view RGB videos of a character, PoseVocab constructs key poses and latent embeddings based on the training poses.
Experiments show that our method outperforms other state-of-the-art baselines.
arXiv Detail & Related papers (2023-04-25T17:25:36Z)
- Hallucinating Pose-Compatible Scenes [55.064949607528405]
We present a large-scale generative adversarial network for pose-conditioned scene generation.
We curate a massive meta-dataset containing over 19 million frames of humans in everyday environments.
We leverage our trained model for various applications: hallucinating pose-compatible scene(s) with or without humans, visualizing incompatible scenes and poses, placing a person from one generated image into another scene, and animating pose.
arXiv Detail & Related papers (2021-12-13T18:59:26Z)
- Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis [124.48519390371636]
Transferring human motion from a source to a target person holds great potential for computer vision and graphics applications.
Previous work has either relied on crafted 3D human models or trained a separate model specifically for each target person.
This work studies a more general setting, in which we aim to learn a single model to parsimoniously transfer motion from a source video to any target person.
arXiv Detail & Related papers (2021-10-27T03:42:41Z)
- Video Pose Distillation for Few-Shot, Fine-Grained Sports Action Recognition [17.84533144792773]
Video Pose Distillation (VPD) is a weakly-supervised technique to learn features for new video domains.
VPD features improve performance on few-shot, fine-grained action recognition, retrieval, and detection tasks in four real-world sports video datasets.
arXiv Detail & Related papers (2021-09-03T04:36:12Z) - Learning Dynamics via Graph Neural Networks for Human Pose Estimation
and Tracking [98.91894395941766]
We propose a novel online approach to learning pose dynamics that is independent of pose detections in the current frame.
Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatio-temporal and visual information.
Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
arXiv Detail & Related papers (2021-06-07T16:36:50Z)
- Unsupervised 3D Human Pose Representation with Viewpoint and Pose Disentanglement [63.853412753242615]
Learning a good 3D human pose representation is important for pose-related tasks.
We propose a novel Siamese denoising autoencoder to learn a 3D pose representation.
Our approach achieves state-of-the-art performance on two inherently different tasks.
arXiv Detail & Related papers (2020-07-14T14:25:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.