HandyPriors: Physically Consistent Perception of Hand-Object
Interactions with Differentiable Priors
- URL: http://arxiv.org/abs/2311.16552v3
- Date: Tue, 26 Dec 2023 06:25:20 GMT
- Title: HandyPriors: Physically Consistent Perception of Hand-Object
Interactions with Differentiable Priors
- Authors: Shutong Zhang, Yi-Ling Qiao, Guanglei Zhu, Eric Heiden, Dylan Turpin,
Jingzhou Liu, Ming Lin, Miles Macklin, Animesh Garg
- Abstract summary: We propose HandyPriors, a unified and general pipeline for pose estimation in human-object interaction scenes.
Our approach employs rendering priors to align with input images and segmentation masks along with physics priors to mitigate penetration and relative-sliding across frames.
We show that HandyPriors attains comparable or superior results in the pose estimation task, and that the differentiable physics module can predict contact information for pose refinement.
- Score: 45.75318635639419
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Various heuristic objectives for modeling hand-object interaction have been
proposed in past work. However, due to the lack of a cohesive framework, these
objectives often possess a narrow scope of applicability and are limited by
their efficiency or accuracy. In this paper, we propose HandyPriors, a unified
and general pipeline for pose estimation in human-object interaction scenes by
leveraging recent advances in differentiable physics and rendering. Our
approach employs rendering priors to align with input images and segmentation
masks along with physics priors to mitigate penetration and relative-sliding
across frames. Furthermore, we present two alternatives for hand and object
pose estimation. The optimization-based pose estimation achieves higher
accuracy, while the filtering-based tracking, which utilizes the differentiable
priors as dynamics and observation models, executes faster. We demonstrate that
HandyPriors attains comparable or superior results in the pose estimation task,
and that the differentiable physics module can predict contact information for
pose refinement. We also show that our approach generalizes to perception
tasks, including robotic hand manipulation and human-object pose estimation in
the wild.
Related papers
- VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques.
We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step.
Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z) - Object-centric Video Representation for Long-term Action Anticipation [33.115854386196126]
Key motivation is that objects provide important cues to recognize and predict human-object interactions.
We propose to build object-centric video representations by leveraging visual-language pretrained models.
To recognize and predict human-object interactions, we use a Transformer-based neural architecture.
arXiv Detail & Related papers (2023-10-31T22:54:31Z) - InterTracker: Discovering and Tracking General Objects Interacting with
Hands in the Wild [40.489171608114574]
Existing methods rely on frame-based detectors to locate interacting objects.
We propose to leverage hand-object interaction to track interactive objects.
Our proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-06T09:09:17Z) - Mutual Information-Based Temporal Difference Learning for Human Pose
Estimation in Video [16.32910684198013]
We present a novel multi-frame human pose estimation framework, which employs temporal differences across frames to model dynamic contexts.
To be specific, we design a multi-stage entangled learning sequences conditioned on multi-stage differences to derive informative motion representation sequences.
These place us to rank No.1 in the Crowd Pose Estimation in Complex Events Challenge on benchmark HiEve.
arXiv Detail & Related papers (2023-03-15T09:29:03Z) - Interacting Hand-Object Pose Estimation via Dense Mutual Attention [97.26400229871888]
3D hand-object pose estimation is the key to the success of many computer vision applications.
We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object.
Our method is able to produce physically plausible poses with high quality and real-time inference speed.
arXiv Detail & Related papers (2022-11-16T10:01:33Z) - Learn to Predict How Humans Manipulate Large-sized Objects from
Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show the proposed network that consumes dynamic descriptors can achieve state-of-the-art prediction results and help the network better generalize to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z) - RAIN: Reinforced Hybrid Attention Inference Network for Motion
Forecasting [34.54878390622877]
We propose a generic motion forecasting framework with dynamic key information selection and ranking based on a hybrid attention mechanism.
The framework is instantiated to handle multi-agent trajectory prediction and human motion forecasting tasks.
We validate the framework on both synthetic simulations and motion forecasting benchmarks in different domains.
arXiv Detail & Related papers (2021-08-03T06:30:30Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Learning Long-term Visual Dynamics with Region Proposal Interaction
Networks [75.06423516419862]
We build object representations that can capture inter-object and object-environment interactions over a long-range.
Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin.
arXiv Detail & Related papers (2020-08-05T17:48:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.