Related papers: HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors

HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors

URL: http://arxiv.org/abs/2311.16552v3
Date: Tue, 26 Dec 2023 06:25:20 GMT
Title: HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors
Authors: Shutong Zhang, Yi-Ling Qiao, Guanglei Zhu, Eric Heiden, Dylan Turpin, Jingzhou Liu, Ming Lin, Miles Macklin, Animesh Garg
Abstract summary: We propose HandyPriors, a unified and general pipeline for pose estimation in human-object interaction scenes. Our approach employs rendering priors to align with input images and segmentation masks along with physics priors to mitigate penetration and relative-sliding across frames. We show that HandyPriors attains comparable or superior results in the pose estimation task, and that the differentiable physics module can predict contact information for pose refinement.
Score: 45.75318635639419
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Various heuristic objectives for modeling hand-object interaction have been proposed in past work. However, due to the lack of a cohesive framework, these objectives often possess a narrow scope of applicability and are limited by their efficiency or accuracy. In this paper, we propose HandyPriors, a unified and general pipeline for pose estimation in human-object interaction scenes by leveraging recent advances in differentiable physics and rendering. Our approach employs rendering priors to align with input images and segmentation masks along with physics priors to mitigate penetration and relative-sliding across frames. Furthermore, we present two alternatives for hand and object pose estimation. The optimization-based pose estimation achieves higher accuracy, while the filtering-based tracking, which utilizes the differentiable priors as dynamics and observation models, executes faster. We demonstrate that HandyPriors attains comparable or superior results in the pose estimation task, and that the differentiable physics module can predict contact information for pose refinement. We also show that our approach generalizes to perception tasks, including robotic hand manipulation and human-object pose estimation in the wild.

Related papers

Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control [72.00655365269]
We present RoboMaster, a novel framework that models inter-object dynamics through a collaborative trajectory formulation.<n>Unlike prior methods that decompose objects, our core is to decompose the interaction process into three sub-stages: pre-interaction, interaction, and post-interaction.<n>Our method outperforms existing approaches, establishing new state-of-the-art performance in trajectory-controlled video generation for robotic manipulation.
arXiv Detail & Related papers (2025-06-02T17:57:06Z)
SIGHT: Single-Image Conditioned Generation of Hand Trajectories for Hand-Object Interaction [86.54738165527502]
We introduce a novel task of generating realistic and diverse 3D hand trajectories given a single image of an object. Hand-object interaction trajectory priors can greatly benefit applications in robotics, embodied AI, augmented reality and related fields.
arXiv Detail & Related papers (2025-03-28T20:53:20Z)
VICAN: Very Efficient Calibration Algorithm for Large Camera Networks [49.17165360280794]
We introduce a novel methodology that extends Pose Graph Optimization techniques. We consider the bipartite graph encompassing cameras, object poses evolving dynamically, and camera-object relative transformations at each time step. Our framework retains compatibility with traditional PGO solvers, but its efficacy benefits from a custom-tailored optimization scheme.
arXiv Detail & Related papers (2024-03-25T17:47:03Z)
Object-centric Video Representation for Long-term Action Anticipation [33.115854386196126]
Key motivation is that objects provide important cues to recognize and predict human-object interactions. We propose to build object-centric video representations by leveraging visual-language pretrained models. To recognize and predict human-object interactions, we use a Transformer-based neural architecture.
arXiv Detail & Related papers (2023-10-31T22:54:31Z)
InterTracker: Discovering and Tracking General Objects Interacting with Hands in the Wild [40.489171608114574]
Existing methods rely on frame-based detectors to locate interacting objects. We propose to leverage hand-object interaction to track interactive objects. Our proposed method outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2023-08-06T09:09:17Z)
Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video [16.32910684198013]
We present a novel multi-frame human pose estimation framework, which employs temporal differences across frames to model dynamic contexts. To be specific, we design a multi-stage entangled learning sequences conditioned on multi-stage differences to derive informative motion representation sequences. These place us to rank No.1 in the Crowd Pose Estimation in Complex Events Challenge on benchmark HiEve.
arXiv Detail & Related papers (2023-03-15T09:29:03Z)
Interacting Hand-Object Pose Estimation via Dense Mutual Attention [97.26400229871888]
3D hand-object pose estimation is the key to the success of many computer vision applications. We propose a novel dense mutual attention mechanism that is able to model fine-grained dependencies between the hand and the object. Our method is able to produce physically plausible poses with high quality and real-time inference speed.
arXiv Detail & Related papers (2022-11-16T10:01:33Z)
Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task. We show the proposed network that consumes dynamic descriptors can achieve state-of-the-art prediction results and help the network better generalize to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting [34.54878390622877]
We propose a generic motion forecasting framework with dynamic key information selection and ranking based on a hybrid attention mechanism. The framework is instantiated to handle multi-agent trajectory prediction and human motion forecasting tasks. We validate the framework on both synthetic simulations and motion forecasting benchmarks in different domains.
arXiv Detail & Related papers (2021-08-03T06:30:30Z)
TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks. To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame. Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
Learning Long-term Visual Dynamics with Region Proposal Interaction Networks [75.06423516419862]
We build object representations that can capture inter-object and object-environment interactions over a long-range. Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin.
arXiv Detail & Related papers (2020-08-05T17:48:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.