Related papers: A Versatile and Differentiable Hand-Object Interaction Representation

A Versatile and Differentiable Hand-Object Interaction Representation

URL: http://arxiv.org/abs/2409.16855v1
Date: Wed, 25 Sep 2024 12:06:30 GMT
Title: A Versatile and Differentiable Hand-Object Interaction Representation
Authors: Théo Morales, Omid Taheri, Gerard Lacey,
Abstract summary: Coarse Hand-Object Interaction Representation (CHOIR) is versatile and differentiable for HOI modelling. ChoIR represents dense contact maps with few parameters. JointDiffusion is a diffusion model to learn a grasp distribution conditioned on noisy hand-object interactions.
Score: 2.184775414778289
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Synthesizing accurate hands-object interactions (HOI) is critical for applications in Computer Vision, Augmented Reality (AR), and Mixed Reality (MR). Despite recent advances, the accuracy of reconstructed or generated HOI leaves room for refinement. Some techniques have improved the accuracy of dense correspondences by shifting focus from generating explicit contacts to using rich HOI fields. Still, they lack full differentiability or continuity and are tailored to specific tasks. In contrast, we present a Coarse Hand-Object Interaction Representation (CHOIR), a novel, versatile and fully differentiable field for HOI modelling. CHOIR leverages discrete unsigned distances for continuous shape and pose encoding, alongside multivariate Gaussian distributions to represent dense contact maps with few parameters. To demonstrate the versatility of CHOIR we design JointDiffusion, a diffusion model to learn a grasp distribution conditioned on noisy hand-object interactions or only object geometries, for both refinement and synthesis applications. We demonstrate JointDiffusion's improvements over the SOTA in both applications: it increases the contact F1 score by $5\%$ for refinement and decreases the sim. displacement by $46\%$ for synthesis. Our experiments show that JointDiffusion with CHOIR yield superior contact accuracy and physical realism compared to SOTA methods designed for specific tasks. Our models and code will be publicly available to the research community.

Related papers

Controlling Avatar Diffusion with Learnable Gaussian Embedding [27.651478116386354]
We introduce a novel control signal representation that is optimizable, dense, expressive, and 3D consistent. We synthesize a large-scale dataset with multiple poses and identities. Our model outperforms existing methods in terms of realism, expressiveness, and 3D consistency.
arXiv Detail & Related papers (2025-03-20T02:52:01Z)
Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation [52.36691633451968]
ViTaM-D is a visual-tactile framework for dynamic hand-object interaction reconstruction. DF-Field is a distributed force-aware contact representation model. Our results highlight the superior performance of ViTaM-D in both rigid and deformable object reconstruction.
arXiv Detail & Related papers (2024-11-14T16:29:45Z)
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors [4.697267141773321]
We present DreamHOI, a novel method for zero-shot synthesis of human-object interactions (HOIs) We leverage text-to-image diffusion models trained on billions of image-caption pairs to generate realistic HOIs. We validate our approach through extensive experiments, demonstrating its effectiveness in generating realistic HOIs.
arXiv Detail & Related papers (2024-09-12T17:59:49Z)
Comprehensive Generative Replay for Task-Incremental Segmentation with Concurrent Appearance and Semantic Forgetting [49.87694319431288]
Generalist segmentation models are increasingly favored for diverse tasks involving various objects from different image sources. We propose a Comprehensive Generative (CGR) framework that restores appearance and semantic knowledge by synthesizing image-mask pairs. Experiments on incremental tasks (cardiac, fundus and prostate segmentation) show its clear advantage for alleviating concurrent appearance and semantic forgetting.
arXiv Detail & Related papers (2024-06-28T10:05:58Z)
GEARS: Local Geometry-aware Hand-object Interaction Synthesis [38.75942505771009]
We introduce a novel joint-centered sensor designed to reason about local object geometry near potential interaction regions. As an important step towards mitigating the learning complexity, we transform the points from global frame to template hand frame and use a shared module to process sensor features of each individual joint. This is followed by a perceptual-temporal transformer network aimed at capturing correlation among the joints in different dimensions.
arXiv Detail & Related papers (2024-04-02T09:18:52Z)
HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance. First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds. Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching) To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth. We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
Smooth, exact rotational symmetrization for deep learning on point clouds [0.0]
General-purpose point-cloud models are more varied but often disregard rotational symmetry. We propose a general symmetrization method that adds rotational equivariance to any given model while preserving all the other requirements. We demonstrate this idea by introducing the Point Edge Transformer (PET) architecture, which is not intrinsically equivariant but achieves state-of-the-art performance on several benchmark datasets of molecules and solids.
arXiv Detail & Related papers (2023-05-30T15:26:43Z)
Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance. We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring. Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
Affordance Diffusion: Synthesizing Hand-Object Interactions [81.98499943996394]
Given an RGB image of an object, we aim to hallucinate plausible images of a human hand interacting with it. We propose a two-step generative approach: a LayoutNet that samples an articulation-agnostic hand-object-interaction layout, and a ContentNet that synthesizes images of a hand grasping the object.
arXiv Detail & Related papers (2023-03-21T17:59:10Z)
A Shared Representation for Photorealistic Driving Simulators [83.5985178314263]
We propose to improve the quality of generated images by rethinking the discriminator architecture. The focus is on the class of problems where images are generated given semantic inputs, such as scene segmentation maps or human body poses. We aim to learn a shared latent representation that encodes enough information to jointly do semantic segmentation, content reconstruction, along with a coarse-to-fine grained adversarial reasoning.
arXiv Detail & Related papers (2021-12-09T18:59:21Z)
Mutual Graph Learning for Camouflaged Object Detection [31.422775969808434]
A major challenge is that intrinsic similarities between foreground objects and background surroundings make the features extracted by deep model indistinguishable. We design a novel Mutual Graph Learning model, which generalizes the idea of conventional mutual learning from regular grids to the graph domain. In contrast to most mutual learning approaches that use a shared function to model all between-task interactions, MGL is equipped with typed functions for handling different complementary relations.
arXiv Detail & Related papers (2021-04-03T10:14:39Z)
DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets. We propose an efficient and effective data augmentation method called DecAug for HOI detection. Experiments show that our method brings up to 3.3 mAP and 1.6 mAP improvements on V-COCO and HICODET dataset.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.