Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction
- URL: http://arxiv.org/abs/2309.02965v1
- Date: Wed, 6 Sep 2023 13:00:10 GMT
- Title: Dynamic Hyperbolic Attention Network for Fine Hand-object Reconstruction
- Authors: Zhiying Leng, Shun-Cheng Wu, Mahdi Saleh, Antonio Montanaro, Hao Yu,
Yin Wang, Nassir Navab, Xiaohui Liang, Federico Tombari
- Abstract summary: We propose the first precise hand-object reconstruction method in hyperbolic space, namely Dynamic Hyperbolic Attention Network (DHANet)
Our method learns mesh features with rich geometry-image multi-modal information and models better hand-object interaction.
- Score: 76.5549647815413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing both objects and hands in 3D from a single RGB image is
complex. Existing methods rely on manually defined hand-object constraints in
Euclidean space, leading to suboptimal feature learning. Compared with
Euclidean space, hyperbolic space better preserves the geometric properties of
meshes thanks to its exponentially-growing space distance, which amplifies the
differences between the features based on similarity. In this work, we propose
the first precise hand-object reconstruction method in hyperbolic space, namely
Dynamic Hyperbolic Attention Network (DHANet), which leverages intrinsic
properties of hyperbolic space to learn representative features. Our method,
which projects mesh and image features into a unified hyperbolic space, comprises
two modules: dynamic hyperbolic graph convolution and image-attention
hyperbolic graph convolution. With these two modules, our method learns mesh
features with rich geometry-image multi-modal information and models better
hand-object interaction. Our method provides a promising alternative for fine
hand-object reconstruction in hyperbolic space. Extensive experiments on three
public datasets demonstrate that our method outperforms most state-of-the-art
methods.
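The abstract's claim about "exponentially-growing space distance" can be made concrete with the geodesic distance on the Poincaré ball, a standard model of hyperbolic space. The sketch below is illustrative only (it assumes the curvature -1 Poincaré ball, a common choice, not necessarily the exact formulation used by DHANet): two point pairs with the same Euclidean separation are much farther apart, hyperbolically, near the boundary of the ball than near its origin, which is what amplifies differences between features.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit ball
    (Poincare ball model, curvature -1)."""
    sq_norm = lambda x: sum(c * c for c in x)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)

# Both pairs are 0.1 apart in Euclidean terms, but the pair near the
# boundary is an order of magnitude farther apart hyperbolically.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.89, 0.0], [0.99, 0.0])
```

Here `near_origin` is roughly 0.2 while `near_boundary` exceeds 2, illustrating why hyperbolic embeddings separate hierarchically distant features more sharply than Euclidean ones.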
Related papers
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Object-centric voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z) - Ghost on the Shell: An Expressive Representation of General 3D Shapes [97.76840585617907]
Meshes are appealing since they enable fast physics-based rendering with realistic material and lighting.
Recent work on reconstructing and statistically modeling 3D shapes has critiqued meshes as being topologically inflexible.
We parameterize open surfaces by defining a manifold signed distance field on watertight surfaces.
G-Shell achieves state-of-the-art performance on non-watertight mesh reconstruction and generation tasks.
arXiv Detail & Related papers (2023-10-23T17:59:52Z) - HMSN: Hyperbolic Self-Supervised Learning by Clustering with Ideal
Prototypes [7.665392786787577]
We use hyperbolic representation space for self-supervised representation learning for prototype-based clustering approaches.
We extend the Masked Siamese Networks to operate on the Poincaré ball model of hyperbolic space.
Unlike previous methods we project to the hyperbolic space at the output of the encoder network and utilise a hyperbolic projection head to ensure that the representations used for downstream tasks remain hyperbolic.
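Projecting encoder outputs into hyperbolic space, as the HMSN summary describes, is typically done with the exponential map at the origin of the Poincaré ball. The following is a minimal sketch of that standard map (with curvature -1), not the paper's actual projection head: it takes an arbitrary Euclidean feature vector and returns a point strictly inside the unit ball.

```python
import math

def exp_map_origin(x):
    """Exponential map at the origin of the Poincare ball:
    sends a Euclidean (tangent) vector into the open unit ball."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm == 0.0:
        return list(x)
    # tanh(norm) < 1 for all finite inputs, so the image lies in the ball.
    scale = math.tanh(norm) / norm
    return [scale * c for c in x]

embedded = exp_map_origin([3.0, 4.0])  # input norm 5 -> output norm tanh(5) < 1
```

Because `tanh` saturates, features of any magnitude land inside the ball, with large-norm features pushed toward the boundary where hyperbolic distances grow fastest.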
arXiv Detail & Related papers (2023-05-18T12:38:40Z) - Learning Pose Image Manifolds Using Geometry-Preserving GANs and
Elasticae [13.202747831999414]
Geometric Style-GAN (Geom-SGAN) maps images to low-dimensional latent representations.
Euler's elasticae smoothly interpolate between directed points (points plus tangent directions) in the low-dimensional latent space.
arXiv Detail & Related papers (2023-05-17T18:45:56Z) - Decoupled Iterative Refinement Framework for Interacting Hands
Reconstruction from a Single RGB Image [30.24438569170251]
We propose a decoupled iterative refinement framework to achieve pixel-aligned hand reconstruction.
Our method outperforms all existing two-hand reconstruction methods by a large margin on the InterHand2.6M dataset.
arXiv Detail & Related papers (2023-02-05T15:46:57Z) - MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D
Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z) - HRCF: Enhancing Collaborative Filtering via Hyperbolic Geometric
Regularization [52.369435664689995]
We introduce Hyperbolic Regularization powered Collaborative Filtering (HRCF) and design a geometry-aware hyperbolic regularizer.
Specifically, the proposal boosts the optimization procedure via root alignment and an origin-aware penalty.
Our proposal tackles the over-smoothing problem caused by hyperbolic aggregation and also gives the models better discriminative ability.
arXiv Detail & Related papers (2022-04-18T06:11:44Z) - Enhancing Hyperbolic Graph Embeddings via Contrastive Learning [7.901082408569372]
We propose a novel Hyperbolic Graph Contrastive Learning (HGCL) framework which learns node representations through multiple hyperbolic spaces.
Experimental results on multiple real-world datasets demonstrate the superiority of the proposed HGCL.
arXiv Detail & Related papers (2022-01-21T06:10:05Z) - Disentangling and Unifying Graph Convolutions for Skeleton-Based Action
Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.