S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation
with Semi-Supervised Learning
- URL: http://arxiv.org/abs/2208.00874v2
- Date: Thu, 3 Aug 2023 07:47:45 GMT
- Title: S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation
with Semi-Supervised Learning
- Authors: Tze Ho Elden Tse, Zhongqun Zhang, Kwang In Kim, Ales Leonardis, Feng
Zheng, Hyung Jin Chang
- Abstract summary: We propose a novel semi-supervised framework that allows us to learn contact from monocular images.
Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels.
We show benefits from using a contact map that rules hand-object interactions to produce more accurate reconstructions.
- Score: 70.72037296392642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the recent efforts in accurate 3D annotations in hand and object
datasets, there still exist gaps in 3D hand and object reconstructions.
Existing works leverage contact maps to refine inaccurate hand-object pose
estimations and generate grasps given object models. However, they require
explicit 3D supervision which is seldom available and therefore, are limited to
constrained settings, e.g., where thermal cameras observe residual heat left on
manipulated objects. In this paper, we propose a novel semi-supervised
framework that allows us to learn contact from monocular images. Specifically,
we leverage visual and geometric consistency constraints in large-scale
datasets for generating pseudo-labels in semi-supervised learning and propose
an efficient graph-based network to infer contact. Our semi-supervised learning
framework achieves a favourable improvement over the existing supervised
learning methods trained on data with `limited' annotations. Notably, our
proposed model is able to achieve superior results with less than half the
network parameters and memory access cost when compared with the commonly-used
PointNet-based approach. We show benefits from using a contact map that rules
hand-object interactions to produce more accurate reconstructions. We further
demonstrate that training with pseudo-labels can extend contact map estimations
to out-of-domain objects and generalise better across multiple datasets.
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z) - Neural Semantic Surface Maps [52.61017226479506]
We present an automated technique for computing a map between two genus-zero shapes, which matches semantically corresponding regions to one another.
Our approach can generate semantic surface-to-surface maps, eliminating manual annotations or any 3D training data requirement.
arXiv Detail & Related papers (2023-09-09T16:21:56Z) - Learning Explicit Contact for Implicit Reconstruction of Hand-held
Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space.
arXiv Detail & Related papers (2023-05-31T17:59:26Z) - Object-level 3D Semantic Mapping using a Network of Smart Edge Sensors [25.393382192511716]
We extend a multi-view 3D semantic mapping system consisting of a network of distributed edge sensors with object-level information.
Our method is evaluated on the public Behave dataset where it shows pose estimation within a few centimeters and in real-world experiments with the sensor network in a challenging lab environment.
arXiv Detail & Related papers (2022-11-21T11:13:08Z) - Self-supervised Learning of 3D Object Understanding by Data Association
and Landmark Estimation for Image Sequence [15.815583594196488]
3D object under-standing from 2D image is a challenging task that infers ad-ditional dimension from reduced-dimensional information.
It is challenging to obtain large amount of 3D dataset since achieving 3D annotation is expensive andtime-consuming.
We propose a strategy to exploit multipleobservations of the object in the image sequence in orderto surpass the self-performance.
arXiv Detail & Related papers (2021-04-14T18:59:08Z) - Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z) - Monocular 3D Detection with Geometric Constraints Embedding and
Semi-supervised Training [3.8073142980733]
We propose a novel framework for monocular 3D objects detection using only RGB images, called KM3D-Net.
We design a fully convolutional model to predict object keypoints, dimension, and orientation, and then combine these estimations with perspective geometry constraints to compute position attribute.
arXiv Detail & Related papers (2020-09-02T00:51:51Z) - Leveraging Photometric Consistency over Time for Sparsely Supervised
Hand-Object Reconstruction [118.21363599332493]
We present a method to leverage photometric consistency across time when annotations are only available for a sparse subset of frames in a video.
Our model is trained end-to-end on color images to jointly reconstruct hands and objects in 3D by inferring their poses.
We achieve state-of-the-art results on 3D hand-object reconstruction benchmarks and demonstrate that our approach allows us to improve the pose estimation accuracy.
arXiv Detail & Related papers (2020-04-28T12:03:14Z) - Joint Spatial-Temporal Optimization for Stereo 3D Object Tracking [34.40019455462043]
We propose a joint spatial-temporal optimization-based stereo 3D object tracking method.
From the network, we detect corresponding 2D bounding boxes on adjacent images and regress an initial 3D bounding box.
Dense object cues that associating to the object centroid are then predicted using a region-based network.
arXiv Detail & Related papers (2020-04-20T13:59:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.