ViTaSCOPE: Visuo-tactile Implicit Representation for In-hand Pose and Extrinsic Contact Estimation
- URL: http://arxiv.org/abs/2506.12239v1
- Date: Fri, 13 Jun 2025 21:35:58 GMT
- Title: ViTaSCOPE: Visuo-tactile Implicit Representation for In-hand Pose and Extrinsic Contact Estimation
- Authors: Jayjun Lee, Nima Fazeli,
- Abstract summary: dexterous contact-rich object manipulation demands both object poses and external contact locations.<n>We present ViTaSCOPE: VisuoTac Simultaneous Contact and Object Estimation.
- Score: 2.140861702387444
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mastering dexterous, contact-rich object manipulation demands precise estimation of both in-hand object poses and external contact locations$\unicode{x2013}$tasks particularly challenging due to partial and noisy observations. We present ViTaSCOPE: Visuo-Tactile Simultaneous Contact and Object Pose Estimation, an object-centric neural implicit representation that fuses vision and high-resolution tactile feedback. By representing objects as signed distance fields and distributed tactile feedback as neural shear fields, ViTaSCOPE accurately localizes objects and registers extrinsic contacts onto their 3D geometry as contact fields. Our method enables seamless reasoning over complementary visuo-tactile cues by leveraging simulation for scalable training and zero-shot transfers to the real-world by bridging the sim-to-real gap. We evaluate our method through comprehensive simulated and real-world experiments, demonstrating its capabilities in dexterous manipulation scenarios.
Related papers
- SIGHT: Synthesizing Image-Text Conditioned and Geometry-Guided 3D Hand-Object Trajectories [124.24041272390954]
Modeling hand-object interaction priors holds significant potential to advance robotic and embodied AI systems.<n>We introduce SIGHT, a novel task focused on generating realistic and physically plausible 3D hand-object interaction trajectories from a single image.<n>We propose SIGHT-Fusion, a novel diffusion-based image-text conditioned generative model that tackles this task by retrieving the most similar 3D object mesh from a database.
arXiv Detail & Related papers (2025-03-28T20:53:20Z) - Neural feels with neural fields: Visuo-tactile perception for in-hand
manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z) - Learning Explicit Contact for Implicit Reconstruction of Hand-held
Objects from Monocular Images [59.49985837246644]
We show how to model contacts in an explicit way to benefit the implicit reconstruction of hand-held objects.
In the first part, we propose a new subtask of directly estimating 3D hand-object contacts from a single image.
In the second part, we introduce a novel method to diffuse estimated contact states from the hand mesh surface to nearby 3D space.
arXiv Detail & Related papers (2023-05-31T17:59:26Z) - Visual-Tactile Sensing for In-Hand Object Reconstruction [38.42487660352112]
We propose a visual-tactile in-hand object reconstruction framework textbfVTacO, and extend it to textbfVTacOH for hand-object reconstruction.
A simulation environment, VT-Sim, supports generating hand-object interaction for both rigid and deformable objects.
arXiv Detail & Related papers (2023-03-25T15:16:31Z) - S$^2$Contact: Graph-based Network for 3D Hand-Object Contact Estimation
with Semi-Supervised Learning [70.72037296392642]
We propose a novel semi-supervised framework that allows us to learn contact from monocular images.
Specifically, we leverage visual and geometric consistency constraints in large-scale datasets for generating pseudo-labels.
We show benefits from using a contact map that rules hand-object interactions to produce more accurate reconstructions.
arXiv Detail & Related papers (2022-08-01T14:05:23Z) - Stability-driven Contact Reconstruction From Monocular Color Images [7.427212296770506]
Physical contact provides additional constraints for hand-object state reconstruction.
Existing methods optimize the hand-object contact driven by distance threshold or prior from contact-labeled datasets.
Our key idea is to reconstruct the contact pattern directly from monocular images, and then utilize the physical stability criterion in the simulation to optimize it.
arXiv Detail & Related papers (2022-05-02T12:23:06Z) - Tac2Pose: Tactile Object Pose Estimation from the First Touch [6.321662423735226]
We present Tac2Pose, an object-specific approach to tactile pose estimation from the first touch for known objects.
We simulate the contact shapes that a dense set of object poses would produce on the sensor.
We obtain contact shapes from the sensor with an object-agnostic calibration step that maps RGB tactile observations to binary contact shapes.
arXiv Detail & Related papers (2022-04-25T14:43:48Z) - Elastic Tactile Simulation Towards Tactile-Visual Perception [58.44106915440858]
We propose Elastic Interaction of Particles (EIP) for tactile simulation.
EIP models the tactile sensor as a group of coordinated particles, and the elastic property is applied to regulate the deformation of particles during contact.
We further propose a tactile-visual perception network that enables information fusion between tactile data and visual images.
arXiv Detail & Related papers (2021-08-11T03:49:59Z) - Tactile Object Pose Estimation from the First Touch with Geometric
Contact Rendering [19.69677059281393]
We present an approach to tactile pose estimation from the first touch for known objects.
We create an object-agnostic map from real tactile observations to contact shapes.
For a new object with known geometry, we learn a tailored perception model completely in simulation.
arXiv Detail & Related papers (2020-12-09T18:00:35Z) - Physics-Based Dexterous Manipulations with Estimated Hand Poses and
Residual Reinforcement Learning [52.37106940303246]
We learn a model that maps noisy input hand poses to target virtual poses.
The agent is trained in a residual setting by using a model-free hybrid RL+IL approach.
We test our framework in two applications that use hand pose estimates for dexterous manipulations: hand-object interactions in VR and hand-object motion reconstruction in-the-wild.
arXiv Detail & Related papers (2020-08-07T17:34:28Z) - Transferable Active Grasping and Real Embodied Dataset [48.887567134129306]
We show how to search for feasible viewpoints for grasping by the use of hand-mounted RGB-D cameras.
A practical 3-stage transferable active grasping pipeline is developed, that is adaptive to unseen clutter scenes.
In our pipeline, we propose a novel mask-guided reward to overcome the sparse reward issue in grasping and ensure category-irrelevant behavior.
arXiv Detail & Related papers (2020-04-28T08:15:35Z) - Learning the sense of touch in simulation: a sim-to-real strategy for
vision-based tactile sensing [1.9981375888949469]
This paper focuses on a vision-based tactile sensor, which aims to reconstruct the distribution of the three-dimensional contact forces applied on its soft surface.
A strategy is proposed to train a tailored deep neural network entirely from the simulation data.
The resulting learning architecture is directly transferable across multiple tactile sensors without further training and yields accurate predictions on real data.
arXiv Detail & Related papers (2020-03-05T14:17:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.