GRASPLAT: Enabling dexterous grasping through novel view synthesis
- URL: http://arxiv.org/abs/2510.19200v1
- Date: Wed, 22 Oct 2025 03:19:26 GMT
- Title: GRASPLAT: Enabling dexterous grasping through novel view synthesis
- Authors: Matteo Bortolon, Nuno Ferreira Duarte, Plinio Moreno, Fabio Poiesi, José Santos-Victor, Alessio Del Bue
- Abstract summary: GRASPLAT is a novel grasping framework that leverages consistent 3D information while being trained solely on RGB images. Our key insight is that by synthesizing physically plausible images of a hand grasping an object, we can regress the corresponding hand joints for a successful grasp. Unlike prior methods, our approach incorporates a photometric loss that refines grasp predictions by minimizing discrepancies between rendered and real images.
- Score: 29.55492442334601
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Achieving dexterous robotic grasping with multi-fingered hands remains a significant challenge. While existing methods rely on complete 3D scans to predict grasp poses, these approaches face limitations due to the difficulty of acquiring high-quality 3D data in real-world scenarios. In this paper, we introduce GRASPLAT, a novel grasping framework that leverages consistent 3D information while being trained solely on RGB images. Our key insight is that by synthesizing physically plausible images of a hand grasping an object, we can regress the corresponding hand joints for a successful grasp. To achieve this, we utilize 3D Gaussian Splatting to generate high-fidelity novel views of real hand-object interactions, enabling end-to-end training with RGB data. Unlike prior methods, our approach incorporates a photometric loss that refines grasp predictions by minimizing discrepancies between rendered and real images. We conduct extensive experiments on both synthetic and real-world grasping datasets, demonstrating that GRASPLAT improves grasp success rates up to 36.9% over existing image-based methods. Project page: https://mbortolon97.github.io/grasplat/
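The photometric loss mentioned in the abstract compares a rendered image with a real one. The abstract does not state the exact formulation, so the following is only a minimal sketch under the assumption of a plain per-pixel L1 difference; the function name `photometric_l1` and the nested-list image layout are hypothetical and not taken from the paper.

```python
# Hypothetical illustration, not the authors' implementation:
# a per-pixel L1 photometric loss between a rendered image and a real image.
# Images are nested lists of RGB tuples with channel values in [0, 1].

def photometric_l1(rendered, real):
    """Return the mean absolute difference over all pixels and channels."""
    if len(rendered) != len(real):
        raise ValueError("images must have the same height")
    total = 0.0
    count = 0
    for row_rendered, row_real in zip(rendered, real):
        if len(row_rendered) != len(row_real):
            raise ValueError("images must have the same width")
        for px_rendered, px_real in zip(row_rendered, row_real):
            for c_rendered, c_real in zip(px_rendered, px_real):
                total += abs(c_rendered - c_real)
                count += 1
    # An empty image pair has no discrepancy by convention.
    return total / count if count else 0.0


# Toy 1x2 images: the first pixel matches, the second differs by 0.5 per channel.
real = [[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]
rendered = [[(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]]
loss = photometric_l1(rendered, real)
print(loss)
```

In a training setup of the kind the abstract describes, a value like this would be minimized with respect to the predicted grasp parameters through a differentiable renderer; this sketch only shows the comparison step.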
Related papers
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z)
- JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting [3.1143479095236892]
Jointly 3D Gaussian Hand (JGHand) is a novel joint-driven 3D Gaussian Splatting (3DGS)-based hand representation. We show that JGHand achieves real-time rendering speeds with enhanced quality, surpassing state-of-the-art methods.
arXiv Detail & Related papers (2025-01-31T12:33:24Z)
- HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions [68.28684509445529]
We present HandBooster, a new approach to uplift the data diversity and boost the 3D hand-mesh reconstruction performance.
First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds.
Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinctive from the training set.
arXiv Detail & Related papers (2024-03-27T13:56:08Z)
- IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z)
- Real-Time Radiance Fields for Single-Image Portrait View Synthesis [85.32826349697972]
We present a one-shot method to infer and render a 3D representation from a single unposed image in real-time.
Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering.
Our method is fast (24 fps) on consumer hardware, and produces higher quality results than strong GAN-inversion baselines that require test-time optimization.
arXiv Detail & Related papers (2023-05-03T17:56:01Z)
- A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis [163.96778522283967]
We propose a shading-guided generative implicit model that is able to learn a starkly improved shape representation.
An accurate 3D shape should also yield a realistic rendering under different lighting conditions.
Our experiments on multiple datasets show that the proposed approach achieves photorealistic 3D-aware image synthesis.
arXiv Detail & Related papers (2021-10-29T10:53:12Z)
- Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)