SignSplat: Rendering Sign Language via Gaussian Splatting
- URL: http://arxiv.org/abs/2505.02108v1
- Date: Sun, 04 May 2025 13:28:49 GMT
- Title: SignSplat: Rendering Sign Language via Gaussian Splatting
- Authors: Maksym Ivashechkin, Oscar Mendez, Richard Bowden
- Abstract summary: State-of-the-art approaches for conditional human body rendering via Gaussian splatting typically focus on simple body motions captured from many views. For more complex use cases, such as sign language, we care less about large body motion and more about subtle and complex motions of the hands and face. We focus on how to achieve this, constraining mesh parameters to build an accurate Gaussian splatting framework from few views capable of modelling subtle human motion.
- Score: 33.9893684177763
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: State-of-the-art approaches for conditional human body rendering via Gaussian splatting typically focus on simple body motions captured from many views. This is often in the context of dancing or walking. However, for more complex use cases, such as sign language, we care less about large body motion and more about subtle and complex motions of the hands and face. The problems of building high fidelity models are compounded by the complexity of capturing multi-view data of sign. The solution is to make better use of sequence data, ensuring that we can overcome the limited information from only a few views by exploiting temporal variability. Nevertheless, learning from sequence-level data requires extremely accurate and consistent model fitting to ensure that appearance is consistent across complex motions. We focus on how to achieve this, constraining mesh parameters to build an accurate Gaussian splatting framework from few views capable of modelling subtle human motion. We leverage regularization techniques on the Gaussian parameters to mitigate overfitting and rendering artifacts. Additionally, we propose a new adaptive control method to densify Gaussians and prune splat points on the mesh surface. To demonstrate the accuracy of our approach, we render novel sequences of sign language video, building on neural machine translation approaches to sign stitching. On benchmark datasets, our approach achieves state-of-the-art performance; and on highly articulated and complex sign language motion, we significantly outperform competing approaches.
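The adaptive control the abstract mentions, densifying Gaussians and pruning splat points on the mesh surface, can be illustrated with a minimal sketch. This is a generic Gaussian-splatting densify/prune heuristic, not the authors' actual method; the function name, thresholds, and jitter scheme are all illustrative assumptions:

```python
import numpy as np

def adaptive_densify_prune(positions, opacities, grad_norms,
                           grad_thresh=0.0002, opacity_thresh=0.005,
                           jitter=1e-3, rng=None):
    """Illustrative densify/prune step for surface-bound Gaussians.

    positions:  (N, 3) splat centres on the mesh surface
    opacities:  (N,)   per-splat opacity
    grad_norms: (N,)   accumulated view-space positional gradient norms
    """
    rng = np.random.default_rng(0) if rng is None else rng

    # Densify: clone splats whose accumulated gradient is large,
    # jittering each clone slightly so it can specialize.
    clone_mask = grad_norms > grad_thresh
    clones = positions[clone_mask] + rng.normal(
        scale=jitter, size=(int(clone_mask.sum()), 3))

    # Prune: drop near-transparent splats that contribute little
    # to the rendered image.
    keep_mask = opacities > opacity_thresh

    new_positions = np.concatenate([positions[keep_mask], clones], axis=0)
    new_opacities = np.concatenate(
        [opacities[keep_mask], opacities[clone_mask]], axis=0)
    return new_positions, new_opacities
```

In practice such a step runs periodically during optimization, with the gradient accumulators reset afterwards; a mesh-surface variant would additionally re-project jittered clones back onto the surface.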
Related papers
- HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis [59.25751939710903]
We propose a novel deformable Gaussian splatting framework that addresses embodied view synthesis from long monocular RGB videos. Our method leverages invertible Gaussian Splatting deformation networks to reconstruct large-scale, dynamic environments accurately. Results highlight a practical and scalable solution for EVS in real-world scenarios.
arXiv Detail & Related papers (2025-06-24T03:54:40Z)
- MAMMA: Markerless & Automatic Multi-Person Motion Action Capture [37.06717786024836]
MAMMA is a markerless motion-capture pipeline that recovers SMPL-X parameters from multi-view video of two-person interaction sequences. We introduce a method that predicts dense 2D surface landmarks conditioned on segmentation masks. We demonstrate that our approach can handle complex person-person interaction and offers greater accuracy than existing methods.
arXiv Detail & Related papers (2025-06-16T02:04:51Z)
- AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views [57.13066710710485]
AnySplat is a feed-forward network for novel view synthesis from uncalibrated image collections. A single forward pass yields a set of 3D Gaussian primitives encoding both scene geometry and appearance. In extensive zero-shot evaluations, AnySplat matches the quality of pose-aware baselines in both sparse and dense view scenarios.
arXiv Detail & Related papers (2025-05-29T17:49:56Z)
- Controlling Avatar Diffusion with Learnable Gaussian Embedding [27.651478116386354]
We introduce a novel control signal representation that is optimizable, dense, expressive, and 3D consistent. We synthesize a large-scale dataset with multiple poses and identities. Our model outperforms existing methods in terms of realism, expressiveness, and 3D consistency.
arXiv Detail & Related papers (2025-03-20T02:52:01Z)
- SwiftSketch: A Diffusion Model for Image-to-Vector Sketch Generation [57.47730473674261]
We introduce SwiftSketch, a model for image-conditioned vector sketch generation that can produce high-quality sketches in less than a second. SwiftSketch operates by progressively denoising stroke control points sampled from a Gaussian distribution. ControlSketch is a method that enhances SDS-based techniques by incorporating precise spatial control through a depth-aware ControlNet.
arXiv Detail & Related papers (2025-02-12T18:57:12Z)
- Monocular Dynamic Gaussian Splatting is Fast and Brittle but Smooth Motion Helps [14.35885714606969]
We organize, benchmark, and analyze many Gaussian-splatting-based methods. We quantify how their differences impact performance. The fast rendering speed of all Gaussian-based methods comes at the cost of brittleness in optimization.
arXiv Detail & Related papers (2024-12-05T18:59:08Z)
- Occam's LGS: An Efficient Approach for Language Gaussian Splatting [57.00354758206751]
We show that the complicated pipelines for language 3D Gaussian Splatting are simply unnecessary. We apply Occam's razor to the task at hand, leading to a highly efficient weighted multi-view feature aggregation technique.
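Weighted multi-view feature aggregation, in its simplest form, is a weighted mean of per-view features. The sketch below is a generic formulation under that assumption, not the paper's exact technique; the function name and the choice of weights (e.g. per-view visibility) are illustrative:

```python
import numpy as np

def aggregate_features(features, weights, eps=1e-8):
    """Weighted mean of per-view features for one 3D Gaussian.

    features: (V, D) one feature vector per view
    weights:  (V,)   non-negative per-view weights
                     (e.g. visibility or rendering contribution)
    """
    w = np.asarray(weights, dtype=float)
    # Broadcast weights over the feature dimension, then normalize.
    return (w[:, None] * features).sum(axis=0) / (w.sum() + eps)
```

A view that never sees the Gaussian gets weight zero and drops out of the average, which is what makes such an aggregation robust to occlusion across views.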
arXiv Detail & Related papers (2024-12-02T18:50:37Z)
- GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views [67.34073368933814]
We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting.
We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space.
Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving superior rendering speed.
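Lifting per-pixel 2D parameter maps to 3D with a predicted depth map amounts to pinhole unprojection. A minimal sketch under that assumption (standard intrinsics matrix K; the function name is illustrative, and this is not the paper's actual code):

```python
import numpy as np

def unproject_param_map(depth, K):
    """Lift a per-pixel depth map to 3D points via the pinhole model.

    depth: (H, W) depth values; K: (3, 3) camera intrinsics.
    Returns (H, W, 3) points in the camera frame.
    """
    H, W = depth.shape
    # Pixel coordinate grids: u runs over columns, v over rows.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Invert the projection u = fx * x / z + cx (and likewise for v).
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)
```

Each pixel of the 2D Gaussian parameter map can then be attached to its unprojected 3D point, giving one Gaussian primitive per pixel.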
arXiv Detail & Related papers (2024-11-18T08:18:44Z)
- SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes [59.23385953161328]
Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics.
We propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians.
Our method can enable user-controlled motion editing while retaining high-fidelity appearances.
arXiv Detail & Related papers (2023-12-04T11:57:14Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Neural Face Models for Example-Based Visual Speech Synthesis [2.2817442144155207]
We present a marker-less approach for facial motion capture based on multi-view video.
We learn a neural representation of facial expressions, which is used to seamlessly combine facial performances during the animation procedure.
arXiv Detail & Related papers (2020-09-22T07:35:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.