CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions
- URL: http://arxiv.org/abs/2602.01844v1
- Date: Mon, 02 Feb 2026 09:16:16 GMT
- Title: CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions
- Authors: Yuliang Zhan, Jian Li, Wenbing Huang, Yang Liu, Hao Sun
- Abstract summary: We introduce Cloth Dynamics Grounding (CDG), a novel scenario for unsupervised learning of cloth dynamics from multi-view visual observations. We propose Cloth Dynamics Splatting (CloDS), an unsupervised dynamic learning framework designed for CDG. CloDS adopts a three-stage pipeline that first performs video-to-geometry grounding and then trains a dynamics model on the grounded meshes.
- Score: 36.41201675940166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning has demonstrated remarkable capabilities in simulating complex dynamic systems. However, existing methods require known physical properties as supervision or inputs, limiting their applicability under unknown conditions. To explore this challenge, we introduce Cloth Dynamics Grounding (CDG), a novel scenario for unsupervised learning of cloth dynamics from multi-view visual observations. We further propose Cloth Dynamics Splatting (CloDS), an unsupervised dynamic learning framework designed for CDG. CloDS adopts a three-stage pipeline that first performs video-to-geometry grounding and then trains a dynamics model on the grounded meshes. To cope with large non-linear deformations and severe self-occlusions during grounding, we introduce a dual-position opacity modulation that supports bidirectional mapping between 2D observations and 3D geometry via mesh-based Gaussian splatting in the video-to-geometry grounding stage. It jointly considers the absolute and relative positions of the Gaussian components. Comprehensive experimental evaluations demonstrate that CloDS effectively learns cloth dynamics from visual data while maintaining strong generalization to unseen configurations. Our code is available at https://github.com/whynot-zyl/CloDS and visualization results at https://github.com/whynot-zyl/CloDS_video.
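The dual-position opacity modulation is only sketched at a high level in the abstract. Purely as an illustration of the underlying idea, weighting each splat's opacity by both its absolute world-space position and its relative position on the parent mesh face, here is a minimal Python sketch; the function name, the Gaussian-kernel weighting, and all parameters are assumptions made for this example, not the paper's actual formulation.

```python
# Illustrative sketch only: NOT the CloDS implementation.
import numpy as np

def modulated_opacity(base_opacity, abs_pos, rel_pos, obs_center,
                      sigma_abs=1.0, sigma_rel=0.1):
    """Scale each Gaussian's opacity by two hypothetical factors:
    an absolute term that decays with world-space distance from an
    observation center, and a relative term that decays as the splat
    drifts from its rest offset on its parent mesh face."""
    d_abs = np.linalg.norm(abs_pos - obs_center, axis=-1)  # (N,) world-space distance
    d_rel = np.linalg.norm(rel_pos, axis=-1)               # (N,) drift from face anchor
    w_abs = np.exp(-0.5 * (d_abs / sigma_abs) ** 2)        # absolute-position weight
    w_rel = np.exp(-0.5 * (d_rel / sigma_rel) ** 2)        # relative-position weight
    return base_opacity * w_abs * w_rel

# Toy usage: four Gaussians anchored to mesh faces.
rng = np.random.default_rng(0)
alpha = modulated_opacity(
    base_opacity=np.full(4, 0.9),
    abs_pos=rng.normal(size=(4, 3)),
    rel_pos=0.05 * rng.normal(size=(4, 3)),
    obs_center=np.zeros(3),
)
print(alpha)  # modulated per-splat opacities in (0, 0.9]
```

In a mesh-based splatting pipeline, a modulation of this kind would presumably be applied per splat before alpha compositing, so that splats that drift far from their mesh anchors, or that lie far from the observed region, contribute less to the rendered image.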
Related papers
- GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation [57.8059956428009]
Recent attempts to transfer features from 2D Vision-Language Models to 3D semantic segmentation expose a persistent trade-off. We propose GeoPurify, which applies a small Student Affinity Network to 2D VLM-generated 3D point features using geometric priors distilled from a 3D self-supervised teacher model. Benefiting from latent geometric information and the learned affinity network, GeoPurify effectively mitigates the trade-off and achieves superior data efficiency.
arXiv Detail & Related papers (2025-10-02T16:37:56Z) - Provable Scaling Laws of Feature Emergence from Learning Dynamics of Grokking [44.614763110719274]
We study the phenomenon of grokking, i.e., delayed generalization. We propose a novel framework that captures three key stages of the grokking behavior of 2-layer nonlinear networks. Our study sheds light on the roles played by hyperparameters such as weight decay, learning rate, and sample size in grokking.
arXiv Detail & Related papers (2025-09-25T20:08:09Z) - Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation [54.04601077224252]
Embodied scene understanding requires not only comprehending visual-spatial information but also determining where to explore next in the 3D physical world. 3D vision-language learning enables embodied agents to effectively explore and understand their environment. The model's versatility enables navigation using diverse input modalities, including categories, language descriptions, and reference images.
arXiv Detail & Related papers (2025-07-05T14:15:52Z) - ODG: Occupancy Prediction Using Dual Gaussians [38.9869091446875]
Occupancy prediction infers fine-grained 3D geometry and semantics from camera images of the surrounding environment. Existing methods either adopt dense grids as the scene representation or learn the entire scene using a single set of sparse queries. We present ODG, a hierarchical dual sparse Gaussian representation that effectively captures complex scene dynamics.
arXiv Detail & Related papers (2025-06-11T06:03:03Z) - VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction [46.31516096522758]
Recent advancements in camera-based occupancy prediction have focused on the simultaneous prediction of 3D semantics and scene flow. We propose a novel regularization framework called VoxelSplat to address these challenges and their underlying causes. Our framework uses the predicted scene flow to model the motion of Gaussians, and is thus able to learn the scene flow of moving objects in a self-supervised manner.
arXiv Detail & Related papers (2025-06-05T20:19:35Z) - DSG-World: Learning a 3D Gaussian World Model from Dual State Videos [14.213608866611784]
We present DSG-World, a novel end-to-end framework that explicitly constructs a 3D Gaussian world model from dual-state observations. Our approach builds dual segmentation-aware Gaussian fields and enforces bidirectional photometric and semantic consistency.
arXiv Detail & Related papers (2025-06-05T16:33:32Z) - A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We present a novel framework for training 3D image-conditioned diffusion models using only 2D supervision. Most existing 3D generative models rely on full 3D supervision, which is impractical due to the scarcity of large-scale 3D datasets.
arXiv Detail & Related papers (2024-12-01T00:29:57Z) - DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes [71.61083731844282]
We present DeSiRe-GS, a self-supervised Gaussian splatting representation. It enables effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios.
arXiv Detail & Related papers (2024-11-18T05:49:16Z) - Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes [30.32214593068206]
We present a space-time 2D Gaussian Splatting approach to tackle the dynamic contents and the occlusions in complex scenes.
Specifically, to improve geometric quality in dynamic scenes, we learn canonical 2D Gaussian splats and deform them over time.
We also introduce a compositional opacity strategy, which further reduces spurious surface recovery in occluded areas.
Experiments on real-world sparse-view video datasets and monocular dynamic datasets demonstrate that our reconstructions outperform state-of-the-art methods.
arXiv Detail & Related papers (2024-09-27T15:50:36Z) - Probing the 3D Awareness of Visual Foundation Models [56.68380136809413]
We analyze the 3D awareness of visual foundation models.
We conduct experiments using task-specific probes and zero-shot inference procedures on frozen features.
arXiv Detail & Related papers (2024-04-12T17:58:04Z) - HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting [53.6394928681237]
Holistic understanding of urban scenes based on RGB images is a challenging yet important problem.
Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians.
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy.
arXiv Detail & Related papers (2024-03-19T13:39:05Z) - DynaVol: Unsupervised Learning for Dynamic Scenes through Object-Centric Voxelization [67.85434518679382]
We present DynaVol, a 3D scene generative model that unifies geometric structures and object-centric learning.
The key idea is to perform object-centric voxelization to capture the 3D nature of the scene.
Voxel features evolve over time through a canonical-space deformation function, forming the basis for global representation learning.
arXiv Detail & Related papers (2023-04-30T05:29:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.