DYNAMO: Dependency-Aware Deep Learning Framework for Articulated Assembly Motion Prediction
- URL: http://arxiv.org/abs/2509.12430v1
- Date: Mon, 15 Sep 2025 20:32:56 GMT
- Title: DYNAMO: Dependency-Aware Deep Learning Framework for Articulated Assembly Motion Prediction
- Authors: Mayank Patel, Rahul Jain, Asim Unmesh, Karthik Ramani
- Abstract summary: We introduce MechBench, a benchmark dataset of 693 diverse synthetic gear assemblies with part-wise ground-truth motion trajectories. We propose DYNAMO, a dependency-aware neural model that predicts per-part SE(3) motion trajectories directly from segmented CAD point clouds. Together, MechBench and DYNAMO establish a novel systematic framework for data-driven learning of coupled mechanical motion in CAD assemblies.
- Score: 12.882634043776507
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Understanding the motion of articulated mechanical assemblies from static geometry remains a core challenge in 3D perception and design automation. Prior work on everyday articulated objects such as doors and laptops typically assumes simplified kinematic structures or relies on joint annotations. However, in mechanical assemblies like gears, motion arises from geometric coupling, through meshing teeth or aligned axes, making it difficult for existing methods to reason about relational motion from geometry alone. To address this gap, we introduce MechBench, a benchmark dataset of 693 diverse synthetic gear assemblies with part-wise ground-truth motion trajectories. MechBench provides a structured setting to study coupled motion, where part dynamics are induced by contact and transmission rather than predefined joints. Building on this, we propose DYNAMO, a dependency-aware neural model that predicts per-part SE(3) motion trajectories directly from segmented CAD point clouds. Experiments show that DYNAMO outperforms strong baselines, achieving accurate and temporally consistent predictions across varied gear configurations. Together, MechBench and DYNAMO establish a novel systematic framework for data-driven learning of coupled mechanical motion in CAD assemblies.
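The coupling described in the abstract can be made concrete with elementary gear kinematics: meshing teeth constrain the angular velocities of two gears through their tooth counts, so each part's trajectory is a rotation about its own axis, i.e., a curve in SE(3). Below is a minimal illustrative sketch of that constraint; it is not the DYNAMO model, and the tooth counts, speed, and time grid are hypothetical examples.

```python
import numpy as np

def rotation_about_z(theta: float) -> np.ndarray:
    """4x4 homogeneous SE(3) transform: rotation by theta about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def meshed_gear_trajectories(omega_driver, z_driver, z_driven, ts):
    """Per-part SE(3) trajectories for two meshed spur gears.

    Meshing teeth impose the kinematic constraint
        omega_driven = -omega_driver * (z_driver / z_driven),
    so the driven gear's motion is fully determined by geometry.
    """
    omega_driven = -omega_driver * (z_driver / z_driven)
    driver = [rotation_about_z(omega_driver * t) for t in ts]
    driven = [rotation_about_z(omega_driven * t) for t in ts]
    return driver, driven

# Hypothetical example: a 20-tooth driver meshing with a 40-tooth gear.
ts = np.linspace(0.0, 1.0, 32)
driver_traj, driven_traj = meshed_gear_trajectories(
    omega_driver=2 * np.pi, z_driver=20, z_driven=40, ts=ts)
# The driven gear turns at half speed, in the opposite direction.
```

The point of MechBench and DYNAMO is that such coupled trajectories must be recovered from segmented geometry alone: tooth counts and axes are not given explicitly at inference time.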
Related papers
- ArtLLM: Generating Articulated Assets via 3D LLM [19.814132638278547]
ArtLLM is a novel framework for generating high-quality articulated assets directly from complete 3D meshes. At its core is a 3D multimodal large language model trained on a large-scale articulation dataset. Experiments show that ArtLLM significantly outperforms state-of-the-art methods in both part layout accuracy and joint prediction.
arXiv Detail & Related papers (2026-03-01T15:07:46Z) - From Perception to Action: An Interactive Benchmark for Vision Reasoning [51.11355591375073]
The Causal Hierarchy of Actions and Interactions (CHAIN) benchmark is designed to evaluate whether models can understand, plan, and execute structured action sequences grounded in physical constraints. CHAIN shifts evaluation from passive perception to active problem solving, spanning tasks such as interlocking mechanical puzzles and 3D stacking and packing. Our results show that top-performing models still struggle to internalize physical structure and causal constraints, often failing to produce reliable long-horizon plans or to robustly translate perceived structure into effective actions.
arXiv Detail & Related papers (2026-02-24T15:33:02Z) - Articulated 3D Scene Graphs for Open-World Mobile Manipulation [55.97942733699124]
We present MoMa-SG, a framework for building semantic-kinematic 3D scene graphs of articulated scenes. We estimate articulation models using a novel unified twist estimation formulation. We also introduce the new Arti4D-Semantic dataset.
arXiv Detail & Related papers (2026-02-18T10:40:35Z) - MeshMimic: Geometry-Aware Humanoid Motion Learning through 3D Scene Reconstruction [54.36564144414704]
MeshMimic is an innovative framework that bridges 3D scene reconstruction and embodied intelligence to enable humanoid robots to learn coupled "motion-terrain" interactions directly from video. By leveraging state-of-the-art 3D vision models, our framework precisely segments and reconstructs both human trajectories and the underlying 3D geometry of terrains and objects.
arXiv Detail & Related papers (2026-02-17T17:09:45Z) - PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement [89.35154754765502]
PhyScensis is an agent-based framework powered by a physics engine to produce physically plausible scene configurations. Our framework preserves strong controllability over fine-grained textual descriptions and numerical parameters. Experimental results show that our method outperforms prior approaches in scene complexity, visual quality, and physical accuracy.
arXiv Detail & Related papers (2026-02-16T17:55:25Z) - PAct: Part-Decomposed Single-View Articulated Object Generation [45.04652409374895]
Articulated objects are central to interactive 3D applications, including embodied AI, robotics, and VR/AR. We introduce a part-centric generative framework for articulated object creation that synthesizes part geometry, composition, and articulation under explicit part-aware conditioning. Our representation models an object as a set of movable parts, each encoded by latent tokens augmented with part identity and articulation cues.
arXiv Detail & Related papers (2026-02-16T17:45:44Z) - DragMesh: Interactive 3D Generation Made Easy [12.832539752284466]
DragMesh is a robust framework for real-time interactive 3D articulation. Our core contribution is a novel decoupled kinematic reasoning and motion generation framework.
arXiv Detail & Related papers (2025-12-06T13:10:44Z) - Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects [59.51185639557874]
We introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry.
arXiv Detail & Related papers (2025-11-03T07:21:42Z) - Geometric Neural Distance Fields for Learning Human Motion Priors [51.99890740169883]
We introduce a novel 3D generative human motion prior that enables robust, temporally consistent, and physically plausible 3D motion recovery. Our experiments show significant and consistent gains: trained on the AMASS dataset, NRMF generalizes remarkably well across multiple input modalities.
arXiv Detail & Related papers (2025-09-11T17:58:18Z) - A Structure-aware and Motion-adaptive Framework for 3D Human Pose Estimation with Mamba [18.376143217023934]
We propose a structure-aware and motion-adaptive framework to capture spatial joint topology. Through these key modules, our algorithm enables structure-aware and motion-adaptive pose lifting.
arXiv Detail & Related papers (2025-07-26T07:59:52Z) - GGMotion: Group Graph Dynamics-Kinematics Networks for Human Motion Prediction [0.0]
GGMotion is a group graph dynamics-kinematics network that models human topology in groups to better leverage dynamics and kinematics priors. Inter-group and intra-group interaction modules are employed to capture the dependencies of joints at different scales. Our approach achieves a significant performance margin in short-term motion prediction.
arXiv Detail & Related papers (2025-07-10T08:02:01Z) - IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z) - M3Bench: Benchmarking Whole-body Motion Generation for Mobile Manipulation in 3D Scenes [66.44171200767839]
M3Bench is a new benchmark for whole-body motion generation in mobile manipulation tasks. M3Bench features 30,000 object rearrangement tasks across 119 diverse scenes. M3Bench and M3BenchMaker aim to advance robotics research toward more adaptive and capable mobile manipulation.
arXiv Detail & Related papers (2024-10-09T08:38:21Z) - Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories [28.701879490459675]
We aim to learn an implicit motion field parameterized by a neural network to predict the movement of novel points within the same domain.
We exploit the intrinsic regularization provided by SIREN and modify the input layer to produce a temporally smooth motion field.
Our experiments assess the model's performance in predicting unseen point trajectories and its application in temporal mesh alignment with deformation.
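For context, SIREN parameterizes a field with sinusoidal activations, whose derivatives are themselves smooth sinusoids; that is the intrinsic regularization the summary alludes to. A minimal, generic sketch of a SIREN-style motion field follows; the layer sizes and the (x, y, z, t) input convention are illustrative assumptions, not this paper's exact architecture.

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer followed by a sine activation, as in SIREN."""
    def __init__(self, in_dim, out_dim, omega0=30.0, is_first=False):
        super().__init__()
        self.omega0 = omega0
        self.linear = nn.Linear(in_dim, out_dim)
        with torch.no_grad():  # SIREN's frequency-aware initialization
            bound = 1.0 / in_dim if is_first else (6.0 / in_dim) ** 0.5 / omega0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.omega0 * self.linear(x))

class MotionField(nn.Module):
    """Maps a space-time query (x, y, z, t) to a 3D displacement."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            SineLayer(4, hidden, is_first=True),  # input: (x, y, z, t)
            SineLayer(hidden, hidden),
            nn.Linear(hidden, 3),                 # output: displacement
        )

    def forward(self, xyzt):
        return self.net(xyzt)

field = MotionField()
disp = field(torch.rand(1024, 4))  # displacements for 1024 queries
```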
arXiv Detail & Related papers (2024-06-05T21:02:10Z) - Physics-Encoded Graph Neural Networks for Deformation Prediction under Contact [87.69278096528156]
In robotics, it's crucial to understand object deformation during tactile interactions.
We introduce a method using Physics-Encoded Graph Neural Networks (GNNs) for such predictions.
We've made our code and dataset public to advance research in robotic simulation and grasping.
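As background for this entry, a graph network over a contact mesh typically computes messages along edges and aggregates them at nodes before decoding per-node displacements. The sketch below is a generic single message-passing step with hypothetical feature sizes, not the paper's physics-encoded architecture.

```python
import torch
import torch.nn as nn

class MeshMessagePassing(nn.Module):
    """One generic message-passing step over a mesh graph.

    Edge messages are computed from the features of both endpoints,
    summed at the receiving node, and decoded into a 3D displacement.
    """
    def __init__(self, node_dim=16, hidden=64):
        super().__init__()
        self.msg = nn.Sequential(
            nn.Linear(2 * node_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, node_dim),
        )
        self.decode = nn.Sequential(
            nn.Linear(2 * node_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # predicted per-node displacement
        )

    def forward(self, x, edge_index):
        src, dst = edge_index                    # (2, num_edges) indices
        m = self.msg(torch.cat([x[src], x[dst]], dim=-1))
        agg = torch.zeros_like(x).index_add_(0, dst, m)  # sum incoming messages
        return self.decode(torch.cat([x, agg], dim=-1))

# Hypothetical toy graph: 5 nodes, 4 directed edges.
x = torch.rand(5, 16)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
disp = MeshMessagePassing()(x, edge_index)  # (5, 3) displacements
```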
arXiv Detail & Related papers (2024-02-05T19:21:52Z) - Structure from Action: Learning Interactions for Articulated Object 3D Structure Discovery [18.96346371296251]
We introduce Structure from Action (SfA), a framework to discover 3D part geometry and joint parameters of unseen articulated objects.
By selecting informative interactions, SfA discovers parts and reveals occluded surfaces, like the inside of a closed drawer.
Empirically, SfA outperforms a pipeline of state-of-the-art components by 25.4 percentage points in 3D IoU on unseen categories.
arXiv Detail & Related papers (2022-07-19T00:27:36Z) - Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments [81.38641691636847]
We rethink the problem of scene reconstruction from an embodied agent's perspective.
We reconstruct an interactive scene from an RGB-D data stream.
The reconstruction replaces the object meshes in the dense panoptic map with part-based articulated CAD models.
arXiv Detail & Related papers (2021-03-30T05:56:58Z) - Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform state-of-the-art depth and motion estimation methods; a schematic sketch of the projection-consistency idea follows below.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
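The instance-aware projection consistency in this last entry amounts to warping each dynamic object's 3D points by the object's own rigid motion composed with camera ego-motion before reprojection. The sketch below is schematic; the intrinsics and the order of composition are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def project(K, X):
    """Pinhole projection of camera-frame points X (N, 3) to pixels."""
    x = (K @ X.T).T
    return x[:, :2] / x[:, 2:3]

def warp_instance(K, pts, R_ego, t_ego, R_obj, t_obj):
    """Warp one instance's 3D points into the next frame and reproject.

    Dynamic points move by their own rigid motion (R_obj, t_obj) in
    addition to ego-motion (R_ego, t_ego); a photometric loss can then
    compare the warped pixels against the next frame.
    """
    X = pts @ R_obj.T + t_obj   # per-instance 6-DoF motion
    X = X @ R_ego.T + t_ego     # camera ego-motion
    return project(K, X)

# Hypothetical check: identity motions leave the projection unchanged.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
pts = np.random.rand(100, 3) + np.array([0.0, 0.0, 2.0])
uv = warp_instance(K, pts, np.eye(3), np.zeros(3), np.eye(3), np.zeros(3))
```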