Shape-Interpretable Visual Self-Modeling Enables Geometry-Aware Continuum Robot Control
- URL: http://arxiv.org/abs/2603.01751v1
- Date: Mon, 02 Mar 2026 11:20:28 GMT
- Title: Shape-Interpretable Visual Self-Modeling Enables Geometry-Aware Continuum Robot Control
- Authors: Peng Yu, Xin Wang, Ning Tan
- Abstract summary: Continuum robots possess high flexibility and redundancy, making them well suited for safe interaction in complex environments. Existing vision-based control approaches often rely on end-to-end learning, achieving shape regulation without explicit awareness of robot geometry. Here, we introduce a shape-interpretable visual self-modeling framework for continuum robots that enables geometry-aware control.
- Score: 10.253290204273094
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuum robots possess high flexibility and redundancy, making them well suited for safe interaction in complex environments, yet their continuous deformation and nonlinear dynamics pose fundamental challenges to perception, modeling, and control. Existing vision-based control approaches often rely on end-to-end learning, achieving shape regulation without explicit awareness of robot geometry or its interaction with the environment. Here, we introduce a shape-interpretable visual self-modeling framework for continuum robots that enables geometry-aware control. Robot shapes are encoded from multi-view planar images using a Bezier-curve representation, transforming visual observations into a compact and physically meaningful shape space that uniquely characterizes the robot's three-dimensional configuration. Based on this representation, neural ordinary differential equations are employed to self-model both shape and end-effector dynamics directly from data, enabling hybrid shape-position control without analytical models or dense body markers. The explicit geometric structure of the learned shape space allows the robot to reason about its body and surroundings, supporting environment-aware behaviors such as obstacle avoidance and self-motion while maintaining end-effector objectives. Experiments on a cable-driven continuum robot demonstrate accurate shape-position regulation and tracking, with shape errors within 1.56% of image resolution and end-effector errors within 2% of robot length, as well as robust performance in constrained environments. By elevating visual shape representations from two-dimensional observations to an interpretable three-dimensional self-model, this work establishes a principled alternative to vision-based end-to-end control and advances autonomous, geometry-aware manipulation for continuum robots.
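The abstract states that robot shapes are encoded with a Bezier-curve representation. As a rough illustration only, and not the authors' implementation, the sketch below evaluates a Bezier curve from a small set of control points using the standard Bernstein basis. The function names (`bezier_point`, `sample_curve`), the cubic degree, and the control-point values are hypothetical choices made for this example; the paper's actual curve order and fitting procedure are not given in this listing.

```python
from math import comb


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] using the Bernstein basis."""
    n = len(control_points) - 1          # curve degree
    dim = len(control_points[0])         # 2 for image-plane curves, 3 for spatial curves
    point = [0.0] * dim
    for i, p in enumerate(control_points):
        # Bernstein polynomial B_{i,n}(t) = C(n, i) * t^i * (1 - t)^(n - i)
        weight = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        for d in range(dim):
            point[d] += weight * p[d]
    return tuple(point)


def sample_curve(control_points, num_samples=5):
    """Sample the curve at evenly spaced parameter values, endpoints included."""
    return [bezier_point(control_points, k / (num_samples - 1))
            for k in range(num_samples)]


# Hypothetical control points for a cubic curve (illustrative values only).
ctrl = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 2.0)]
backbone = sample_curve(ctrl, num_samples=5)
```

The point of such a representation is compactness: a handful of control points describes the whole backbone, and the first and last control points coincide with the curve's endpoints, so the tip position can be read directly from the shape parameters.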
Related papers
- Towards Learning a Generalizable 3D Scene Representation from 2D Observations [7.434862537620824]
We introduce a Generalizable Neural Radiance Field approach for predicting 3D workspace occupancy from egocentric robot observations.
Our model constructs occupancy representations in a global workspace frame, making it directly applicable to robotic manipulation.
arXiv Detail & Related papers (2026-02-11T15:22:41Z)
- ArtReg: Visuo-Tactile based Pose Tracking and Manipulation of Unseen Articulated Objects [2.9793019246605676]
We present a novel method for visuo-tactile-based tracking of unseen objects.
Our approach integrates visuo-tactile point clouds in an unscented Kalman Filter formulation.
We have extensively evaluated our approach on various types of unknown objects through real robot experiments.
arXiv Detail & Related papers (2025-11-09T13:30:51Z)
- DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation [52.136378691610524]
We present DynaRend, a representation learning framework that learns 3D-aware and dynamics-informed triplane features.
By pretraining on multi-view RGB-D video data, DynaRend jointly captures spatial geometry, future dynamics, and task semantics in a unified triplane representation.
We evaluate DynaRend on two challenging benchmarks, RLBench and Colosseum, demonstrating substantial improvements in policy success rate, generalization to environmental perturbations, and real-world applicability across diverse manipulation tasks.
arXiv Detail & Related papers (2025-10-28T10:17:11Z)
- Geometry-aware 4D Video Generation for Robot Manipulation [28.709339959536106]
We propose a 4D video generation model that enforces multi-view 3D consistency of videos by supervising the model with cross-view pointmap alignment during training.
This geometric supervision enables the model to learn a shared 3D representation of the scene, allowing it to predict future video sequences from novel viewpoints.
Compared to existing baselines, our method produces more visually stable and spatially aligned predictions across multiple simulated and real-world robotic datasets.
arXiv Detail & Related papers (2025-07-01T18:01:41Z)
- Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control [72.00655365269]
We present RoboMaster, a novel framework that models inter-object dynamics through a collaborative trajectory formulation.
Unlike prior methods that decompose objects, our core is to decompose the interaction process into three sub-stages: pre-interaction, interaction, and post-interaction.
Our method outperforms existing approaches, establishing new state-of-the-art performance in trajectory-controlled video generation for robotic manipulation.
arXiv Detail & Related papers (2025-06-02T17:57:06Z)
- Is Single-View Mesh Reconstruction Ready for Robotics? [78.14584238127338]
We evaluate single-view mesh reconstruction models for their potential in enabling instant digital twin creation for real-time planning and dynamics prediction using physics simulators for robotic manipulation.
Our findings highlight critical gaps between computer vision advances and robotics needs, guiding future research at this intersection.
arXiv Detail & Related papers (2025-05-23T14:35:56Z)
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- Robust Robotic Control from Pixels using Contrastive Recurrent State-Space Models [8.22669535053079]
We study how to learn world models in unconstrained environments over high-dimensional observation spaces such as images.
One source of difficulty is the presence of irrelevant but hard-to-model background distractions.
We learn a recurrent latent dynamics model which contrastively predicts the next observation.
This simple model leads to surprisingly robust robotic control even with simultaneous camera, background, and color distractions.
arXiv Detail & Related papers (2021-12-02T12:15:25Z)
- Learning Visual Shape Control of Novel 3D Deformable Objects from Partial-View Point Clouds [7.1659268120093635]
Analytic models of elastic, 3D deformable objects require numerous parameters to describe the potentially infinite degrees of freedom present in determining the object's shape.
Previous attempts at performing 3D shape control rely on hand-crafted features to represent the object shape and require training of object-specific control models.
We overcome these issues through the use of our novel DeformerNet neural network architecture, which operates on a partial-view point cloud of the object being manipulated and a point cloud of the goal shape to learn a low-dimensional representation of the object shape.
arXiv Detail & Related papers (2021-10-10T02:34:57Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.