Consolidating Kinematic Models to Promote Coordinated Mobile Manipulations
- URL: http://arxiv.org/abs/2108.01264v3
- Date: Tue, 10 Aug 2021 07:57:11 GMT
- Title: Consolidating Kinematic Models to Promote Coordinated Mobile Manipulations
- Authors: Ziyuan Jiao, Zeyu Zhang, Xin Jiang, David Han, Song-Chun Zhu, Yixin
Zhu, Hangxin Liu
- Abstract summary: We construct a Virtual Kinematic Chain (VKC) that consolidates the kinematics of the mobile base, the arm, and the object to be manipulated in mobile manipulations.
A mobile manipulation task is represented by altering the state of the constructed VKC, which can be converted to a motion planning problem.
- Score: 96.03270112422514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We construct a Virtual Kinematic Chain (VKC) that readily consolidates the
kinematics of the mobile base, the arm, and the object to be manipulated in
mobile manipulations. Accordingly, a mobile manipulation task is represented by
altering the state of the constructed VKC, which can be converted to a motion
planning problem, formulated, and solved by trajectory optimization. This new
VKC perspective of mobile manipulation allows a service robot to (i) produce
well-coordinated motions, suitable for complex household environments, and (ii)
perform intricate multi-step tasks while interacting with multiple objects
without an explicit definition of intermediate goals. In simulated experiments,
we validate these advantages by comparing the VKC-based approach with baselines
that solely optimize individual components. The results demonstrate that
VKC-based joint modeling and planning improve task success rates and produce
more efficient trajectories.
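Below is a minimal sketch of the core idea under stated assumptions: a planar mobile base, a two-link arm, and a door hinge are stacked into a single virtual chain, and "altering the VKC state" (opening the door) becomes one trajectory optimization over all degrees of freedom at once. All geometry, weights, and the solver choice are illustrative, not the authors' implementation.

```python
# Minimal VKC sketch: base (x, y, yaw) + 2-link arm + door hinge form one
# chain; the task is a terminal constraint on the door joint, and the
# grasp term couples base, arm, and object so motion comes out coordinated.
import numpy as np
from scipy.optimize import minimize

L1, L2 = 0.5, 0.4             # arm link lengths (assumed)
HINGE = np.array([2.0, 1.0])  # door hinge position in the world (assumed)
R_DOOR = 0.8                  # hinge-to-handle radius (assumed)

def ee_pos(q):
    """Forward kinematics of the consolidated chain: planar base + 2R arm."""
    x, y, yaw, q1, q2 = q[:5]
    a1, a2 = yaw + q1, yaw + q1 + q2
    return np.array([x + L1*np.cos(a1) + L2*np.cos(a2),
                     y + L1*np.sin(a1) + L2*np.sin(a2)])

def handle_pos(q_door):
    """Handle position as a function of the object's (door's) joint angle."""
    return HINGE + R_DOOR*np.array([np.cos(q_door), np.sin(q_door)])

T, N = 20, 6             # timesteps; DoFs: base(3) + arm(2) + door(1)
q_door_goal = np.pi / 3  # the task: alter the VKC state (open the door)

def cost(traj):
    Q = traj.reshape(T, N)
    smooth = np.sum(np.diff(Q, axis=0)**2)                 # well-coordinated motion
    grasp = sum(np.sum((ee_pos(Q[t]) - handle_pos(Q[t, 5]))**2)
                for t in range(T))                         # hand tracks the handle
    goal = 100.0 * (Q[-1, 5] - q_door_goal)**2             # desired final VKC state
    return smooth + 10.0*grasp + goal

res = minimize(cost, np.zeros(T * N), method="L-BFGS-B")
print("final door angle:", res.x.reshape(T, N)[-1, 5])
```

Because the door joint is part of the chain, a single objective coordinates base, arm, and object motion with no hand-specified intermediate goals.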
Related papers
- From Perception to Action: An Interactive Benchmark for Vision Reasoning [51.11355591375073]
The Causal Hierarchy of Actions and Interactions (CHAIN) benchmark is designed to evaluate whether models can understand, plan, and execute structured action sequences grounded in physical constraints. CHAIN shifts evaluation from passive perception to active problem solving, spanning tasks such as interlocking mechanical puzzles and 3D stacking and packing. Our results show that top-performing models still struggle to internalize physical structure and causal constraints, often failing to produce reliable long-horizon plans or to robustly translate perceived structure into effective actions.
arXiv Detail & Related papers (2026-02-24T15:33:02Z)
- PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention [92.85371254435074]
The PosA-VLA framework anchors visual attention via pose-conditioned supervision, consistently guiding the model's perception toward task-relevant regions. We show that our method executes embodied tasks with precise and time-efficient behavior across diverse robotic manipulation benchmarks.
arXiv Detail & Related papers (2025-12-03T12:14:29Z)
- MotionVerse: A Unified Multimodal Framework for Motion Comprehension, Generation and Editing [53.98607267063729]
MotionVerse is a framework to comprehend, generate, and edit human motion in both single-person and multi-person scenarios. We employ a motion tokenizer with residual quantization, which converts continuous motion sequences into multi-stream discrete tokens. We also introduce a Delay Parallel Modeling strategy, which temporally staggers the encoding of residual token streams; a toy sketch of both mechanisms follows below.
arXiv Detail & Related papers (2025-09-28T04:20:56Z)
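A toy numpy sketch of the two mechanisms named above, under assumed codebook sizes and stream counts (not the MotionVerse code): residual quantization turns each continuous frame into multi-stream tokens, and the delay-parallel layout staggers stream s by s timesteps.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, S = 8, 32, 3                        # feature dim, codebook size, streams
codebooks = rng.normal(size=(S, K, D))    # one codebook per residual level

def rvq_encode(frame):
    """Quantize a frame into S tokens; each level quantizes the residual."""
    tokens, residual = [], frame.copy()
    for s in range(S):
        idx = int(np.argmin(np.sum((codebooks[s] - residual)**2, axis=1)))
        tokens.append(idx)
        residual = residual - codebooks[s][idx]   # pass on what is left
    return tokens

motion = rng.normal(size=(5, D))                  # 5 continuous motion frames
grid = np.array([rvq_encode(f) for f in motion])  # (T, S) token grid

# Delay-parallel layout: stream s starts s steps late (PAD = -1), so at
# step t the model emits level s of frame t-s while streams stay parallel.
T = grid.shape[0]
delayed = np.full((T + S - 1, S), -1)
for s in range(S):
    delayed[s:s + T, s] = grid[:, s]
print(delayed)
```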
- AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation [31.314066269767057]
Mobile manipulation has attracted increasing attention for enabling language-conditioned robotic control in household tasks. Existing methods fail to explicitly model the influence of the mobile base on manipulator control. We propose the Adaptive Coordination Diffusion Transformer (AC-DiT) to enhance mobile base and manipulator coordination.
arXiv Detail & Related papers (2025-07-02T17:59:54Z)
- SILK: Smooth InterpoLation frameworK for motion in-betweening: A Simplified Computational Approach [1.7812314225208412]
Motion in-betweening is a crucial tool for animators, enabling control over pose-level details in each frame. Recent machine learning solutions for motion in-betweening rely on complex models, skeleton-aware architectures, or multiple modules and training steps. We introduce a simple yet effective Transformer-based framework, employing a single encoder to synthesize realistic motions for motion in-betweening tasks; a minimal sketch of this setup follows below.
arXiv Detail & Related papers (2025-06-09T19:26:27Z)
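A hedged sketch of single-encoder in-betweening (shapes and layer sizes are assumptions, not the SILK architecture): keyframe poses are kept, missing frames are replaced by a learned mask token, and one Transformer encoder predicts the full sequence in a single pass.

```python
import torch
import torch.nn as nn

class InBetweener(nn.Module):
    def __init__(self, pose_dim=66, d_model=128, max_len=64):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        self.pos = nn.Parameter(torch.zeros(max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, poses, known):       # poses: (B,T,pose_dim), known: (B,T) bool
        h = self.embed(poses)
        h = torch.where(known.unsqueeze(-1), h, self.mask_token)  # mask unknowns
        h = h + self.pos[: h.shape[1]]
        return self.head(self.encoder(h))  # predicted poses for every frame

B, T = 2, 16
poses = torch.randn(B, T, 66)
known = torch.zeros(B, T, dtype=torch.bool)
known[:, [0, T - 1]] = True                # only the two keyframes are given
pred = InBetweener()(poses, known)         # (B, T, 66): fills the gap
```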
- Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control [72.00655365269]
We present RoboMaster, a novel framework that models inter-object dynamics through a collaborative trajectory formulation. Unlike prior methods that decompose objects, our core idea is to decompose the interaction process into three sub-stages: pre-interaction, interaction, and post-interaction. Our method outperforms existing approaches, establishing new state-of-the-art performance in trajectory-controlled video generation for robotic manipulation.
arXiv Detail & Related papers (2025-06-02T17:57:06Z)
- Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy [56.424032454461695]
We present Dita, a scalable framework that leverages Transformer architectures to directly denoise continuous action sequences.
Dita employs in-context conditioning -- enabling fine-grained alignment between denoised actions and raw visual tokens from historical observations.
Dita effectively integrates cross-embodiment datasets across diverse camera perspectives, observation scenes, tasks, and action spaces; an illustrative sketch of the denoising step follows below.
arXiv Detail & Related papers (2025-03-25T15:19:56Z)
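An illustrative sketch, with assumed shapes and a toy corruption in place of a real noise schedule (not Dita's implementation), of denoising a continuous action chunk with in-context conditioning: observation tokens and noised action tokens share one Transformer sequence, so each denoised action can attend to raw visual tokens from history.

```python
import torch
import torch.nn as nn

class ActionDenoiser(nn.Module):
    def __init__(self, act_dim=7, d_model=128):
        super().__init__()
        self.act_in = nn.Linear(act_dim, d_model)
        self.t_embed = nn.Embedding(1000, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.act_out = nn.Linear(d_model, act_dim)

    def forward(self, obs_tokens, noisy_actions, t):
        # In-context conditioning: prepend observation tokens to the sequence.
        a = self.act_in(noisy_actions) + self.t_embed(t)[:, None, :]
        h = self.backbone(torch.cat([obs_tokens, a], dim=1))
        return self.act_out(h[:, obs_tokens.shape[1]:])  # noise on action slots

B, H, A = 2, 8, 7                     # batch, action horizon, action dim
obs = torch.randn(B, 16, 128)         # visual tokens from history (assumed)
actions = torch.randn(B, H, A)
t = torch.randint(0, 1000, (B,))
noise = torch.randn_like(actions)
noisy = actions + noise               # toy corruption; real diffusion uses a schedule
pred_noise = ActionDenoiser()(obs, noisy, t)
loss = nn.functional.mse_loss(pred_noise, noise)
```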
- ManipGPT: Is Affordance Segmentation by Large Vision Models Enough for Articulated Object Manipulation? [17.356760351203715]
This paper introduces ManipGPT, a framework designed to predict optimal interaction areas for articulated objects.
We created a dataset of 9.9k simulated and real images to bridge the sim-to-real gap.
We significantly improved part-level affordance segmentation, adapting the model's in-context segmentation capabilities to robot manipulation scenarios.
arXiv Detail & Related papers (2024-12-13T11:22:01Z)
- Self-Supervised Learning of Grasping Arbitrary Objects On-the-Move [8.445514342786579]
This study introduces three fully convolutional neural network (FCN) models to predict the static grasp primitive, dynamic grasp primitive, and residual moving velocity error from visual inputs; a rough sketch of this design follows below. The proposed method achieved the highest grasping accuracy and pick-and-place efficiency.
arXiv Detail & Related papers (2024-11-15T02:59:16Z)
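A rough sketch (layer sizes assumed, not the paper's exact networks) of the three predictions named above, here folded into three heads on a shared fully convolutional trunk: per-pixel static and dynamic grasp-primitive maps plus a residual moving-velocity correction.

```python
import torch
import torch.nn as nn

class GraspFCN(nn.Module):
    def __init__(self, n_primitives=4):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.static_head = nn.Conv2d(64, n_primitives, 1)   # static grasp map
        self.dynamic_head = nn.Conv2d(64, n_primitives, 1)  # dynamic grasp map
        self.vel_head = nn.Sequential(                      # residual velocity
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2))

    def forward(self, rgb):
        h = self.trunk(rgb)
        return self.static_head(h), self.dynamic_head(h), self.vel_head(h)

# Per-pixel primitive scores plus a 2D velocity correction from one image.
static_q, dynamic_q, d_vel = GraspFCN()(torch.randn(1, 3, 96, 96))
```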
- Cooperative and Asynchronous Transformer-based Mission Planning for Heterogeneous Teams of Mobile Robots [1.1049608786515839]
We propose a Cooperative and Asynchronous Transformer-based Mission Planning (CATMiP) framework to coordinate distributed decision making among agents.
We evaluate CATMiP in a 2D grid-world simulation environment and compare its performance against planning-based exploration methods.
arXiv Detail & Related papers (2024-10-08T21:14:09Z)
- CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications [59.193626019860226]
Vision Transformers (ViTs) mark a revolutionary advance in neural networks, with their token mixers providing a powerful global context capability. We introduce CAS-ViT: Convolutional Additive Self-attention Vision Transformers. We show that CAS-ViT achieves competitive performance compared to other state-of-the-art backbones; a loose sketch of additive token mixing follows below.
arXiv Detail & Related papers (2024-08-07T11:33:46Z)
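A loose sketch of the general idea of additive token mixing: instead of the quadratic Q·K^T softmax, context is formed by adding per-token cues and mixing them with cheap depthwise convolutions. This is a generic illustration only; CAS-ViT's actual block differs in its details.

```python
import torch
import torch.nn as nn

class AdditiveTokenMixer(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)
        self.k = nn.Conv2d(dim, dim, 1)
        self.v = nn.Conv2d(dim, dim, 1)
        self.spatial = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # depthwise
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):                   # x: (B, C, H, W) token map
        ctx = torch.sigmoid(self.q(x)) + torch.sigmoid(self.k(x))  # additive cue
        ctx = self.spatial(ctx)              # local mixing, linear in tokens
        return self.proj(ctx * self.v(x))    # modulate values with context

# Cost grows as O(H*W) rather than O((H*W)^2) for pairwise attention.
y = AdditiveTokenMixer()(torch.randn(1, 64, 14, 14))
```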
- Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform.
We produce a closed-loop controller to reactively push objects in a continuous action space.
We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
arXiv Detail & Related papers (2021-11-15T18:50:04Z)
- Efficient Task Planning for Mobile Manipulation: a Virtual Kinematic Chain Perspective [88.25410628450453]
We present a Virtual Kinematic Chain perspective to improve task planning efficacy for mobile manipulation.
By consolidating the kinematics of the mobile base, the arm, and the object being manipulated collectively as a whole, this novel VKC perspective naturally defines abstract actions.
In experiments, we implement a task planner using the Planning Domain Definition Language (PDDL) with VKC; a toy sketch of the planning benefit follows below.
arXiv Detail & Related papers (2021-08-03T02:49:18Z)
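A toy STRIPS-style sketch with invented predicates (not the authors' PDDL domain). The point it illustrates: a VKC abstract action such as grasp(door) couples base, arm, and object in one symbolic step and leaves feasibility to the VKC motion planner, so no intermediate goals like "base at pose P" appear in the task plan.

```python
ACTIONS = {
    # name: (preconditions, add effects, delete effects)
    "grasp(door)": ({"free(robot)"}, {"grasped(door)"}, {"free(robot)"}),
    "open(door)":  ({"grasped(door)"}, {"open(door)"}, set()),
}

def plan(init, goal):
    """Breadth-first forward search over STRIPS states."""
    frontier, seen = [(frozenset(init), [])], set()
    while frontier:
        state, steps = frontier.pop(0)
        if goal <= state:
            return steps
        if state in seen:
            continue
        seen.add(state)
        for name, (pre, add, delete) in ACTIONS.items():
            if pre <= state:
                frontier.append((frozenset((state - delete) | add),
                                 steps + [name]))
    return None

print(plan({"free(robot)"}, {"open(door)"}))  # ['grasp(door)', 'open(door)']
```

With per-component actions (navigate, reach, grasp, pull) the same goal would need a longer plan and explicit intermediate states; the abstract action collapses them.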
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatial-temporal kernels to adaptively fit the diverse events.
Second, to accurately aggregate these cues into a global video representation, we propose to mine the interactions among only a few selected foreground objects with a Transformer.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Articulated Object Interaction in Unknown Scenes with Whole-Body Mobile Manipulation [16.79185733369416]
We propose a two-stage architecture for autonomous interaction with large articulated objects in unknown environments.
The first stage uses a learned model to estimate the articulated model of a target object from an RGB-D input and predicts an action-conditional sequence of states for interaction.
The second stage comprises a whole-body motion controller to manipulate the object along the generated kinematic plan.
arXiv Detail & Related papers (2021-03-18T21:32:18Z)
- Meta-Reinforcement Learning for Adaptive Motor Control in Changing Robot Dynamics and Environments [3.5309638744466167]
This work developed a meta-learning approach that adapts the control policy on the fly to changing conditions for robust locomotion. The proposed method constantly updates the interaction model, samples feasible action sequences over the estimated state-action trajectories, and then applies the optimal actions to maximize the reward; a condensed sketch of this loop follows below.
arXiv Detail & Related papers (2021-01-19T12:57:12Z)
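A condensed sketch (toy 1D dynamics, reward, and learning rate are assumptions) of the loop described above: keep updating a model of the interaction, sample candidate action sequences, roll them out through the model, and apply the first action of the best sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_step(x, u):                  # unknown environment (stand-in)
    return 0.9*x + 0.5*u + 0.01*rng.normal()

theta = np.array([1.0, 0.0])          # online model: x' = a*x + b*u

def model_step(x, u):
    return theta[0]*x + theta[1]*u

x, target, H, K = 1.0, 0.0, 5, 64     # state, goal, horizon, samples
for t in range(30):
    # 1) sample K feasible action sequences and score them with the model
    U = rng.uniform(-1, 1, size=(K, H))
    scores = []
    for k in range(K):
        xk, r = x, 0.0
        for u in U[k]:
            xk = model_step(xk, u)
            r -= (xk - target)**2     # reward: track the target state
        scores.append(r)
    u0 = U[int(np.argmax(scores))][0] # 2) apply the best first action
    x_next = true_step(x, u0)
    # 3) constantly update the interaction model from the new transition
    phi = np.array([x, u0])
    theta += 0.1 * (x_next - phi @ theta) * phi
    x = x_next
print("final state:", x)
```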
- Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives [89.34229413345541]
We propose a conditioning scheme which avoids pitfalls by learning the controller and its conditioning in an end-to-end manner.
Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion.
We report significant improvements in task success over representative MPC and IL baselines.
arXiv Detail & Related papers (2020-03-19T15:04:37Z)