CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control
- URL: http://arxiv.org/abs/2410.03441v1
- Date: Fri, 4 Oct 2024 13:56:48 GMT
- Title: CLoSD: Closing the Loop between Simulation and Diffusion for multi-task character control
- Authors: Guy Tevet, Sigal Raab, Setareh Cohan, Daniele Reda, Zhengyi Luo, Xue Bin Peng, Amit H. Bermano, Michiel van de Panne
- Abstract summary: Motion diffusion models and Reinforcement Learning based control for physics-based simulations have complementary strengths for human motion generation.
CLoSD is a text-driven RL physics-based controller, guided by diffusion generation for various tasks.
CLoSD is capable of seamlessly performing a sequence of different tasks, including navigation to a goal location, striking an object with a hand or foot as specified in a text prompt, sitting down, and getting up.
- Score: 27.418288450778192
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Motion diffusion models and Reinforcement Learning (RL) based control for physics-based simulations have complementary strengths for human motion generation. The former is capable of generating a wide variety of motions, adhering to intuitive control such as text, while the latter offers physically plausible motion and direct interaction with the environment. In this work, we present a method that combines their respective strengths. CLoSD is a text-driven RL physics-based controller, guided by diffusion generation for various tasks. Our key insight is that motion diffusion can serve as an on-the-fly universal planner for a robust RL controller. To this end, CLoSD maintains a closed-loop interaction between two modules -- a Diffusion Planner (DiP), and a tracking controller. DiP is a fast-responding autoregressive diffusion model, controlled by textual prompts and target locations, and the controller is a simple and robust motion imitator that continuously receives motion plans from DiP and provides feedback from the environment. CLoSD is capable of seamlessly performing a sequence of different tasks, including navigation to a goal location, striking an object with a hand or foot as specified in a text prompt, sitting down, and getting up. https://guytevet.github.io/CLoSD-page/
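To make the closed loop concrete, the sketch below mirrors the structure the abstract describes: a diffusion planner that repeatedly generates short motion plans from text, a target location, and recent simulated states, and a tracking controller that follows those plans inside the physics simulator. Every class name, signature, and dimension here is an illustrative assumption, not the authors' actual API.

```python
import numpy as np

class DiffusionPlanner:
    """Stand-in for DiP: an autoregressive diffusion model that would denoise a
    short window of future poses conditioned on text, a target location, and
    the most recent simulated states (the feedback signal)."""
    def plan(self, prompt, target_xy, recent_states, horizon=40):
        # Placeholder output; a real model would run iterative denoising here.
        return np.zeros((horizon, 69))  # horizon x pose-dim motion plan

class TrackingController:
    """Stand-in for the RL motion imitator: maps the current simulated state
    and the reference pose from the plan to an action (e.g., joint torques)."""
    def act(self, sim_state, reference_pose):
        return np.zeros(28)  # placeholder action

class DummySim:
    """Trivial physics stand-in so the loop below actually runs."""
    def get_state(self):
        return np.zeros(69)
    def step(self, action):
        return np.zeros(69)

def closd_episode(sim, planner, tracker, prompt, target_xy,
                  replan_every=20, max_steps=200):
    states = [sim.get_state()]
    plan, plan_idx = None, 0
    for _ in range(max_steps):
        if plan is None or plan_idx >= replan_every:
            # Close the loop: re-plan from recent *simulated* states, so any
            # physical deviation is folded back into the next generated plan.
            plan = planner.plan(prompt, target_xy, np.stack(states[-4:]))
            plan_idx = 0
        action = tracker.act(states[-1], plan[plan_idx])
        states.append(sim.step(action))
        plan_idx += 1
    return states

closd_episode(DummySim(), DiffusionPlanner(), TrackingController(),
              "strike the target with your right hand", np.array([2.0, 0.5]))
```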
Related papers
- C-Drag: Chain-of-Thought Driven Motion Controller for Video Generation [81.4106601222722]
Trajectory-based motion control has emerged as an intuitive and efficient approach for controllable video generation.
We propose a Chain-of-Thought-based motion controller for controllable video generation, named C-Drag.
Our method includes an object perception module and a Chain-of-Thought-based motion reasoning module.
arXiv Detail & Related papers (2025-02-27T08:21:03Z)
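As a loose illustration of the two-module pipeline the C-Drag entry above names (object perception followed by Chain-of-Thought motion reasoning), here is a hypothetical sketch; the detector and LLM interfaces, prompt wording, and data shapes are all invented for illustration.

```python
# Hypothetical pipeline sketch; not C-Drag's actual interfaces.
def plan_object_trajectories(detector, llm, frame, user_drags):
    # Object perception module: find the objects the motion reasoning needs.
    objects = detector(frame)  # e.g. [{"name": "ball", "box": (x0, y0, x1, y1)}, ...]
    # Chain-of-Thought motion reasoning module: ask the LLM to reason step by
    # step about object interactions before committing to trajectories.
    prompt = (
        "Objects: " + "; ".join(f"{o['name']} at {o['box']}" for o in objects)
        + f"\nUser-dragged trajectories: {user_drags}\n"
        "Think step by step about how each object moves, including objects "
        "affected only indirectly, then output one trajectory per object."
    )
    return llm(prompt)  # trajectories passed on to the video generator
```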
- DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control [12.465927271402442]
Text-conditioned human motion generation allows for user interaction through natural language.
DartControl is a Diffusion-based Autoregressive motion primitive model for Real-time Text-driven motion control.
Our model effectively learns a compact motion primitive space jointly conditioned on motion history and text inputs.
arXiv Detail & Related papers (2024-10-07T17:58:22Z)
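A minimal sketch of the autoregressive primitive rollout the DartControl entry describes, assuming a denoiser that maps recent motion and a text embedding to a short chunk of future poses; names and shapes are hypothetical.

```python
import numpy as np

def rollout(denoise_primitive, text_emb, history, n_primitives=10):
    """denoise_primitive(recent_motion, text_emb) -> (prim_len, pose_dim) chunk."""
    chunks = [history]
    for _ in range(n_primitives):
        # Each primitive is conditioned on the freshly generated motion, which
        # is what makes the rollout autoregressive and lets it respond to new
        # text input with low latency.
        chunks.append(denoise_primitive(chunks[-1], text_emb))
    return np.concatenate(chunks, axis=0)

# Dummy "denoiser" standing in for the learned model, so the rollout runs:
dummy = lambda recent, emb: np.zeros((8, 69))
motion = rollout(dummy, np.zeros(512), np.zeros((16, 69)))
```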
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model [78.11258752076046]
MOFA-Video is an advanced controllable image animation method that generates video from the given image using various additional controllable signals.
We design several domain-aware motion field adapters to control the generated motions in the video generation pipeline.
After training, the MOFA-Adapters in different domains can also work together for more controllable video generation.
arXiv Detail & Related papers (2024-05-30T16:22:22Z)
- Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control [106.32794844077534]
This paper presents a study on using deep reinforcement learning to create dynamic locomotion controllers for bipedal robots.
We develop a general control solution that can be used for a range of dynamic bipedal skills, from periodic walking and running to aperiodic jumping and standing.
This work pushes the limits of agility for bipedal robots through extensive real-world experiments.
arXiv Detail & Related papers (2024-01-30T10:48:43Z)
- A comparison of RL-based and PID controllers for 6-DOF swimming robots: hybrid underwater object tracking [8.362739554991073]
We present an exploration and assessment of employing a centralized deep Q-network (DQN) controller as a substitute for PID controllers.
Our primary focus centers on illustrating this transition with the specific case of underwater object tracking.
Our experiments, conducted within a Unity-based simulator, validate the effectiveness of a centralized RL agent over separated PID controllers.
arXiv Detail & Related papers (2024-01-29T23:14:15Z)
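The contrast the entry above draws can be made concrete with a hedged sketch: six independent single-axis PID loops versus one centralized DQN policy that sees the full state and acts on all degrees of freedom jointly. Gains, shapes, and the Q-network interface are illustrative assumptions.

```python
import numpy as np

class PID:
    """One single-axis PID loop; the separated baseline uses six of these."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral, self.prev_err = 0.0, 0.0
    def step(self, err, dt=0.05):
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# Separated scheme: one independently tuned PID per degree of freedom.
pids = [PID(1.0, 0.1, 0.05) for _ in range(6)]
def pid_action(tracking_error):  # tracking_error: (6,) array, one per DOF
    return np.array([c.step(e) for c, e in zip(pids, tracking_error)])

# Centralized scheme: a single Q-network sees the full state and selects one
# discrete action covering all axes jointly, so cross-axis coupling is
# learned rather than ignored.
def dqn_action(q_network, state):
    return int(np.argmax(q_network(state)))  # greedy action at evaluation time

pid_action(np.array([0.1, 0.0, -0.2, 0.05, 0.0, 0.3]))
dqn_action(lambda s: np.random.rand(27), np.zeros(12))
```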
- TLControl: Trajectory and Language Control for Human Motion Synthesis [68.09806223962323]
We present TLControl, a novel method for realistic human motion synthesis.
It incorporates both low-level Trajectory and high-level Language semantics controls.
It is practical for interactive and high-quality animation generation.
arXiv Detail & Related papers (2023-11-28T18:54:16Z)
- End-to-End Reinforcement Learning for Torque Based Variable Height Hopping [5.34772724436823]
Legged locomotion is arguably the most suitable and versatile mode for dealing with natural or unstructured terrains.
In this paper, we present an end-to-end RL-based torque controller that learns to implicitly detect the relevant jump phases.
We also extend a method for simulation-to-reality transfer of the learned controller to contact-rich dynamic tasks, resulting in successful deployment on the robot.
arXiv Detail & Related papers (2023-07-31T13:51:29Z)
- MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators [108.67006263044772]
This paper presents a Motion General-Purpose generaTor (MotionGPT) that can use multimodal control signals.
We first quantize multimodal control signals into discrete codes and then formulate them in a unified prompt instruction.
Our MotionGPT demonstrates a unified human motion generation model with multimodal control signals by tuning a mere 0.4% of LLM parameters.
arXiv Detail & Related papers (2023-06-19T12:58:17Z)
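A hedged sketch of the two steps the MotionGPT entry names: vector-quantizing control signals into discrete codes, then packing them into a unified text prompt for a lightly finetuned LLM. The codebook, token format, and prompt template are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def quantize(signal, codebook):
    """Map each frame of a control signal to its nearest codebook entry (VQ)."""
    dists = np.linalg.norm(signal[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)  # (T,) array of discrete code indices

def build_prompt(task_text, code_ids):
    codes = " ".join(f"<motion_{i}>" for i in code_ids)
    return (f"Instruction: generate a human motion. Task: {task_text}\n"
            f"Control codes: {codes}\nOutput:")

codebook = np.random.randn(512, 64)    # hypothetical: 512 codes, 64-dim features
pose_signal = np.random.randn(30, 64)  # e.g. 30 frames of pose features
prompt = build_prompt("walk forward then sit", quantize(pose_signal, codebook))
# The LLM would then be finetuned on such prompts while updating only a tiny
# fraction (~0.4%) of its parameters, e.g. via low-rank adapters.
```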
- Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion [83.88829943619656]
We introduce a method for generating realistic pedestrian trajectories and full-body animations that can be controlled to meet user-defined goals.
Our guided diffusion model allows users to constrain trajectories through target waypoints, speed, and specified social groups.
We propose utilizing the value function learned during RL training of the animation controller to guide diffusion to produce trajectories better suited for particular scenarios.
arXiv Detail & Related papers (2023-04-04T15:46:42Z)
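The key mechanism in the entry above, using an RL value function to guide diffusion, can be sketched as a single simplified sampling step: perturb the denoiser's output along the gradient of the learned value. Shapes, the guidance scale, and the toy stand-ins are illustrative assumptions.

```python
import torch

def value_guided_step(denoiser, value_fn, x_t, t, scale=0.3):
    """One simplified reverse-diffusion step with value-function guidance:
    shift the step's output toward trajectories the RL value function prefers."""
    x_t = x_t.detach().requires_grad_(True)
    mean = denoiser(x_t, t)                    # predicted next (less noisy) sample
    value = value_fn(mean).sum()               # how well the trajectory suits the task
    grad = torch.autograd.grad(value, x_t)[0]  # ascent direction on the value
    return (mean + scale * grad).detach()

# Toy stand-ins so the step runs; the real denoiser and value are learned nets.
x = torch.randn(1, 60, 4)  # batch x frames x (x, y, vx, vy)
x = value_guided_step(lambda z, t: 0.9 * z,
                      lambda z: -(z[..., :2] ** 2).sum(-1), x, t=10)
```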
- Temporally Layered Architecture for Adaptive, Distributed and Continuous Control [2.1700103865910503]
We present temporally layered architecture (TLA), a biologically inspired system for temporally adaptive distributed control.
TLA layers a fast and a slow controller together to achieve temporal abstraction that allows each layer to focus on a different time-scale.
Our design is biologically inspired and draws on the architecture of the human brain, which executes actions at different timescales depending on the environment's demands.
arXiv Detail & Related papers (2022-12-25T08:46:22Z)
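A minimal sketch of the two-timescale loop the TLA entry describes, assuming the slow layer emits subgoals at a coarse rate while the fast layer reacts every tick; the rates and interfaces are invented for illustration.

```python
import numpy as np

def run_tla(env_step, slow_policy, fast_policy, steps=100, slow_every=10):
    state, subgoal = np.zeros(8), None
    for t in range(steps):
        if t % slow_every == 0:
            # Slow layer: coarse, temporally abstract decision (e.g., a subgoal).
            subgoal = slow_policy(state)
        # Fast layer: fine-grained reaction toward the current subgoal every tick.
        state = env_step(fast_policy(state, subgoal))
    return state

# Dummy stand-ins so the loop runs; the paper's layers are learned controllers.
run_tla(lambda a: np.zeros(8), lambda s: np.ones(2), lambda s, g: np.zeros(3))
```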
- Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform.
We produce a closed-loop controller to reactively push objects in a continuous action space.
We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
arXiv Detail & Related papers (2021-11-15T18:50:04Z)