Future Frame Prediction for Robot-assisted Surgery
- URL: http://arxiv.org/abs/2103.10308v1
- Date: Thu, 18 Mar 2021 15:12:06 GMT
- Title: Future Frame Prediction for Robot-assisted Surgery
- Authors: Xiaojie Gao, Yueming Jin, Zixu Zhao, Qi Dou, Pheng-Ann Heng
- Abstract summary: We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
In addition to the content distribution, our model learns a motion distribution, a novel design for handling the small movements of surgical tools.
- Score: 57.18185972461453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting future frames for robotic surgical video is an interesting,
important, yet extremely challenging problem, given that the operative tasks may
have complex dynamics. Existing approaches to future prediction of natural
videos are based on either deterministic or stochastic models, including deep
recurrent neural networks, optical flow, and latent space modeling. However, the
potential of predicting meaningful movements of dual-arm robots in surgical
scenarios has not been tapped so far, and this task is typically more challenging
than forecasting the independent motions of single-arm robots in natural scenes.
In this paper, we propose a ternary prior guided variational autoencoder
(TPG-VAE) model for future frame prediction in robotic surgical video sequences.
In addition to the content distribution, our model learns a motion distribution,
a novel design for handling the small movements of surgical tools. Furthermore,
we add invariant prior information from the gesture class to the generation
process to constrain the latent space of our model. To the best of our
knowledge, this is the first time that the future frames of dual-arm robots are
predicted while considering their unique characteristics relative to general
robotic videos. Experiments on the suturing task of the public JIGSAWS dataset
demonstrate that our model produces more stable and realistic future frame
predictions.
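The abstract describes the model only at a high level, so the following is a minimal, hypothetical sketch of the general idea it names: a conditional VAE that keeps separate content and motion latents and pulls both posteriors toward a Gaussian prior predicted from the gesture class. It is not the authors' TPG-VAE implementation; the class name, layer sizes, flattened 64x64 frames, one-hot gesture labels, and the KL weight below are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PriorGuidedVAE(nn.Module):
    """Toy future-frame predictor with separate content and motion latents
    and a gesture-class-conditioned prior (illustrative only, not TPG-VAE)."""

    def __init__(self, frame_dim=64 * 64, latent_dim=32, num_classes=10):
        super().__init__()
        # Encoders producing Gaussian (mu, logvar) for content and motion latents.
        self.content_enc = nn.Linear(frame_dim, 2 * latent_dim)
        self.motion_enc = nn.Linear(2 * frame_dim, 2 * latent_dim)  # last frame + frame difference
        # Class-conditioned prior over the latents (the "invariant" gesture prior).
        self.class_prior = nn.Linear(num_classes, 2 * latent_dim)
        # Decoder maps [content latent, motion latent, class] to the next frame.
        self.decoder = nn.Sequential(
            nn.Linear(2 * latent_dim + num_classes, 256),
            nn.ReLU(),
            nn.Linear(256, frame_dim),
        )

    @staticmethod
    def sample(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return z, mu, logvar

    def forward(self, last_frame, frame_diff, gesture_onehot):
        z_c, mu_c, lv_c = self.sample(self.content_enc(last_frame))
        z_m, mu_m, lv_m = self.sample(
            self.motion_enc(torch.cat([last_frame, frame_diff], dim=-1))
        )
        pred = self.decoder(torch.cat([z_c, z_m, gesture_onehot], dim=-1))

        # KL terms: both posteriors are regularized toward the class-conditioned prior.
        p_mu, p_lv = self.class_prior(gesture_onehot).chunk(2, dim=-1)
        def kl(mu, lv):
            return 0.5 * torch.sum(
                p_lv - lv + (lv.exp() + (mu - p_mu) ** 2) / p_lv.exp() - 1, dim=-1
            ).mean()
        return pred, kl(mu_c, lv_c) + kl(mu_m, lv_m)


# Example usage with random tensors: a batch of 4 flattened 64x64 frames.
model = PriorGuidedVAE()
last = torch.rand(4, 64 * 64)
diff = torch.rand(4, 64 * 64)
cls = F.one_hot(torch.randint(0, 10, (4,)), num_classes=10).float()
next_pred, kl_loss = model(last, diff, cls)
loss = F.mse_loss(next_pred, torch.rand(4, 64 * 64)) + 1e-3 * kl_loss
print(loss.item())
```

The point of the class-conditioned prior is that the gesture label shapes where the latents are allowed to live, which is one way "invariant prior information from the gesture class" can constrain the latent space during generation.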
Related papers
- Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation [65.46610405509338]
We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation.
Our framework, Track2Act, predicts tracks of how points in an image should move in future time-steps based on a goal.
We show that this approach of combining scalably learned track prediction with a residual policy enables diverse generalizable robot manipulation.
arXiv Detail & Related papers (2024-05-02T17:56:55Z)
- Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z)
- Learning an Actionable Discrete Diffusion Policy via Large-Scale Actionless Video Pre-Training [69.54948297520612]
Learning a generalist embodied agent poses challenges, primarily stemming from the scarcity of action-labeled robotic datasets.
We introduce a novel framework to tackle these challenges, which leverages a unified discrete diffusion to combine generative pre-training on human videos and policy fine-tuning on a small number of action-labeled robot videos.
Our method generates high-fidelity future videos for planning and enhances the fine-tuned policies compared to previous state-of-the-art approaches.
arXiv Detail & Related papers (2024-02-22T09:48:47Z)
- Action-conditioned Deep Visual Prediction with RoAM, a new Indoor Human Motion Dataset for Autonomous Robots [1.7778609937758327]
We introduce the Robot Autonomous Motion (RoAM) video dataset.
It is collected with a custom-made TurtleBot3 Burger robot in a variety of indoor environments, recording various human motions from the robot's ego-vision.
The dataset also includes synchronized records of the LiDAR scan and all control actions taken by the robot as it navigates around static and moving human agents.
arXiv Detail & Related papers (2023-06-28T00:58:44Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- SPOTR: Spatio-temporal Pose Transformers for Human Motion Prediction [12.248428883804763]
3D human motion prediction is a research area of high significance and a challenging problem in computer vision.
Traditionally, autoregressive models have been used to predict human motion.
We present a non-autoregressive model for human motion prediction.
arXiv Detail & Related papers (2023-03-11T01:44:29Z)
- STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a Non-Autoregressive Transformer for Robot Following Ahead [8.227864212055035]
We develop a neural network model to predict future human motion from an observed human motion history.
We propose a non-autoregressive transformer architecture to leverage its parallel nature for easier training and fast, accurate predictions at test time.
Our model is well-suited for robotic applications, comparing favorably with state-of-the-art methods in test accuracy and speed (a minimal sketch of the non-autoregressive decoding idea appears after this list).
arXiv Detail & Related papers (2022-09-15T20:27:54Z)
- Full-Body Visual Self-Modeling of Robot Morphologies [29.76701883250049]
Internal computational models of physical bodies are fundamental to the ability of robots and animals alike to plan and control their actions.
Recent progress in fully data-driven self-modeling has enabled machines to learn their own forward kinematics directly from task-agnostic interaction data.
Here, we propose that instead of directly modeling forward kinematics, a more useful form of self-modeling is one that can answer space occupancy queries.
arXiv Detail & Related papers (2021-11-11T18:58:07Z)
- Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)
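Both SPOTR and STPOTR above rely on non-autoregressive transformers for motion prediction. As referenced in the STPOTR entry, below is a minimal, hypothetical sketch of that decoding idea: learned queries for every future time step attend to the encoded past motion, and all future poses are emitted in one parallel pass instead of feeding each prediction back into the model. The pose dimensionality, horizon, and layer sizes are illustrative assumptions, not values from either paper.

```python
import torch
import torch.nn as nn

class NonAutoregressiveMotionPredictor(nn.Module):
    """Sketch of parallel (non-autoregressive) future-pose decoding."""

    def __init__(self, pose_dim=66, d_model=128, horizon=25):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        # One learned query per future time step.
        self.future_queries = nn.Parameter(torch.randn(horizon, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.head = nn.Linear(d_model, pose_dim)

    def forward(self, past_poses):
        # past_poses: (batch, past_len, pose_dim)
        memory = self.encoder(self.embed(past_poses))
        queries = self.future_queries.unsqueeze(0).expand(past_poses.size(0), -1, -1)
        # A single decoding pass produces the whole future horizon at once.
        return self.head(self.decoder(queries, memory))


# Example: predict 25 future poses from 10 observed poses for a batch of 2.
model = NonAutoregressiveMotionPredictor()
future = model(torch.randn(2, 10, 66))
print(future.shape)  # torch.Size([2, 25, 66])
```

Because nothing is fed back step by step, inference cost does not grow with autoregressive rollout length, which is the speed advantage the STPOTR summary points to.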
This list is automatically generated from the titles and abstracts of the papers on this site.