Domain Adaptive Robotic Gesture Recognition with Unsupervised
Kinematic-Visual Data Alignment
- URL: http://arxiv.org/abs/2103.04075v1
- Date: Sat, 6 Mar 2021 09:10:03 GMT
- Title: Domain Adaptive Robotic Gesture Recognition with Unsupervised
Kinematic-Visual Data Alignment
- Authors: Xueying Shi, Yueming Jin, Qi Dou, Jing Qin, and Pheng-Ann Heng
- Abstract summary: We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from the simulator to the real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and the inherent correlations in multi-modal data for gesture recognition.
Results show that our approach recovers performance with large gains, up to 12.91% in accuracy (ACC) and 20.16% in F1-score, without using any annotations on the real robot.
- Score: 60.31418655784291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated surgical gesture recognition is of great importance in
robot-assisted minimally invasive surgery. However, existing methods assume
that training and testing data are from the same domain, and they suffer severe
performance degradation when a domain gap exists, such as between the simulator
and the real robot. In this paper, we propose a novel unsupervised domain
adaptation framework which can simultaneously transfer multi-modality
knowledge, i.e., both kinematic and visual data, from the simulator to the real
robot.
It remedies the domain gap with enhanced transferable features by using
temporal cues in videos and the inherent correlations in multi-modal data for
gesture recognition. Specifically, we first propose an MDO-K module to align
kinematics, which exploits temporal continuity to transfer motion directions,
whose domain gap is smaller than that of position values, relieving the
adaptation burden.
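A rough sketch of this motion-direction idea follows; the function name, tensor
shapes, and normalization are assumptions for illustration, not the paper's
implementation:

```python
import torch

def motion_direction_features(kinematics: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Turn absolute kinematic positions into per-step motion directions.

    kinematics: (batch, time, dim) positions, e.g. end-effector trajectories.
    Returns a (batch, time-1, dim) tensor of unit direction vectors; the
    premise is that directions exhibit a smaller sim-to-real gap than the
    raw position values they are derived from.
    """
    delta = kinematics[:, 1:] - kinematics[:, :-1]           # frame-to-frame displacement
    return delta / (delta.norm(dim=-1, keepdim=True) + eps)  # keep direction, drop magnitude
```

Aligning such direction features across domains (e.g., with an adversarial
discriminator) then carries a lighter burden than aligning raw positions.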
Moreover, we propose a KV-Relation-ATT module to transfer the co-occurrence
signals of kinematics and vision. Such features, attended by correlation
similarity, are more informative for enhancing the domain invariance of the
model. The two feature alignment strategies benefit each other during the
end-to-end learning process. We extensively evaluate our method for gesture
recognition on the DESK dataset with the peg transfer procedure. Results show
that our approach recovers performance with large gains, up to 12.91% in
accuracy (ACC) and 20.16% in F1-score, without using any annotations on the
real robot.
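The abstract does not specify KV-Relation-ATT in detail; below is a hedged
sketch of one way correlation similarity between the two modalities could
attend the features (all names and shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def correlation_attended_features(vis_feat: torch.Tensor,
                                  kin_feat: torch.Tensor) -> torch.Tensor:
    """Reweight per-frame features by kinematic-visual correlation similarity.

    vis_feat, kin_feat: (batch, time, dim) features already projected to a
    shared dimension. Frames where vision and kinematics co-occur strongly
    (high cosine similarity) receive larger attention weights, yielding a
    clip-level feature intended to be more domain-invariant.
    """
    sim = F.cosine_similarity(vis_feat, kin_feat, dim=-1)  # (batch, time)
    att = torch.softmax(sim, dim=1).unsqueeze(-1)          # temporal attention weights
    fused = torch.cat([vis_feat, kin_feat], dim=-1)        # joint kinematic-visual features
    return (att * fused).sum(dim=1)                        # correlation-attended feature
```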
Related papers
- Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs Towards
Robot-assisted Intubation [15.795665057836636]
This work introduces a virtual dataset generated with the Open Framework Architecture to overcome the limited availability of actual endoscopic images.
We also propose a domain adaptive Sim-to-Real method for oropharyngeal organ image segmentation, which employs an image blending strategy.
Experimental results demonstrate the superior performance of the proposed approach with domain adaptive models.
arXiv Detail & Related papers (2023-05-19T14:08:15Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z)
- CaRTS: Causality-driven Robot Tool Segmentation from Vision and Kinematics Data [11.92904350972493]
Vision-based segmentation of the robotic tool during robot-assisted surgery enables downstream applications, such as augmented reality feedback.
With the introduction of deep learning, many methods were presented to solve instrument segmentation directly and solely from images.
We present CaRTS, a causality-driven robot tool segmentation algorithm designed based on a complementary causal model of the robot tool segmentation task.
arXiv Detail & Related papers (2022-03-15T22:26:19Z)
- ProFormer: Learning Data-efficient Representations of Body Movement with Prototype-based Feature Augmentation and Visual Transformers [31.908276711898548]
Methods for data-efficient recognition from body poses increasingly leverage skeleton sequences structured as image-like arrays.
We look at this paradigm from the perspective of transformer networks, for the first time exploring visual transformers as data-efficient encoders of skeleton movement.
In our pipeline, body pose sequences cast as image-like representations are converted into patch embeddings and then passed to a visual transformer backbone optimized with deep metric learning.
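A minimal sketch of the patch-embedding step this pipeline describes (the patch
size, dimensions, and class name are hypothetical, not the paper's code):

```python
import torch
import torch.nn as nn

class SkeletonPatchEmbed(nn.Module):
    """Patchify a skeleton clip rendered as an image-like array for a ViT."""

    def __init__(self, patch: int = 4, dim: int = 128):
        super().__init__()
        # Non-overlapping patches via strided convolution, as in standard ViT.
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, 3, frames, joints) pseudo-image of x/y/z coordinates
        x = self.proj(clip)                  # (batch, dim, frames/patch, joints/patch)
        return x.flatten(2).transpose(1, 2)  # (batch, num_patches, dim) token sequence

# Example: 64-frame, 16-joint clips become token sequences for a transformer encoder.
tokens = SkeletonPatchEmbed()(torch.randn(2, 3, 64, 16))
```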
arXiv Detail & Related papers (2022-02-23T11:11:54Z)
- Efficient Global-Local Memory for Real-time Instrument Segmentation of Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception: local temporal dependency from adjacent frames and global semantic correlation over long-range durations.
We propose a novel dual-memory network (DMNet) to relate both global and local temporal knowledge.
Our method largely outperforms state-of-the-art methods in segmentation accuracy while maintaining real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z)
- Attention-based Adversarial Appearance Learning of Augmented Pedestrians [49.25430012369125]
We propose a method to synthesize realistic data for the pedestrian recognition task.
Our approach utilizes an attention mechanism driven by an adversarial loss to learn domain discrepancies.
Our experiments confirm that the proposed adaptation method is robust to such discrepancies and achieves both visual realism and semantic consistency.
arXiv Detail & Related papers (2021-07-06T15:27:00Z)
- One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns general knowledge of instruments and a fast adaptation ability through a video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2021-03-24T05:02:18Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach based on a multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.