A Geometric Perspective on Visual Imitation Learning
- URL: http://arxiv.org/abs/2003.02768v1
- Date: Thu, 5 Mar 2020 16:57:54 GMT
- Title: A Geometric Perspective on Visual Imitation Learning
- Authors: Jun Jin, Laura Petrich, Masood Dehghan and Martin Jagersand
- Abstract summary: We consider the problem of visual imitation learning without human supervision.
We propose VGS-IL (Visual Geometric Skill Imitation Learning), which infers globally consistent geometric feature association rules from human video frames.
- Score: 8.904045267033258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of visual imitation learning without human
supervision (e.g. kinesthetic teaching or teleoperation) and without access to
an interactive reinforcement learning (RL) training environment. We present a
geometric perspective to derive solutions to this problem. Specifically, we
propose VGS-IL (Visual Geometric Skill Imitation Learning), an end-to-end
geometry-parameterized task concept inference method, to infer globally
consistent geometric feature association rules from human demonstration video
frames. We show that, instead of learning actions from image pixels, learning a
geometry-parameterized task concept provides an explainable and invariant
representation from demonstrator to imitator under various environmental
settings. Moreover, such a task concept representation provides a direct link
with geometric vision based controllers (e.g. visual servoing), allowing for
efficient mapping of high-level task concepts to low-level robot actions.
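The abstract's claim that a geometric task concept maps directly onto geometric vision-based controllers can be made concrete with classic image-based visual servoing (IBVS). The sketch below is a generic textbook IBVS step, not the paper's VGS-IL implementation; the feature points, depths, and gain are illustrative assumptions.

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Interaction (image Jacobian) matrix for one normalized point feature
    at image coordinates (x, y) and depth Z, mapping a 6-DoF camera velocity
    to the feature's image-plane velocity (standard IBVS formulation)."""
    return np.array([
        [-1 / Z, 0, x / Z, x * y, -(1 + x ** 2), y],
        [0, -1 / Z, y / Z, 1 + y ** 2, -x * y, -x],
    ])

def ibvs_step(s, s_star, depths, lam=0.5):
    """One IBVS control step: v = -lam * pinv(L) @ (s - s*)."""
    e = (s - s_star).ravel()                        # stacked feature error
    L = np.vstack([interaction_matrix(x, y, Z)      # stacked interaction matrix
                   for (x, y), Z in zip(s, depths)])
    v = -lam * np.linalg.pinv(L) @ e                # 6-DoF velocity command
    return v, L, e

# Toy example: four current point features vs. their desired locations.
s = np.array([[0.1, 0.1], [-0.1, 0.1], [-0.1, -0.1], [0.1, -0.1]])
s_star = s + 0.05                                   # desired positions (shifted)
depths = np.full(4, 2.0)
v, L, e = ibvs_step(s, s_star, depths)

# First-order check: the commanded velocity should reduce the feature error.
e_next = e + L @ v * 0.1                            # Euler step, dt = 0.1
print(np.linalg.norm(e_next) < np.linalg.norm(e))   # error shrinks
```

In this view, a learned geometric association rule (e.g. "point A must coincide with point B") supplies the desired features `s_star`, and the controller handles the low-level motion.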
Related papers
- G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images [45.66479596827045]
We propose a Geometry-enhanced NeRF (G-NeRF), which seeks to enhance the geometry priors by a geometry-guided multi-view synthesis approach.
To tackle the absence of multi-view supervision for single-view images, we design the depth-aware training approach.
arXiv Detail & Related papers (2024-04-11T04:58:18Z) - A Closed-Loop Multi-perspective Visual Servoing Approach with Reinforcement Learning [9.152067359388207]
We present a novel learning-based multi-perspective visual servoing framework.
We showed that our method can successfully learn an optimal control policy given initial images from different perspectives.
arXiv Detail & Related papers (2023-12-25T20:46:36Z) - Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z) - Instance-Agnostic Geometry and Contact Dynamics Learning [7.10598685240178]
This work presents an instance-agnostic learning framework that fuses vision with dynamics to simultaneously learn shape, pose trajectories, and physical properties via the use of geometry as a shared representation.
We integrate a vision system, BundleSDF, with a dynamics system, ContactNets, and propose a cyclic training pipeline to use the output from the dynamics module to refine the poses and the geometry from the vision module.
Experiments demonstrate our framework's ability to learn the geometry and dynamics of rigid and convex objects and improve upon the current tracking framework.
arXiv Detail & Related papers (2023-09-11T21:18:15Z) - Transformer-based model for monocular visual odometry: a video understanding approach [0.9790236766474201]
We treat monocular visual odometry as a video understanding task to estimate the 6-DoF camera pose.
We contribute by presenting the TS-DoVO model based on spatio-temporal self-attention mechanisms to extract features from clips and estimate the motions in an end-to-end manner.
Our approach achieved competitive state-of-the-art performance compared with geometry-based and deep learning-based methods on the KITTI visual odometry dataset.
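The self-attention mechanism this entry relies on can be illustrated with a minimal scaled dot-product attention over a clip of frame features. This is a generic sketch, not the paper's model; the clip length, feature dimension, and random weights are illustrative assumptions.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of frame features."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])       # pairwise frame affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # softmax over frames
    return A @ V                                 # attended per-frame features

rng = np.random.default_rng(1)
T, d = 5, 8                                      # 5-frame clip, 8-dim features
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                                 # one attended feature per frame
```

Each output row mixes information from all frames in the clip, which is what lets a transformer-based model reason about motion across time rather than per-frame.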
arXiv Detail & Related papers (2023-05-10T13:11:23Z) - Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z) - Self-Supervised Image Representation Learning with Geometric Set Consistency [50.12720780102395]
We propose a method for self-supervised image representation learning under the guidance of 3D geometric consistency.
Specifically, we introduce 3D geometric consistency into a contrastive learning framework to enforce the feature consistency within image views.
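The contrastive framework this entry mentions is commonly instantiated with an InfoNCE loss, where positive pairs would come from pixels linked by 3D geometric consistency. The sketch below is a generic InfoNCE loss, not the paper's exact formulation; the features and noise level are illustrative assumptions.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: each anchor's positive is the feature of its geometrically
    corresponding pixel in another view; other positives act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                    # cosine-similarity logits
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # correct pair on diagonal

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 16))
# Geometrically matched pairs (small perturbation) vs. mismatched pairs.
aligned = info_nce(feats, feats + 0.01 * rng.normal(size=(8, 16)))
shuffled = info_nce(feats, np.roll(feats, 1, axis=0))
print(aligned < shuffled)    # consistent pairings yield lower loss
```

Minimizing this loss pulls features of 3D-consistent pixels together while pushing apart features of unrelated pixels.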
arXiv Detail & Related papers (2022-03-29T08:57:33Z) - Generalizable task representation learning from human demonstration videos: a geometric approach [4.640835690336654]
We study the problem of generalizable task learning from human demonstration videos without extra training on the robot or pre-recorded robot motions.
We propose CoVGS-IL, which uses a graph-structured task function to learn task representations under structural constraints.
arXiv Detail & Related papers (2022-02-28T08:25:57Z) - Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
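The epipolar constraint this entry builds on can be checked in a few lines: for a calibrated camera pair related by rotation R and translation t, corresponding normalized image points satisfy x2ᵀ E x1 = 0 with essential matrix E = [t]ₓ R. This is a generic two-view sketch, not the paper's multi-rigid-body system; the camera motion and 3D point are illustrative assumptions.

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

# Two calibrated cameras related by X2 = R @ X1 + t (here: pure translation).
R = np.eye(3)
t = np.array([-1.0, 0.0, 0.0])
E = skew(t) @ R                      # essential matrix

# A 3D point seen by both cameras, in normalized image coordinates.
X1 = np.array([0.5, 0.2, 3.0])
X2 = R @ X1 + t
x1 = X1 / X1[2]
x2 = X2 / X2[2]

residual = x2 @ E @ x1               # epipolar constraint: x2^T E x1 = 0
print(abs(residual) < 1e-12)
```

Deviations of this residual from zero are exactly the signal such methods exploit: observed correspondences must satisfy the constraint for each rigid body's relative motion.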
arXiv Detail & Related papers (2020-11-30T20:46:48Z) - PackIt: A Virtual Environment for Geometric Planning [68.79816936618454]
PackIt is a virtual environment to evaluate and potentially learn the ability to do geometric planning.
We construct a set of challenging packing tasks using an evolutionary algorithm.
arXiv Detail & Related papers (2020-07-21T22:51:17Z) - Neural Topological SLAM for Visual Navigation [112.73876869904]
We design topological representations for space that leverage semantics and afford approximate geometric reasoning.
We describe supervised learning-based algorithms that can build, maintain and use such representations under noisy actuation.
arXiv Detail & Related papers (2020-05-25T17:56:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.