MVP: Unified Motion and Visual Self-Supervised Learning for Large-Scale
Robotic Navigation
- URL: http://arxiv.org/abs/2003.00667v1
- Date: Mon, 2 Mar 2020 05:19:52 GMT
- Title: MVP: Unified Motion and Visual Self-Supervised Learning for Large-Scale
Robotic Navigation
- Authors: Marvin Chancán, Michael Milford
- Abstract summary: We propose a novel motion and visual perception approach, dubbed MVP, for large-scale, target-driven navigation tasks.
Our MVP-based method can learn faster, and is more accurate and robust to both extreme environmental changes and poor GPS data.
We evaluate our method on two large real-world datasets, Oxford RobotCar and Nordland Railway.
- Score: 23.54696982881734
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Autonomous navigation emerges from both motion and local visual perception in
real-world environments. However, most successful robotic motion estimation
methods (e.g. VO, SLAM, SfM) and vision systems (e.g. CNN, visual place
recognition - VPR) are often used separately for mapping and localization tasks.
Conversely, recent reinforcement learning (RL) based methods for visual
navigation rely on the quality of GPS data reception, which may not be reliable
when used directly as ground truth across multiple, month-spaced traversals
in large environments. In this paper, we propose a novel motion and visual
perception approach, dubbed MVP, that unifies these two sensor modalities for
large-scale, target-driven navigation tasks. Our MVP-based method can learn
faster, and is more accurate and robust to both extreme environmental changes
and poor GPS data than corresponding vision-only navigation methods. MVP
temporally incorporates compact image representations, obtained using VPR, with
optimized motion estimation data, including but not limited to those from VO or
optimized radar odometry (RO), to efficiently learn self-supervised navigation
policies via RL. We evaluate our method on two large real-world datasets,
Oxford RobotCar and Nordland Railway, over a range of weather (e.g. overcast,
night, snow, sun, rain, clouds) and seasonal (e.g. winter, spring, fall,
summer) conditions using the new CityLearn framework, an interactive
environment for efficiently training navigation agents. Our experimental
results, on traversals of the Oxford RobotCar dataset with no GPS data, show
that MVP can achieve 53% and 93% navigation success rate using VO and RO,
respectively, compared to 7% for a vision-only method. We additionally report a
trade-off between the RL success rate and the motion estimation precision.
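To make the fusion described in the abstract concrete, here is a minimal sketch of the general idea: a compact VPR image embedding and an odometry estimate (e.g. from VO or RO) are concatenated at each timestep and passed through a recurrent policy that outputs navigation actions. All specifics (PyTorch, the GRU core, the layer sizes, the module name MVPStylePolicy, and the four-way discrete action space) are illustrative assumptions, not the authors' released architecture.

```python
import torch
import torch.nn as nn

class MVPStylePolicy(nn.Module):
    """Hypothetical sketch: fuse a compact VPR image descriptor with an
    odometry estimate per timestep, integrate over time with a GRU, and
    predict discrete navigation actions."""

    def __init__(self, vpr_dim=512, odom_dim=3, hidden_dim=256, num_actions=4):
        super().__init__()
        self.visual_proj = nn.Linear(vpr_dim, hidden_dim)   # compact VPR descriptor
        self.motion_proj = nn.Linear(odom_dim, hidden_dim)  # e.g. (dx, dy, dtheta) from VO/RO
        self.core = nn.GRU(2 * hidden_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, vpr_seq, odom_seq, hidden=None):
        # vpr_seq: (batch, time, vpr_dim); odom_seq: (batch, time, odom_dim)
        fused = torch.cat(
            [torch.relu(self.visual_proj(vpr_seq)),
             torch.relu(self.motion_proj(odom_seq))], dim=-1)
        out, hidden = self.core(fused, hidden)
        return self.policy_head(out), hidden  # action logits per timestep

# Usage: one 10-step traversal, batch of 2.
policy = MVPStylePolicy()
logits, _ = policy(torch.randn(2, 10, 512), torch.randn(2, 10, 3))
actions = torch.distributions.Categorical(logits=logits).sample()  # shape (2, 10)
```

In a full pipeline the sampled actions would feed a standard policy-gradient RL update inside an interactive environment such as CityLearn; the sketch stops at action sampling to stay self-contained.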
Related papers
- MPVO: Motion-Prior based Visual Odometry for PointGoal Navigation [3.9974562667271507]
Visual odometry (VO) is essential for enabling accurate point-goal navigation of embodied agents in indoor environments.
Recent deep-learned VO methods show robust performance but suffer from sample inefficiency during training.
We propose a robust and sample-efficient VO pipeline based on motion priors available while an agent is navigating an environment.
arXiv Detail & Related papers (2024-11-07T15:36:49Z)
- More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning [26.630640299709114]
We propose Joint GPS and Route Modelling based on self-supervised learning, namely JGRM.
We develop two encoders, each tailored to capture representations of route and GPS trajectories respectively.
The representations from the two modalities are fed into a shared transformer for inter-modal information interaction.
arXiv Detail & Related papers (2024-02-25T18:27:25Z)
- Scaling Data Generation in Vision-and-Language Navigation [116.95534559103788]
We propose an effective paradigm for generating large-scale data for learning.
We apply 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesize 4.9 million instruction-trajectory pairs.
Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute over the previous SoTA) to a new best of 80% single-run success rate on the R2R test split by simple imitation learning.
arXiv Detail & Related papers (2023-07-28T16:03:28Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigation-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego²-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning [63.628195002143734]
We propose a novel approach for aerial video action recognition.
Our method is designed for videos captured using UAVs and can run on edge or mobile devices.
We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately.
arXiv Detail & Related papers (2023-03-02T21:24:19Z)
- Unsupervised Visual Odometry and Action Integration for PointGoal Navigation in Indoor Environment [14.363948775085534]
PointGoal navigation in indoor environment is a fundamental task for personal robots to navigate to a specified point.
To improve PointGoal navigation accuracy without a GPS signal, we use visual odometry (VO) and propose a novel action integration module (AIM) trained in an unsupervised manner (see the dead-reckoning sketch after this list).
Experiments show that the proposed system achieves satisfactory results and outperforms partially supervised learning algorithms on the popular Gibson dataset.
arXiv Detail & Related papers (2022-10-02T03:12:03Z)
- ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
Long-range navigation requires both planning and reasoning about local traversability.
We propose a learning-based approach that integrates learning and planning.
ViKiNG can leverage its image-based learned controller and goal-directed heuristic to navigate to goals up to 3 kilometers away.
arXiv Detail & Related papers (2022-02-23T02:14:23Z)
- Learning Perceptual Locomotion on Uneven Terrains using Sparse Visual Observations [75.60524561611008]
This work aims to exploit the use of sparse visual observations to achieve perceptual locomotion over a range of commonly seen bumps, ramps, and stairs in human-centred environments.
We first formulate the selection of minimal visual input that can represent the uneven surfaces of interest, and propose a learning framework that integrates such exteroceptive and proprioceptive data.
We validate the learned policy in tasks that require omnidirectional walking over flat ground and forward locomotion over terrains with obstacles, showing a high success rate.
arXiv Detail & Related papers (2021-09-28T20:25:10Z)
- The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation [100.08270721713149]
PointGoal navigation has been introduced in simulated Embodied AI environments.
Recent advances solve this PointGoal navigation task with near-perfect accuracy (99.6% success).
We show that integrating visual odometry techniques into navigation policies improves the state-of-the-art on the popular Habitat PointNav benchmark by a large margin.
arXiv Detail & Related papers (2021-08-26T02:12:49Z)
- Robot Perception enables Complex Navigation Behavior via Self-Supervised Learning [23.54696982881734]
We propose an approach to unify successful robot perception systems for active target-driven navigation tasks via reinforcement learning (RL).
Our method temporally incorporates compact motion and visual perception data, directly obtained using self-supervision from a single image sequence.
We demonstrate our approach on two real-world driving datasets, KITTI and Oxford RobotCar, using the new interactive CityLearn framework.
arXiv Detail & Related papers (2020-06-16T07:45:47Z)
- SDVTracker: Real-Time Multi-Sensor Association and Tracking for Self-Driving Vehicles [11.317136648551537]
We present a practical and lightweight tracking system, SDVTracker, that uses a deep learned model for association and state estimation.
We show this system significantly outperforms hand-engineered methods on a real-world urban driving dataset while running in less than 2.5 ms on CPU for a scene with 100 actors.
arXiv Detail & Related papers (2020-03-09T23:07:23Z)
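Several of the entries above (MPVO, the unsupervised VO and action integration work, and the study of visual odometry for embodied PointGoal navigation) share one core idea: replace unreliable or missing GPS with relative pose estimates from odometry. The sketch below shows that dead-reckoning step for a 2D egocentric goal vector; the function name, the (dx, dy, dtheta) convention, and the example numbers are assumptions for illustration, not code from any of the cited papers.

```python
import numpy as np

def update_goal_with_vo(goal_xy, vo_delta):
    """Dead-reckon the egocentric goal vector from a relative pose estimate
    (e.g. visual odometry) instead of GPS.

    goal_xy:  (2,) goal position in the agent's previous egocentric frame
              (x forward, y to the left).
    vo_delta: (dx, dy, dtheta) estimated agent motion expressed in the
              previous frame, dtheta in radians (counter-clockwise).
    Returns the goal position in the agent's new egocentric frame.
    """
    dx, dy, dtheta = vo_delta
    c, s = np.cos(dtheta), np.sin(dtheta)
    rot_inv = np.array([[c, s], [-s, c]])  # rotate by -dtheta into the new frame
    return rot_inv @ (np.asarray(goal_xy) - np.array([dx, dy]))

# Example: goal 5 m straight ahead; the agent drives 1 m forward, then turns 90 deg left.
print(update_goal_with_vo([5.0, 0.0], (1.0, 0.0, np.pi / 2)))
# -> approximately [0., -4.]: the goal now lies 4 m to the agent's right.
```

Because each update compounds the previous one, odometry drift accumulates over a traversal, which is consistent with the trade-off between RL success rate and motion estimation precision reported in the MVP abstract above.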
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information above and is not responsible for any consequences of its use.