Related papers: Inside Knowledge: Graph-based Path Generation with Explainable Data Augmentation and Curriculum Learning for Visual Indoor Navigation

Inside Knowledge: Graph-based Path Generation with Explainable Data Augmentation and Curriculum Learning for Visual Indoor Navigation

URL: http://arxiv.org/abs/2508.11446v1
Date: Fri, 15 Aug 2025 12:54:13 GMT
Title: Inside Knowledge: Graph-based Path Generation with Explainable Data Augmentation and Curriculum Learning for Visual Indoor Navigation
Authors: Daniel Airinei, Elena Burceanu, Marius Leordeanu,
Abstract summary: We introduce an efficient, real-time and easily deployable deep learning approach that can predict the direction towards a target from images captured by a mobile device.<n>On the practical side, we introduce a novel largescale dataset, with video footage inside a relatively large shopping mall, in which each frame is annotated with the correct next direction towards different specific target destinations.<n>Ours relies solely on vision, avoiding the need of special sensors, additional markers placed along the path, knowledge of the scene map or internet access.
Score: 12.116725436513699
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Indoor navigation is a difficult task, as it generally comes with poor GPS access, forcing solutions to rely on other sources of information. While significant progress continues to be made in this area, deployment to production applications is still lacking, given the complexity and additional requirements of current solutions. Here, we introduce an efficient, real-time and easily deployable deep learning approach, based on visual input only, that can predict the direction towards a target from images captured by a mobile device. Our technical approach, based on a novel graph-based path generation method, combined with explainable data augmentation and curriculum learning, includes contributions that make the process of data collection, annotation and training, as automatic as possible, efficient and robust. On the practical side, we introduce a novel largescale dataset, with video footage inside a relatively large shopping mall, in which each frame is annotated with the correct next direction towards different specific target destinations. Different from current methods, ours relies solely on vision, avoiding the need of special sensors, additional markers placed along the path, knowledge of the scene map or internet access. We also created an easy to use application for Android, which we plan to make publicly available. We make all our data and code available along with visual demos on our project site

Related papers

From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance [10.533095161205358]
Cross-view image retrieval is critical for applications such as autonomous navigation, urban planning, and localization in GPS-denied environments.<n>We present a simple yet effective cross-view image retrieval framework that leverages a pretrained vision encoder and a large language model (LLM)<n>Despite using no ground-truth supervision or finetuning, our proposed method outperforms prior learning-based approaches on the benchmark dataset under zero-shot settings.
arXiv Detail & Related papers (2025-11-12T23:51:46Z)
VL-KnG: Visual Scene Understanding for Navigation Goal Identification using Spatiotemporal Knowledge Graphs [2.779512031764865]
We present Visual Scene Understanding system that tackles challenges using knowledge graph construction and efficient query processing for identification.<n>We also introduce WalkieKnowledge, a new benchmark with about 200 manually annotated questions across 8 diverse trajectories spanning approximately 100 minutes of video data.
arXiv Detail & Related papers (2025-10-01T21:53:44Z)
Aligning Knowledge Graph with Visual Perception for Object-goal Navigation [16.32780793344835]
We propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation. Our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language description with visual perception. The integration of a continuous knowledge graph architecture and multimodal feature alignment empowers the navigator with a remarkable zero-shot navigation capability.
arXiv Detail & Related papers (2024-02-29T06:31:18Z)
CDFSL-V: Cross-Domain Few-Shot Learning for Videos [58.37446811360741]
Few-shot video action recognition is an effective approach to recognizing new categories with only a few labeled examples. Existing methods in video action recognition rely on large labeled datasets from the same domain. We propose a novel cross-domain few-shot video action recognition method that leverages self-supervised learning and curriculum learning.
arXiv Detail & Related papers (2023-09-07T19:44:27Z)
Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation. Our method significantly outperforms the state of the art on the challenging MP3D dataset. We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
SelfD: Self-Learning Large-Scale Driving Policies From the Web [13.879536370173506]
SelfD is a framework for learning scalable driving by utilizing large amounts of online monocular images. We employ a large dataset of publicly available YouTube videos to train SelfD and comprehensively analyze its generalization benefits across challenging navigation scenarios.
arXiv Detail & Related papers (2022-04-21T17:58:36Z)
ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [94.60414567852536]
Long-range navigation requires both planning and reasoning about local traversability. We propose a learning-based approach that integrates learning and planning. ViKiNG can leverage its image-based learned controller and goal-directed to navigate to goals up to 3 kilometers away.
arXiv Detail & Related papers (2022-02-23T02:14:23Z)
Lifelong Topological Visual Navigation [16.41858724205884]
We propose a learning-based visual navigation method with graph update strategies that improve lifelong navigation performance over time. We take inspiration from sampling-based planning algorithms to build image-based topological graphs, resulting in sparser graphs yet with higher navigation performance compared to baseline methods. Unlike controllers that learn from fixed training environments, we show that our model can be finetuned using a relatively small dataset from the real-world environment where the robot is deployed.
arXiv Detail & Related papers (2021-10-16T06:16:14Z)
Rapid Exploration for Open-World Navigation with Latent Goal Models [78.45339342966196]
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments. At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images. We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration.
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
SOON: Scenario Oriented Object Navigation with Graph-based Exploration [102.74649829684617]
The ability to navigate like a human towards a language-guided target from anywhere in a 3D embodied environment is one of the 'holy grail' goals of intelligent robots. Most visual navigation benchmarks focus on navigating toward a target from a fixed starting point, guided by an elaborate set of instructions that depicts step-by-step. This approach deviates from real-world problems in which human-only describes what the object and its surrounding look like and asks the robot to start navigation from anywhere.
arXiv Detail & Related papers (2021-03-31T15:01:04Z)
Where2Act: From Pixels to Actions for Articulated 3D Objects [54.19638599501286]
We extract highly localized actionable information related to elementary actions such as pushing or pulling for articulated objects with movable parts. We propose a learning-from-interaction framework with an online data sampling strategy that allows us to train the network in simulation. Our learned models even transfer to real-world data.
arXiv Detail & Related papers (2021-01-07T18:56:38Z)
ViNG: Learning Open-World Navigation with Visual Goals [82.84193221280216]
We propose a learning-based navigation system for reaching visually indicated goals. We show that our system, which we call ViNG, outperforms previously-proposed methods for goal-conditioned reinforcement learning. We demonstrate ViNG on a number of real-world applications, such as last-mile delivery and warehouse inspection.
arXiv Detail & Related papers (2020-12-17T18:22:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.