Related papers: Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

URL: http://arxiv.org/abs/2406.08404v2
Date: Sun, 06 Jul 2025 07:48:39 GMT
Title: Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning
Authors: Yuhui Wang, Qingyuan Wu, Dylan R. Ashley, Francesco Faccio, Weida Li, Chao Huang, Jürgen Schmidhuber
Abstract summary: Value Iteration Network (VIN) is an end-to-end differentiable neural network architecture for planning. VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a 100x100 maze. We introduce Dynamic Transition VIN (DT-VIN), which scales to 5000 layers and solves challenging versions of the above tasks.
Score: 29.545549033285987
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Value Iteration Network (VIN) is an end-to-end differentiable neural network architecture for planning. It exhibits strong generalization to unseen domains by incorporating a differentiable planning module that operates on a latent Markov Decision Process (MDP). However, VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a 100x100 maze -- a task that typically requires thousands of planning steps to solve. We observe that this deficiency is due to two issues: the representation capacity of the latent MDP and the planning module's depth. We address these by augmenting the latent MDP with a dynamic transition kernel, dramatically improving its representational capacity, and, to mitigate the vanishing gradient problem, introduce an "adaptive highway loss" that constructs skip connections to improve gradient flow. We evaluate our method on 2D/3D maze navigation environments, continuous control, and the real-world Lunar rover navigation task. We find that our new method, named Dynamic Transition VIN (DT-VIN), scales to 5000 layers and solves challenging versions of the above tasks. Altogether, we believe that DT-VIN represents a concrete step forward in performing long-term large-scale planning in complex environments.
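
The abstract's remark that a large maze needs thousands of planning steps can be illustrated with ordinary tabular value iteration. The sketch below is not VIN or DT-VIN; it is a minimal, assumption-laden example in which each synchronous sweep plays the role of one planning layer, every move costs 1, and the function names `value_iteration_sweeps` and `serpentine_maze` are hypothetical helpers invented for illustration. Under these assumptions, the number of sweeps needed equals the longest shortest-path distance to the goal.

```python
def value_iteration_sweeps(grid, goal):
    """Run synchronous value iteration on a 4-connected grid maze.

    grid: list of equal-length strings; '#' is a wall, anything else is free.
    goal: (row, col) of the goal cell.
    Returns (values, sweeps): cost-to-go per cell and the number of
    sweeps that changed at least one value.
    """
    rows, cols = len(grid), len(grid[0])
    inf = float("inf")
    # Cost-to-go is 0 at the goal and unknown (infinite) elsewhere.
    values = [[inf] * cols for _ in range(rows)]
    values[goal[0]][goal[1]] = 0.0
    sweeps = 0
    while True:
        new_values = [row[:] for row in values]
        changed = False
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == "#" or (r, c) == goal:
                    continue
                best = values[r][c]
                # Bellman backup: one step of cost 1 to the best neighbour,
                # read from the previous sweep's values.
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                        best = min(best, 1.0 + values[nr][nc])
                if best < new_values[r][c]:
                    new_values[r][c] = best
                    changed = True
        if not changed:
            return values, sweeps
        values = new_values
        sweeps += 1


def serpentine_maze(n):
    """Build an n x n maze (n odd) whose corridors snake back and forth."""
    rows = []
    for i in range(n):
        if i % 2 == 0:
            rows.append("." * n)              # open corridor
        elif (i // 2) % 2 == 0:
            rows.append("#" * (n - 1) + ".")  # wall with a gap on the right
        else:
            rows.append("." + "#" * (n - 1))  # wall with a gap on the left
    return rows


_, open_sweeps = value_iteration_sweeps(["." * 9] * 9, (0, 0))
_, maze_sweeps = value_iteration_sweeps(serpentine_maze(9), (0, 0))
print(open_sweeps, maze_sweeps)  # prints 16 48
```

In this toy setting the open 9x9 grid needs 16 sweeps while the serpentine 9x9 maze needs 48, because information about the goal spreads by only one cell per sweep. The same growth with path length is what makes very deep planning modules necessary on large mazes.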

Related papers

Efficient Generative Transformer Operators For Million-Point PDEs [12.324265832276538]
ECHO is a transformer-operator framework for generating million-point PDE trajectories. We demonstrate state-of-the-art performance on million-point simulations featuring complex, high-frequency dynamics, and long-term horizons.
arXiv Detail & Related papers (2025-12-04T16:46:48Z)
TP-MDDN: Task-Preferenced Multi-Demand-Driven Navigation with Autonomous Decision-Making [90.18833928208333]
Task-Preferenced Multi-Demand-Driven Navigation (TP-MDDN) is a new benchmark for long-horizon navigation involving multiple sub-demands with explicit task preferences. For spatial memory, we design MASMap, which combines 3D point cloud accumulation with 2D semantic mapping for accurate and efficient environmental understanding. Our approach outperforms state-of-the-art baselines in both perception accuracy and navigation robustness.
arXiv Detail & Related papers (2025-11-21T13:12:13Z)
Extendable Long-Horizon Planning via Hierarchical Multiscale Diffusion [62.91968752955649]
This paper tackles a novel problem, extendable long-horizon planning: enabling agents to plan trajectories longer than those in training data without compounding errors. We propose an augmentation method that iteratively generates longer trajectories by stitching shorter ones. HM-Diffuser trains on these extended trajectories using a hierarchical structure, efficiently handling tasks across multiple temporal scales.
arXiv Detail & Related papers (2025-03-25T22:52:46Z)
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method [94.74003109176581]
Long-Horizon Vision-Language Navigation (LH-VLN) is a novel VLN task that emphasizes long-term planning and decision consistency across consecutive subtasks. Our platform, benchmark and method supply LH-VLN with a robust data generation pipeline, comprehensive model evaluation dataset, reasonable metrics, and a novel VLN model.
arXiv Detail & Related papers (2024-12-12T09:08:13Z)
SCoTT: Wireless-Aware Path Planning with Vision Language Models and Strategic Chains-of-Thought [78.53885607559958]
A novel approach using vision language models (VLMs) is proposed for enabling path planning in complex wireless-aware environments. To this end, insights from a digital twin with real-world wireless ray tracing data are explored. Results show that SCoTT achieves very close average path gains compared to DP-WA* while at the same time yielding consistently shorter path lengths.
arXiv Detail & Related papers (2024-11-27T10:45:49Z)
DNN Task Assignment in UAV Networks: A Generative AI Enhanced Multi-Agent Reinforcement Learning Approach [16.139481340656552]
This paper presents a joint approach that combines multiple-agent reinforcement learning (MARL) and generative diffusion models (GDM). In the second stage, we introduce a novel DNN task assignment algorithm, termed GDM-MADDPG, which utilizes the reverse denoising process of GDM to replace the actor network in multi-agent deep deterministic policy gradient (MADDPG). Simulation results indicate that our algorithm performs favorably compared to benchmarks in terms of path planning, Age of Information (AoI), energy consumption, and task load balancing.
arXiv Detail & Related papers (2024-11-13T02:41:02Z)
ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks. We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation. Our purely temporal architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
EPD: Long-term Memory Extraction, Context-awared Planning and Multi-iteration Decision @ EgoPlan Challenge ICML 2024 [50.89751993430737]
We introduce a novel planning framework which comprises three stages: long-term memory Extraction, context-awared Planning, and multi-iteration Decision, named EPD. EPD achieves a planning accuracy of 53.85% over 1,584 egocentric task planning questions.
arXiv Detail & Related papers (2024-07-28T15:14:07Z)
Highway Value Iteration Networks [28.812226679935108]
We introduce highway value iteration into the structure of value iteration networks (VINs). The resulting novel highway VIN can be trained effectively with hundreds of layers using standard backpropagation. In long-term planning tasks requiring hundreds of planning steps, deep highway VINs outperform both traditional VINs and several advanced, very deep NNs.
arXiv Detail & Related papers (2024-06-05T17:46:26Z)
Scaling Learning based Policy Optimization for Temporal Logic Tasks by Controller Network Dropout [4.421486904657393]
We introduce a model-based approach for training feedback controllers for an autonomous agent operating in a highly nonlinear environment. We show how this learning problem is similar to training recurrent neural networks (RNNs), where the number of recurrent units is proportional to the temporal horizon of the agent's task objectives. We introduce a novel gradient approximation algorithm based on the idea of dropout or gradient sampling.
arXiv Detail & Related papers (2024-03-23T12:53:51Z)
Module-wise Training of Neural Networks via the Minimizing Movement Scheme [15.315147138002153]
Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited. We propose a module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. We show improved accuracy of module-wise training of various architectures such as ResNets, Transformers and VGG, when our regularization is added.
arXiv Detail & Related papers (2023-09-29T16:03:25Z)
Non-Separable Multi-Dimensional Network Flows for Visual Computing [62.50191141358778]
We propose a novel formalism for non-separable multi-dimensional network flows. Since the flow is defined on a per-dimension basis, the maximizing flow automatically chooses the best matching feature dimensions. As a proof of concept, we apply our formalism to the multi-object tracking problem and demonstrate that our approach outperforms scalar formulations on the MOT16 benchmark in terms of robustness to noise.
arXiv Detail & Related papers (2023-05-15T13:21:44Z)
Value Iteration Networks with Gated Summarization Module [7.289178621436725]
We address the challenges faced by Value Iteration Networks (VIN) in handling larger input maps and mitigating the impact of accumulated errors caused by increased iterations. We propose a novel approach, Value Iteration Networks with Gated Summarization Module (GS-VIN), which incorporates two main improvements.
arXiv Detail & Related papers (2023-05-11T12:25:12Z)
M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design [95.41238363769892]
Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly. Current MTL regimes have to activate nearly the entire model even to just execute a single task. We present a model-accelerator co-design framework to enable efficient on-device MTL.
arXiv Detail & Related papers (2022-10-26T15:40:24Z)
Dynamics-aware Adversarial Attack of 3D Sparse Convolution Network [75.1236305913734]
We investigate the dynamics-aware adversarial attack problem in deep neural networks. Most existing adversarial attack algorithms are designed under a basic assumption -- the network architecture is fixed throughout the attack process. We propose a Leaded Gradient Method (LGM) and show the significant effects of the lagged gradient.
arXiv Detail & Related papers (2021-12-17T10:53:35Z)
BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and Monocular Depth Estimation [60.34562823470874]
We propose a joint learning network of depth map super-resolution (DSR) and monocular depth estimation (MDE) without introducing additional supervision labels. One is the high-frequency attention bridge (HABdg) designed for the feature encoding process, which learns the high-frequency information of the MDE task to guide the DSR task. The other is the content guidance bridge (CGBdg) designed for the depth map reconstruction process, which provides the content guidance learned from the DSR task for the MDE task.
arXiv Detail & Related papers (2021-07-27T01:28:23Z)
Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on the connectionist temporal classification (CTC). We show that a Transformer-CTC model can be pruned to various depths on demand, improving real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.