Highway Value Iteration Networks
- URL: http://arxiv.org/abs/2406.03485v1
- Date: Wed, 5 Jun 2024 17:46:26 GMT
- Title: Highway Value Iteration Networks
- Authors: Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber
- Abstract summary: We introduce highway value iteration into the structure of value iteration networks (VINs).
The resulting novel highway VIN can be trained effectively with hundreds of layers using standard backpropagation.
In long-term planning tasks requiring hundreds of planning steps, deep highway VINs outperform both traditional VINs and several advanced, very deep NNs.
- Score: 28.812226679935108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Value iteration networks (VINs) enable end-to-end learning for planning tasks by employing a differentiable "planning module" that approximates the value iteration algorithm. However, long-term planning remains a challenge because training very deep VINs is difficult. To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment -- into the structure of VINs. This improvement augments the "planning module" of the VIN with three additional components: 1) an "aggregate gate," which constructs skip connections to improve information flow across many layers; 2) an "exploration module," crafted to increase the diversity of information and gradient flow in spatial dimensions; 3) a "filter gate" designed to ensure safe exploration. The resulting novel highway VIN can be trained effectively with hundreds of layers using standard backpropagation. In long-term planning tasks requiring hundreds of planning steps, deep highway VINs outperform both traditional VINs and several advanced, very deep NNs.
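For readers unfamiliar with the value iteration algorithm that the VIN "planning module" approximates, the following is a minimal tabular sketch on a toy 1-D chain. It is not the authors' method and contains none of the highway components (aggregate gate, exploration module, filter gate). The function name `value_iteration`, the chain environment, and the reward layout are assumptions made only for illustration.

```python
import numpy as np

def value_iteration(rewards, gamma=0.9, num_iters=50):
    """Plain tabular value iteration on a 1-D chain (illustrative toy, not a VIN).

    Assumptions: two deterministic actions (left, right), walls clamp the
    agent in place, and rewards[s] is collected for being in state s.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.zeros_like(rewards)
    for _ in range(num_iters):
        # Value of the state reached by each action (clamped at the ends).
        left = np.concatenate(([values[0]], values[:-1]))
        right = np.concatenate((values[1:], [values[-1]]))
        # Bellman backup: Q(s, a) = r(s) + gamma * V(next state).
        q = rewards + gamma * np.stack([left, right])
        values = q.max(axis=0)
    return values

# Reward only at the right end of a 5-state chain.
chain_rewards = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
for k in range(1, 6):
    reached = np.count_nonzero(value_iteration(chain_rewards, num_iters=k))
    print(f"{k} iteration(s): {reached} state(s) with nonzero value")
```

In this toy setting each iteration carries reward information exactly one state further from the goal, so a plan spanning N steps needs on the order of N iterations. This is the sense in which long-term planning calls for a deep planning module.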
Related papers
- DELTAv2: Accelerating Dense 3D Tracking [79.63990337419514]
We propose a novel algorithm for accelerating dense long-term 3D point tracking in videos.
We introduce a coarse-to-fine strategy that begins tracking with a small subset of points and progressively expands the set of tracked trajectories.
The newly added trajectories are using a learnable module, which is trained end-to-end alongside the tracking network.
arXiv Detail & Related papers (2025-08-02T03:15:47Z) - Enhancing UAV Path Planning Efficiency Through Accelerated Learning [3.216130900831975]
This study aims to develop a learning algorithm for the path planning of UAV wireless communication relays.
It can reduce storage requirements and accelerate Deep Reinforcement Learning (DRL) convergence.
arXiv Detail & Related papers (2025-01-17T12:05:24Z) - Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method [94.74003109176581]
Long-Horizon Vision-Language Navigation (LH-VLN) is a novel VLN task that emphasizes long-term planning and decision consistency across consecutive subtasks.
Our platform, benchmark and method supply LH-VLN with a robust data generation pipeline, comprehensive model evaluation dataset, reasonable metrics, and a novel VLN model.
arXiv Detail & Related papers (2024-12-12T09:08:13Z) - SCoTT: Wireless-Aware Path Planning with Vision Language Models and Strategic Chains-of-Thought [78.53885607559958]
A novel approach using vision language models (VLMs) is proposed for enabling path planning in complex wireless-aware environments.
To this end, insights from a digital twin with real-world wireless ray tracing data are explored.
Results show that SCoTT achieves very close average path gains compared to DP-WA* while at the same time yielding consistently shorter path lengths.
arXiv Detail & Related papers (2024-11-27T10:45:49Z) - Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning [29.545549033285987]
The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL).
VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a $100 \times 100$ maze.
We address this deficiency by augmenting the latent MDP with a dynamic transition kernel.
We find that our new method, named Dynamic Transition VIN (DT-VIN), easily scales to 5000 layers and casually solves challenging versions of the above tasks.
arXiv Detail & Related papers (2024-06-12T16:52:54Z) - DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems [26.48767051423456]
We present a novel attention-based Partition-and-Navigation encoder (P&N) that learns distinct embeddings for partition and navigation.
We develop an effective agent-permutation-symmetric (APS) loss function.
arXiv Detail & Related papers (2024-05-27T15:33:16Z) - JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes [75.20435924081585]
JPerceiver can simultaneously estimate scale-aware depth and VO as well as BEV layout from a monocular video sequence.
It exploits the cross-view geometric transformation (CGT) to propagate the absolute scale from the road layout to depth and VO.
Experiments on Argoverse, Nuscenes and KITTI show the superiority of JPerceiver over existing methods on all the above three tasks.
arXiv Detail & Related papers (2022-07-16T10:33:59Z) - Deep Reinforcement Learning Aided Packet-Routing For Aeronautical Ad-Hoc Networks Formed by Passenger Planes [99.54065757867554]
We invoke deep reinforcement learning for routing in AANETs aiming at minimizing the end-to-end (E2E) delay.
A deep Q-network (DQN) is conceived for capturing the relationship between the optimal routing decision and the local geographic information observed by the forwarding node.
We further exploit the knowledge concerning the system's dynamics by using a deep value network (DVN) conceived with a feedback mechanism.
arXiv Detail & Related papers (2021-10-28T14:18:56Z) - Trajectory Design for UAV-Based Internet-of-Things Data Collection: A Deep Reinforcement Learning Approach [93.67588414950656]
In this paper, we investigate an unmanned aerial vehicle (UAV)-assisted Internet-of-Things (IoT) system in a 3D environment.
We present a TD3-based trajectory design for completion time minimization (TD3-TDCTM) algorithm.
Our simulation results show the superiority of the proposed TD3-TDCTM algorithm over three conventional non-learning based baseline methods.
arXiv Detail & Related papers (2021-07-23T03:33:29Z) - Jamming-Resilient Path Planning for Multiple UAVs via Deep Reinforcement Learning [1.2330326247154968]
Unmanned aerial vehicles (UAVs) are expected to be an integral part of wireless networks.
In this paper, we aim to find collision-free paths for multiple cellular-connected UAVs.
We propose an offline temporal difference (TD) learning algorithm with online signal-to-interference-plus-noise ratio mapping to solve the problem.
arXiv Detail & Related papers (2021-04-09T16:52:33Z) - Structured Scene Memory for Vision-Language Navigation [155.63025602722712]
We propose a crucial architecture for vision-language navigation (VLN).
It is compartmentalized enough to accurately memorize the percepts during navigation.
It also serves as a structured scene representation, which captures and disentangles visual and geometric cues in the environment.
arXiv Detail & Related papers (2021-03-05T03:41:00Z) - Multi-Agent Reinforcement Learning in NOMA-aided UAV Networks for Cellular Offloading [59.32570888309133]
A novel framework is proposed for cellular offloading with the aid of multiple unmanned aerial vehicles (UAVs).
Non-orthogonal multiple access (NOMA) technique is employed at each UAV to further improve the spectrum efficiency of the wireless network.
A mutual deep Q-network (MDQN) algorithm is proposed to jointly determine the optimal 3D trajectory and power allocation of UAVs.
arXiv Detail & Related papers (2020-10-18T20:22:05Z) - NOMA in UAV-aided cellular offloading: A machine learning approach [59.32570888309133]
A novel framework is proposed for cellular offloading with the aid of multiple unmanned aerial vehicles (UAVs).
Non-orthogonal multiple access (NOMA) technique is employed at each UAV to further improve the spectrum efficiency of the wireless network.
A mutual deep Q-network (MDQN) algorithm is proposed to jointly determine the optimal 3D trajectory and power allocation of UAVs.
arXiv Detail & Related papers (2020-10-18T17:38:48Z) - Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads [11.646744408920764]
Auto-MAP is a framework for exploring distributed execution plans for workloads.
It can automatically discover fast parallelization strategies through reinforcement learning on the IR level of deep learning models.
Our evaluation shows that Auto-MAP can find the optimal solution in two hours, while achieving better throughput on several NLP and convolution models.
arXiv Detail & Related papers (2020-07-08T12:38:03Z) - Using Deep Reinforcement Learning Methods for Autonomous Vessels in 2D Environments [11.657524999491029]
In this work, we used deep reinforcement learning combining Q-learning with a neural representation to avoid instability.
Our methodology uses deep Q-learning and combines it with a rolling wave planning approach on agile methodology.
Experimental results show that the proposed method enhanced the performance of VVN by 55.31 on average for long-distance missions.
arXiv Detail & Related papers (2020-03-23T12:58:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.