Pointer Networks with Q-Learning for Combinatorial Optimization
- URL: http://arxiv.org/abs/2311.02629v4
- Date: Thu, 24 Oct 2024 17:25:19 GMT
- Title: Pointer Networks with Q-Learning for Combinatorial Optimization
- Authors: Alessandro Barro,
- Abstract summary: We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets)
Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
- Score: 55.2480439325792
- License:
- Abstract: We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets) to enhance the optimality of attention-based sequence generation, focusing on long-term outcomes. This integration proves particularly effective in solving combinatorial optimization (CO) tasks, especially the Travelling Salesman Problem (TSP), which is the focus of our study. We address this challenge by defining a Markov Decision Process (MDP) compatible with PQN, which involves iterative graph embedding, encoding and decoding by an LSTM-based recurrent neural network. This process generates a context vector and computes raw attention scores, which are dynamically adjusted by Q-values calculated for all available state-action pairs before applying softmax. The resulting attention vector is utilized as an action distribution, with actions selected hinged to exploration-exploitation dynamic adaptibility of PQN. Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
Related papers
- CVaR-Based Variational Quantum Optimization for User Association in Handoff-Aware Vehicular Networks [23.140655547353994]
We present a novel Conditional Value at Risk (CVaR)-based Variational Quantum Eigensolver (VQE) framework to address generalized assignment problems (GAP) in vehicular networks (VNets)
Our approach leverages a hybrid quantum-classical structure, integrating a tailored cost function that balances both objective and constraint-specific penalties to improve solution quality and stability.
We apply this framework to a user-association problem in VNets, where our method achieves 23.5% improvement compared to the deep neural network (DNN) approach.
arXiv Detail & Related papers (2025-01-14T20:21:06Z) - An Adaptive Collocation Point Strategy For Physics Informed Neural Networks via the QR Discrete Empirical Interpolation Method [1.2289361708127877]
We propose an adaptive collocation point selection strategy utilizing the QR Discrete Empirical Interpolation Method (QR-DEIM)
Our results on benchmark PDEs, including the wave, Allen-Cahn, and Burgers' equations, demonstrate that our QR-DEIM-based approach improves PINN accuracy compared to existing methods.
arXiv Detail & Related papers (2025-01-13T21:24:15Z) - Enhancing Variational Quantum Circuit Training: An Improved Neural Network Approach for Barren Plateau Mitigation [0.0]
variational quantum algorithms (VQAs) are among the most promising algorithms in near-term quantum computing.
They iteratively update circuit parameters to optimize a cost function.
The training of variational quantum circuits (VQCs) is susceptible to a phenomenon known as barren plateaus (BPs)
arXiv Detail & Related papers (2024-11-14T06:43:37Z) - Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy
Optimization [21.30645601474163]
Original Q-learning suffers from performance and complexity challenges across very large networks.
New model-free ensemble reinforcement learning algorithm which adapts the classical Q-learning is proposed to handle these challenges.
Numerical results show that the proposed algorithm can achieve up to 55% less average policy error with up to 50% less runtime complexity.
arXiv Detail & Related papers (2024-02-08T08:08:23Z) - EPTQ: Enhanced Post-Training Quantization via Hessian-guided Network-wise Optimization [3.3998740964877463]
Quantization is a key method for deploying deep neural networks on edge devices with limited memory and computation resources.
We propose a new method for enhanced Post-Training Quantization (EPTQ) that employs a network-wise quantization optimization process.
arXiv Detail & Related papers (2023-09-20T10:50:28Z) - Differentially Private Deep Q-Learning for Pattern Privacy Preservation
in MEC Offloading [76.0572817182483]
attackers may eavesdrop on the offloading decisions to infer the edge server's (ES's) queue information and users' usage patterns.
We propose an offloading strategy which jointly minimizes the latency, ES's energy consumption, and task dropping rate, while preserving pattern privacy (PP)
We develop a Differential Privacy Deep Q-learning based Offloading (DP-DQO) algorithm to solve this problem while addressing the PP issue by injecting noise into the generated offloading decisions.
arXiv Detail & Related papers (2023-02-09T12:50:18Z) - A Deep-Unfolded Reference-Based RPCA Network For Video
Foreground-Background Separation [86.35434065681925]
This paper proposes a new deep-unfolding-based network design for the problem of Robust Principal Component Analysis (RPCA)
Unlike existing designs, our approach focuses on modeling the temporal correlation between the sparse representations of consecutive video frames.
Experimentation using the moving MNIST dataset shows that the proposed network outperforms a recently proposed state-of-the-art RPCA network in the task of video foreground-background separation.
arXiv Detail & Related papers (2020-10-02T11:40:09Z) - Optimistic Exploration even with a Pessimistic Initialisation [57.41327865257504]
Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL)
In particular, in scenarios with only positive rewards, Q-values are initialised at their lowest possible values.
We propose a simple count-based augmentation to pessimistically initialised Q-values that separates the source of optimism from the neural network.
arXiv Detail & Related papers (2020-02-26T17:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.