Faster Approximate Dynamic Programming by Freezing Slow States
- URL: http://arxiv.org/abs/2301.00922v1
- Date: Tue, 3 Jan 2023 01:35:24 GMT
- Title: Faster Approximate Dynamic Programming by Freezing Slow States
- Authors: Yijia Wang, Daniel R. Jiang
- Abstract summary: We consider infinite horizon Markov decision processes (MDPs) with fast-slow structure.
Such structure is common in real-world problems where sequential decisions need to be made at high frequencies.
We propose an approximate dynamic programming framework based on the idea of "freezing" the slow states.
- Score: 5.6928413790238865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider infinite horizon Markov decision processes (MDPs) with fast-slow
structure, meaning that certain parts of the state space move "fast" (and in a
sense, are more influential) while other parts transition more "slowly." Such
structure is common in real-world problems where sequential decisions need to
be made at high frequencies, yet information that varies at a slower timescale
also influences the optimal policy. Examples include: (1) service allocation
for a multi-class queue with (slowly varying) stochastic costs, (2) a restless
multi-armed bandit with an environmental state, and (3) energy demand response,
where both day-ahead and real-time prices play a role in the firm's revenue.
Models that fully capture these problems often result in MDPs with large state
spaces and large effective time horizons (due to frequent decisions), rendering
them computationally intractable. We propose an approximate dynamic programming
algorithmic framework based on the idea of "freezing" the slow states, solving
a set of simpler finite-horizon MDPs (the lower-level MDPs), and applying value
iteration (VI) to an auxiliary MDP that transitions on a slower timescale (the
upper-level MDP). We also extend the technique to a function approximation
setting, where a feature-based linear architecture is used. On the theoretical
side, we analyze the regret incurred by each variant of our frozen-state
approach. Finally, we give empirical evidence that the frozen-state approach
generates effective policies using just a fraction of the computational cost,
while illustrating that simply omitting slow states from the decision modeling
is often not a viable heuristic.
Related papers
- Temporal Feature Matters: A Framework for Diffusion Model Quantization [105.3033493564844]
Diffusion models rely on the time-step for the multi-round denoising.
We introduce a novel quantization framework that includes three strategies.
This framework preserves most of the temporal information and ensures high-quality end-to-end generation.
arXiv Detail & Related papers (2024-07-28T17:46:15Z) - A physics-informed neural network method for the approximation of slow invariant manifolds for the general class of stiff systems of ODEs [0.0]
We present a physics-informed neural network (PINN) approach for the discovery of slow invariant manifold (SIMs)
In contrast to other machine learning (ML) approaches that construct reduced order black box surrogate models, our approach, simultaneously decomposes the vector field into fast and slow components.
We show that the proposed PINN scheme provides SIM approximations, of equivalent or even higher accuracy, than those provided by QSSA, PEA and CSP.
arXiv Detail & Related papers (2024-03-18T09:10:39Z) - Fast, Scalable, Warm-Start Semidefinite Programming with Spectral
Bundling and Sketching [53.91395791840179]
We present Unified Spectral Bundling with Sketching (USBS), a provably correct, fast and scalable algorithm for solving massive SDPs.
USBS provides a 500x speed-up over the state-of-the-art scalable SDP solver on an instance with over 2 billion decision variables.
arXiv Detail & Related papers (2023-12-19T02:27:22Z) - StreamFlow: Streamlined Multi-Frame Optical Flow Estimation for Video
Sequences [31.210626775505407]
Occlusions between consecutive frames have long posed a significant challenge in optical flow estimation.
We present a Streamlined In-batch Multi-frame (SIM) pipeline tailored to video input, attaining a similar level of time efficiency to two-frame networks.
StreamFlow not only excels in terms of performance on challenging KITTI and Sintel datasets, with particular improvement in occluded areas.
arXiv Detail & Related papers (2023-11-28T07:53:51Z) - TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models [52.454274602380124]
Diffusion models heavily depend on the time-step $t$ to achieve satisfactory multi-round denoising.
We propose a Temporal Feature Maintenance Quantization (TFMQ) framework building upon a Temporal Information Block.
Powered by the pioneering block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align the full-precision temporal features.
arXiv Detail & Related papers (2023-11-27T12:59:52Z) - Non-stationary Reinforcement Learning under General Function
Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z) - SMDP-Based Dynamic Batching for Efficient Inference on GPU-Based
Platforms [14.42787221783853]
This paper aims to provide a dynamic graphics policy that strikes a balance between efficiency and latency.
The proposed solution has notable flexibility in balancing power consumption and latency.
arXiv Detail & Related papers (2023-01-30T13:19:16Z) - Collaborative Intelligent Reflecting Surface Networks with Multi-Agent
Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z) - An Adaptive State Aggregation Algorithm for Markov Decision Processes [10.494611365482028]
We propose an intuitive algorithm for solving MDPs that reduces the cost of value iteration updates by dynamically grouping together states with similar cost-to-go values.
Our algorithm converges almost surely to within (2varepsilon / (1 - gamma) of the true optimal value in the (ellinfty) norm, where (gamma) is the discount factor and aggregated states differ by at most (varepsilon)
arXiv Detail & Related papers (2021-07-23T07:19:43Z) - Acting in Delayed Environments with Non-Stationary Markov Policies [57.52103323209643]
We introduce a framework for learning and planning in MDPs where the decision-maker commits actions that are executed with a delay of $m$ steps.
We prove that with execution delay, deterministic Markov policies in the original state-space are sufficient for attaining maximal reward, but need to be non-stationary.
We devise a non-stationary Q-learning style model-based algorithm that solves delayed execution tasks without resorting to state-augmentation.
arXiv Detail & Related papers (2021-01-28T13:35:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.