TempoNet: Slack-Quantized Transformer-Guided Reinforcement Scheduler for Adaptive Deadline-Centric Real-Time Dispatchs
- URL: http://arxiv.org/abs/2602.18109v1
- Date: Fri, 20 Feb 2026 09:56:23 GMT
- Title: TempoNet: Slack-Quantized Transformer-Guided Reinforcement Scheduler for Adaptive Deadline-Centric Real-Time Dispatchs
- Authors: Rong Fu, Yibo Meng, Guangzhen Yao, Jiaxuan Lu, Zeyu Zhang, Zhaolu Kang, Ziming Guo, Jia Yee Tan, Xiaojing Du, Simon James Fong,
- Abstract summary: TempoNet is a reinforcement learning scheduler that pairs a permutation-invariant Transformer with a deep Q-approximation.<n>A latency-aware sparse attention stack with blockwise top-k selection and locality-sensitive chunking enables global reasoning over unordered task sets.
- Score: 8.818252253980985
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-time schedulers must reason about tight deadlines under strict compute budgets. We present TempoNet, a reinforcement learning scheduler that pairs a permutation-invariant Transformer with a deep Q-approximation. An Urgency Tokenizer discretizes temporal slack into learnable embeddings, stabilizing value learning and capturing deadline proximity. A latency-aware sparse attention stack with blockwise top-k selection and locality-sensitive chunking enables global reasoning over unordered task sets with near-linear scaling and sub-millisecond inference. A multicore mapping layer converts contextualized Q-scores into processor assignments through masked-greedy selection or differentiable matching. Extensive evaluations on industrial mixed-criticality traces and large multiprocessor settings show consistent gains in deadline fulfillment over analytic schedulers and neural baselines, together with improved optimization stability. Diagnostics include sensitivity analyses for slack quantization, attention-driven policy interpretation, hardware-in-the-loop and kernel micro-benchmarks, and robustness under stress with simple runtime mitigations; we also report sample-efficiency benefits from behavioral-cloning pretraining and compatibility with an actor-critic variant without altering the inference pipeline. These results establish a practical framework for Transformer-based decision making in high-throughput real-time scheduling.
Related papers
- Gated Differentiable Working Memory for Long-Context Language Modeling [80.27483324685434]
We propose Gdwm (Gated Differentiable Working Memory), a framework that introduces a write controller to gate the consolidation process.<n>Experiments on ZeroSCROLLS and LongBench v2 demonstrate that Gdwm achieves comparable or superior performance with 4$times$ fewer gradient steps than uniform baselines.
arXiv Detail & Related papers (2026-01-19T10:00:33Z) - Topology-Aware Block Coordinate Descent for Qubit Frequency Calibration of Superconducting Quantum Processors [9.256953420444077]
Pre-execution calibration is a major bottleneck for operating quantum processors.<n>We show that the widely-used Snake Sequence is equivalent to Block Coordinate Descent (BCD)<n>These results provide a scalable, implementation-ready workflow for frequency calibration on NISQ-era processors.
arXiv Detail & Related papers (2026-01-15T09:09:36Z) - Accelerate Speculative Decoding with Sparse Computation in Verification [49.74839681322316]
Speculative decoding accelerates autoregressive language model inference by verifying multiple draft tokens in parallel.<n>Existing sparsification methods are designed primarily for standard token-by-token autoregressive decoding.<n>We propose a sparse verification framework that jointly sparsifies attention, FFN, and MoE components during the verification stage to reduce the dominant computation cost.
arXiv Detail & Related papers (2025-12-26T07:53:41Z) - TS-DP: Reinforcement Speculative Decoding For Temporal Adaptive Diffusion Policy Acceleration [64.32072516882947]
Diffusion Policy excels in embodied control but suffers from high inference latency and computational cost.<n>We propose Temporal-aware Reinforcement-based Speculative Diffusion Policy (TS-DP)<n>TS-DP achieves up to 4.17 times faster inference with over 94% accepted drafts, reaching an inference frequency of 25 Hz.
arXiv Detail & Related papers (2025-12-13T07:53:14Z) - Training-free Context-adaptive Attention for Efficient Long Context Modeling [57.703159205740185]
Training-free Context-adaptive Attention (TCA-Attention) is a training-free sparse attention mechanism that selectively attends to only the informative tokens for efficient long-context inference.<n>TCA-Attention achieves a 2.8$times$ speedup and reduces KV cache by 61% at 128K context length while maintaining performance comparable to full attention.
arXiv Detail & Related papers (2025-12-10T01:54:57Z) - Generative Sequential Notification Optimization via Multi-Objective Decision Transformers [9.542285455613927]
We present a Decision Transformer based framework that reframes policy learning as return-conditioned supervised learning.<n>Our contributions include a real-world comparison with CQL, a multi-reward design suitable for non-episodic tasks, and a quantile regression approach to return-to-go conditioning.
arXiv Detail & Related papers (2025-09-02T16:09:02Z) - Provable In-Context Learning of Nonlinear Regression with Transformers [66.99048542127768]
In-context learning (ICL) is the ability to perform unseen tasks using task specific prompts without updating parameters.<n>Recent research has actively explored the training dynamics behind ICL, with much of the focus on relatively simple tasks.<n>This paper investigates more complex nonlinear regression tasks, aiming to uncover how transformers acquire in-context learning capabilities.
arXiv Detail & Related papers (2025-07-28T00:09:28Z) - TTT: A Temporal Refinement Heuristic for Tenuously Tractable Discrete Time Reachability Problems [8.696305200911455]
Reachability analysis for complex control systems is often either slow or overly conservative.<n>This paper introduces the idea of temporal refinement: automatically choosing when along the horizon of the reachability problem to execute slow symbolic queries.<n>We show that our algorithm can generate approximate reachable sets with a similar amount of error to the baseline in 20-70% less time.
arXiv Detail & Related papers (2024-07-19T15:16:25Z) - Better than the Best: Gradient-based Improper Reinforcement Learning for
Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z) - Dynamic Compression Ratio Selection for Edge Inference Systems with Hard
Deadlines [9.585931043664363]
We propose a dynamic compression ratio selection scheme for edge inference system with hard deadlines.
Information augmentation that retransmits less compressed data of task with erroneous inference is proposed to enhance the accuracy performance.
Considering the wireless transmission errors, we further design a retransmission scheme to reduce performance degradation due to packet losses.
arXiv Detail & Related papers (2020-05-25T17:11:53Z) - Controllable Time-Delay Transformer for Real-Time Punctuation Prediction
and Disfluency Detection [10.265607222257263]
We propose a Controllable Time-delay Transformer (CT-Transformer) model that jointly completes the punctuation prediction and disfluency detection tasks in real time.
The proposed approach outperforms the previous state-of-the-art models on F-scores and achieves a competitive inference speed.
arXiv Detail & Related papers (2020-03-03T03:17:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.