Kinematic Tokenization: Optimization-Based Continuous-Time Tokens for Learnable Decision Policies in Noisy Time Series
- URL: http://arxiv.org/abs/2601.09949v2
- Date: Sun, 18 Jan 2026 15:10:01 GMT
- Title: Kinematic Tokenization: Optimization-Based Continuous-Time Tokens for Learnable Decision Policies in Noisy Time Series
- Authors: Griffin Kearney
- Abstract summary: Transformers are designed for discrete tokens, yet many real-world signals are continuous processes observed through noisy sampling. We introduce Kinematic Tokenization, an optimization-based continuous-time representation. We show that explicit continuous-time tokens can improve the learnability and calibration of selective decision policies in noisy time series under abstention-inducing losses.
- Score: 0.2538209532048867
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformers are designed for discrete tokens, yet many real-world signals are continuous processes observed through noisy sampling. Discrete tokenizations (raw values, patches, finite differences) can be brittle in low signal-to-noise regimes, especially when downstream objectives impose asymmetric penalties that rationally encourage abstention. We introduce Kinematic Tokenization, an optimization-based continuous-time representation that reconstructs an explicit spline from noisy measurements and tokenizes local spline coefficients (position, velocity, acceleration, jerk). This is applied to financial time series data in the form of asset prices in conjunction with trading volume profiles. Across a multi-asset daily-equity testbed, we use a risk-averse asymmetric classification objective as a stress test for learnability. Under this objective, several discrete baselines collapse to an absorbing cash policy (the Liquidation Equilibrium), whereas the continuous spline tokens sustain calibrated, non-trivial action distributions and stable policies. These results suggest that explicit continuous-time tokens can improve the learnability and calibration of selective decision policies in noisy time series under abstention-inducing losses.
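A minimal sketch of the representation the abstract describes: fit a cubic smoothing spline to a noisy series with SciPy, then read off position, velocity, acceleration, and jerk at each timestep as a 4-dimensional continuous token. The synthetic data, smoothing factor, and token layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.interpolate import splrep, splev

rng = np.random.default_rng(0)
t = np.arange(256, dtype=float)                   # e.g. daily time index
latent = np.cumsum(rng.normal(0.0, 0.5, t.size))  # continuous latent process
prices = latent + rng.normal(0.0, 1.0, t.size)    # noisy observations

# Cubic smoothing spline; s trades data fidelity against smoothness
# (a hypothetical choice, not the paper's optimization objective).
tck = splrep(t, prices, k=3, s=float(t.size))

# Kinematic token at each timestep: the spline value and its first
# three derivatives -- position, velocity, acceleration, jerk.
tokens = np.stack([splev(t, tck, der=d) for d in range(4)], axis=-1)
print(tokens.shape)  # (256, 4)
```

Per the abstract, tokens of this kind would be combined with trading-volume profiles and fed to the Transformer in place of raw-value or patch tokens.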
Related papers
- TempoNet: Slack-Quantized Transformer-Guided Reinforcement Scheduler for Adaptive Deadline-Centric Real-Time Dispatch [8.818252253980985]
TempoNet is a reinforcement learning scheduler that pairs a permutation-invariant Transformer with a deep Q-approximation. A latency-aware sparse attention stack with blockwise top-k selection and locality-sensitive chunking enables global reasoning over unordered task sets.
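The blockwise top-k selection named in the summary can be pictured with a small NumPy sketch: each query block scores all key blocks coarsely, keeps the k best, and attends only within them. The block size, mean-vector scoring rule, and single-head layout are assumptions; locality-sensitive chunking is omitted.

```python
import numpy as np

def blockwise_topk_attention(Q, K, V, block=16, topk=2):
    n, d = Q.shape
    nb = n // block
    Qb, Kb, Vb = (M.reshape(nb, block, d) for M in (Q, K, V))
    # Coarse block-to-block scores via mean query/key vectors.
    block_scores = (Qb.mean(1) @ Kb.mean(1).T) / np.sqrt(d)   # (nb, nb)
    keep = np.argsort(block_scores, axis=-1)[:, -topk:]       # (nb, topk)
    out = np.empty_like(Q).reshape(nb, block, d)
    for i in range(nb):
        Ksel = Kb[keep[i]].reshape(-1, d)       # gather selected key blocks
        Vsel = Vb[keep[i]].reshape(-1, d)
        att = Qb[i] @ Ksel.T / np.sqrt(d)
        att = np.exp(att - att.max(-1, keepdims=True))
        att /= att.sum(-1, keepdims=True)       # softmax over kept keys only
        out[i] = att @ Vsel
    return out.reshape(n, d)

x = np.random.default_rng(1).normal(size=(64, 32))
print(blockwise_topk_attention(x, x, x).shape)  # (64, 32)
```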
arXiv Detail & Related papers (2026-02-20T09:56:23Z)
- Online Causal Kalman Filtering for Stable and Effective Policy Optimization [23.37041897899078]
We show that local off-policy deviation is structurally inconsistent at the token level. We propose Online Causal Kalman Filtering for stable and effective Policy Optimization.
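The summary names a Kalman filter as the core tool; below is a generic scalar Kalman filter over a token-indexed sequence, as a hedged illustration of the filtering step rather than the paper's causal, policy-optimization-specific construction. The noise variances q and r are assumptions.

```python
import numpy as np

def kalman_filter(z, q=1e-3, r=1e-1):
    """Filter noisy observations z of a slowly drifting latent signal.
    q: process noise variance, r: observation noise variance."""
    x, p = z[0], 1.0
    out = np.empty_like(z)
    for t, zt in enumerate(z):
        p = p + q                      # predict (random-walk state model)
        k = p / (p + r)                # Kalman gain
        x = x + k * (zt - x)           # correct with the innovation
        p = (1.0 - k) * p
        out[t] = x
    return out

rng = np.random.default_rng(2)
signal = np.sin(np.linspace(0, 6, 200)) + rng.normal(0, 0.3, 200)
print(kalman_filter(signal)[:5])
```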
arXiv Detail & Related papers (2026-02-11T07:57:43Z)
- Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning [55.2818264614932]
RankTuner introduces a probability-entropy calibration signal, the Relative Rank Indicator, which compares the rank of the ground-truth token with its expected rank under the prediction distribution. The inverse indicator is used as a token-wise Relative Scale to reweight the fine-tuning objective, focusing updates on truly under-learned tokens.
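A sketch of the rank signal as the summary defines it: compare the rank of the ground-truth token with the rank expected under the predicted distribution, and use the inverse ratio as a per-token loss weight. The exact normalization is an assumption.

```python
import torch
import torch.nn.functional as F

def relative_scale(logits, targets, eps=1e-8):
    probs = F.softmax(logits, dim=-1)                        # (B, V)
    # Rank of every vocabulary item under the model: 1 = most probable.
    ranks = (probs.argsort(-1, descending=True).argsort(-1) + 1).float()
    gt_rank = ranks.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    expected_rank = (probs * ranks).sum(-1)                  # E[rank] under model
    rri = expected_rank / (gt_rank + eps)     # Relative Rank Indicator
    return 1.0 / (rri + eps)                  # inverse -> token-wise scale

logits = torch.randn(4, 100)
targets = torch.randint(0, 100, (4,))
w = relative_scale(logits, targets)           # large when truth ranked badly
loss = (w * F.cross_entropy(logits, targets, reduction="none")).mean()
print(w, loss.item())
```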
arXiv Detail & Related papers (2026-02-02T07:27:19Z)
- Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates. SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence. Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
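The one concrete quantity named here is the signal-to-noise ratio of policy updates; a minimal way to measure it over a batch of per-sample gradients is sketched below. The curriculum and pool-refresh logic is not reproduced, and this SNR definition is an assumption.

```python
import numpy as np

def update_snr(per_sample_grads):
    """SNR of an averaged update: squared norm of the mean gradient over the
    mean squared deviation of per-sample gradients (an assumed definition)."""
    g = np.asarray(per_sample_grads)               # (batch, n_params)
    mean_g = g.mean(axis=0)
    noise = ((g - mean_g) ** 2).sum(axis=1).mean()
    return (mean_g ** 2).sum() / (noise + 1e-12)

rng = np.random.default_rng(3)
grads = rng.normal(0.2, 1.0, size=(32, 1000))      # biased but noisy gradients
print(update_snr(grads))                           # higher = cleaner update
```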
arXiv Detail & Related papers (2026-02-01T12:56:10Z)
- GRPO-Guard: Mitigating Implicit Over-Optimization in Flow Matching via Regulated Clipping [63.33669214116784]
GRPO-Guard is a simple yet effective enhancement to existing GRPO frameworks. It restores a balanced and step-consistent importance ratio, ensuring that PPO clipping properly constrains harmful updates. It substantially mitigates implicit over-optimization without relying on heavy KL regularization.
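A hedged sketch of regulated clipping in the spirit of this summary: recenter per-step log-ratios so the importance ratio is balanced across steps before the standard PPO clip is applied. The recentering rule is an assumption, not GRPO-Guard's exact regulator.

```python
import torch

def regulated_clip_loss(logp_new, logp_old, adv, eps=0.2):
    log_ratio = logp_new - logp_old
    # Recenter per step index so the mean log-ratio is 0 at every step,
    # keeping the clip band effective uniformly across steps.
    log_ratio = log_ratio - log_ratio.mean(dim=0, keepdim=True)
    ratio = log_ratio.exp()
    unclipped = ratio * adv
    clipped = ratio.clamp(1 - eps, 1 + eps) * adv
    return -torch.minimum(unclipped, clipped).mean()

lp_new = torch.randn(8, 16, requires_grad=True)    # (batch, steps)
lp_old = lp_new.detach() + 0.1 * torch.randn(8, 16)
adv = torch.randn(8, 16)
print(regulated_clip_loss(lp_new, lp_old, adv).item())
```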
arXiv Detail & Related papers (2025-10-25T14:51:17Z)
- Accuracy of Discretely Sampled Stochastic Policies in Continuous-time Reinforcement Learning [3.973277434105709]
We rigorously analyze a policy execution framework that samples actions from a policy at discrete time points and implements them as piecewise constant controls. We prove that as the sampling mesh size tends to zero, the controlled state process converges weakly to dynamics whose coefficients are aggregated according to the policy. Building on these results, we analyze the bias and variance of various policy gradient estimators based on discrete-time observations.
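The execution scheme being analyzed is easy to simulate: sample an action only at mesh points and hold it constant between them while integrating the state with Euler-Maruyama, as in the toy sketch below. The dynamics and policy are assumptions.

```python
import numpy as np

def rollout(policy, x0=0.0, T=1.0, mesh=0.1, dt=0.001, sigma=0.2, seed=0):
    rng = np.random.default_rng(seed)
    x, t = x0, 0.0
    a = policy(x, rng)                 # action drawn at t = 0
    next_sample = mesh
    while t < T:
        if t >= next_sample:           # refresh the action on the mesh only
            a = policy(x, rng)
            next_sample += mesh
        # Euler-Maruyama step of dX = a dt + sigma dW with frozen action a.
        x += a * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
    return x

gauss_policy = lambda x, rng: rng.normal(-x, 0.5)  # mean-reverting toy policy
print(rollout(gauss_policy))
```

As the mesh shrinks, trajectories of this scheme approach those of the relaxed (policy-aggregated) dynamics, which is the weak-convergence statement in the summary.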
arXiv Detail & Related papers (2025-03-13T02:35:23Z)
- Error-quantified Conformal Inference for Time Series [55.11926160774831]
Uncertainty quantification in time series prediction is challenging due to the temporal dependence and distribution shift in sequential data. We propose Error-quantified Conformal Inference (ECI) by smoothing the quantile loss function. ECI can achieve valid miscoverage control and output tighter prediction sets than other baselines.
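The smoothing idea can be illustrated with an online quantile tracker in which the 0/1 miscoverage indicator inside the quantile-loss gradient is replaced by a sigmoid, making each update sensitive to error magnitude. Step size, temperature, and initialization are assumptions, not ECI's exact rule.

```python
import numpy as np

def smoothed_quantile_tracker(scores, alpha=0.1, lr=0.05, tau=0.1):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    q, qs = 1.0, []
    for s in scores:
        err = sigmoid((s - q) / tau)   # smoothed stand-in for 1{s > q}
        q += lr * (err - alpha)        # raise q when miscovering too often
        qs.append(q)
    return np.array(qs)

rng = np.random.default_rng(4)
scores = np.abs(rng.normal(0, 1, 500))       # nonconformity scores
q = smoothed_quantile_tracker(scores)
print(q[-1], (scores > q).mean())            # tracked quantile, miscoverage
```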
arXiv Detail & Related papers (2025-02-02T15:02:36Z)
- An Idiosyncrasy of Time-discretization in Reinforcement Learning [7.085780872622857]
We study how the choice of discretization may affect a reinforcement learning algorithm.
We identify an idiosyncrasy that arises when a discrete-time algorithm is naively applied to a discretized continuous-time environment.
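One well-known pitfall of this kind (whether it is the paper's exact idiosyncrasy is an assumption) is keeping a fixed per-step discount while refining the time step, which silently shrinks the effective horizon unless gamma is rescaled as gamma**dt:

```python
import numpy as np

def discounted_return(reward_rate, T, dt, gamma, rescale):
    steps = int(T / dt)
    g = gamma ** dt if rescale else gamma      # per-step discount factor
    return sum(g**k * reward_rate * dt for k in range(steps))

for dt in (1.0, 0.1, 0.01):
    naive = discounted_return(1.0, 10.0, dt, gamma=0.9, rescale=False)
    fixed = discounted_return(1.0, 10.0, dt, gamma=0.9, rescale=True)
    print(f"dt={dt:5}: naive={naive:.3f}  rescaled={fixed:.3f}")
```

The naive return collapses toward zero as dt shrinks, while the rescaled version converges to the continuous-time discounted integral.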
arXiv Detail & Related papers (2024-06-21T08:03:25Z)
- Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
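A minimal sketch of the delayed-update scheme: at step t the iterate moves along a gradient evaluated at the iterate from step t - tau. The quadratic objective, constant delay, and i.i.d. noise (standing in for Markovian sampling) are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
A = np.diag([1.0, 3.0])
grad = lambda x: A @ x + rng.normal(0, 0.01, 2)   # noisy quadratic gradient

def delayed_sgd(grad, x0, steps=500, lr=0.05, tau=5):
    history = [np.array(x0, dtype=float)]
    for t in range(steps):
        stale = history[max(0, t - tau)]          # gradient at a stale iterate
        history.append(history[-1] - lr * grad(stale))
    return history[-1]

print(delayed_sgd(grad, [5.0, -5.0]))  # near 0 when lr * smoothness * tau is small
```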
arXiv Detail & Related papers (2024-02-19T03:08:02Z)
- $K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control [0.6906005491572401]
We propose a novel $K$-nearest neighbor resampling procedure for estimating the performance of a policy from historical data.
Our analysis allows for the sampling of entire episodes, as is common practice in most applications.
Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics.
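A toy version of the resampling estimator: roll out synthetic episodes by repeatedly jumping to the observed next state of one of the K nearest logged states whose action matches the target policy. The 1-D dynamics, reward, and distance metric are assumptions.

```python
import numpy as np

def knn_ope(data, policy, x0, horizon, K=10, n_episodes=200, seed=0):
    rng = np.random.default_rng(seed)
    S, A, R, S2 = data                            # logged (s, a, r, s') arrays
    total = 0.0
    for _ in range(n_episodes):
        x, ret = x0, 0.0
        for _ in range(horizon):
            a = policy(x)
            idx = np.where(A == a)[0]             # logged steps with action a
            near = idx[np.argsort(np.abs(S[idx] - x))[:K]]
            j = rng.choice(near)                  # resample a neighbor's step
            ret += R[j]
            x = S2[j]                             # continue from its next state
        total += ret
    return total / n_episodes

rng = np.random.default_rng(5)
S = rng.uniform(-2, 2, 5000)
A = rng.integers(0, 2, 5000)
S2 = 0.9 * S + (2 * A - 1) * 0.1 + rng.normal(0, 0.1, 5000)
R = -np.abs(S2)                                   # reward: stay near zero
print(knn_ope((S, A, R, S2), lambda x: int(x < 0), 1.0, horizon=10))
```

Note that, as the summary says, no optimization or dynamics model is needed; only a nearest-neighbor search over logged states.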
arXiv Detail & Related papers (2023-06-07T23:55:12Z)
- Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization [88.91872713134342]
We propose a theoretically grounded method that can estimate the noise transition matrix and learn a classifier simultaneously.
We show the effectiveness of the proposed method through experiments on benchmark and real-world datasets.
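A hedged sketch of the joint estimation: a classifier outputs clean-label posteriors, a learnable row-stochastic matrix T maps them to noisy-label posteriors trained against the noisy labels, and a pairwise total-variation term (maximized) keeps the clean posteriors distinguishable. Architecture, initialization, and the regularization weight are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_class, dim = 3, 10
f = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, n_class))
T_logits = nn.Parameter(torch.eye(n_class) * 3.0)   # near-identity init
opt = torch.optim.Adam(list(f.parameters()) + [T_logits], lr=1e-2)

def step(x, noisy_y, lam=0.1):
    clean = F.softmax(f(x), dim=-1)                  # p(y | x)
    T = F.softmax(T_logits, dim=-1)                  # row-stochastic matrix
    noisy = clean @ T                                # p(y_noisy | x)
    nll = F.nll_loss(torch.log(noisy + 1e-8), noisy_y)
    # Pairwise total variation between clean posteriors in the batch,
    # subtracted (i.e. maximized) to keep predictions distinguishable.
    tv = 0.5 * (clean.unsqueeze(0) - clean.unsqueeze(1)).abs().sum(-1).mean()
    loss = nll - lam * tv
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

x = torch.randn(64, dim)
noisy_y = torch.randint(0, n_class, (64,))
print(step(x, noisy_y))
```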
arXiv Detail & Related papers (2021-02-04T05:09:18Z)