Empirical Characterization of Temporal Constraint Processing in LLMs
- URL: http://arxiv.org/abs/2511.10654v1
- Date: Sun, 02 Nov 2025 20:03:52 GMT
- Title: Empirical Characterization of Temporal Constraint Processing in LLMs
- Authors: Javier Marín
- Abstract summary: We characterize temporal constraint processing across eight production-scale models (2.8-8B parameters) using deadline detection tasks. We show that fine-tuning on 200 synthetic examples improves models with partial capability by 12-37 percentage points. This capability requires architectural mechanisms for: (1) continuous temporal state representation, (2) explicit constraint checking separate from linguistic pattern matching, and (3) systematic compositional reasoning over temporal relations.
- Score: 0.2538209532048866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When deploying LLMs in agentic architectures requiring real-time decisions under temporal constraints, we assume they reliably determine whether action windows remain open or have closed. This assumption is untested. We characterize temporal constraint processing across eight production-scale models (2.8-8B parameters) using deadline detection tasks, revealing systematic deployment risks: a bimodal performance distribution (models achieve either 95% or 50% accuracy), extreme prompt brittleness (30-60 percentage point swings from formatting changes alone), and systematic action bias (100% false positive rates in failing models). Parameter count shows no correlation with capability in this range: a 3.8B model matches 7B models while other 7B models fail completely. Fine-tuning on 200 synthetic examples improves models with partial capability by 12-37 percentage points. We demonstrate that temporal constraint satisfaction cannot be reliably learned through next-token prediction on natural language, even with targeted fine-tuning. This capability requires architectural mechanisms for: (1) continuous temporal state representation, (2) explicit constraint checking separate from linguistic pattern matching, and (3) systematic compositional reasoning over temporal relations. Current autoregressive architectures lack these mechanisms. Deploying such systems in time-critical applications without hybrid architectures incorporating symbolic reasoning modules represents unacceptable risk.
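The abstract's argument for explicit constraint checking separate from linguistic pattern matching can be illustrated with a minimal sketch. The function name and structure below are illustrative assumptions, not taken from the paper: a symbolic deadline check that a hybrid agent could consult before acting, independent of whatever the model's text claims about the deadline.

```python
from datetime import datetime, timezone

def action_window_open(deadline_iso, now=None):
    """Explicit temporal constraint check: compare clock time against a
    deadline symbolically, rather than asking a language model whether
    the window is still open."""
    deadline = datetime.fromisoformat(deadline_iso)
    if deadline.tzinfo is None:
        # Treat naive deadlines as UTC for a well-defined comparison.
        deadline = deadline.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now < deadline

# A hybrid architecture would gate the LLM's proposed action on this
# check: act only when action_window_open(deadline) is True.
```

The point of the sketch is separation of concerns: the LLM proposes, but a deterministic check with a real clock decides whether the action window has closed.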
Related papers
- Bridging Temporal and Textual Modalities: A Multimodal Framework for Automated Cloud Failure Root Cause Analysis [0.0]
This paper presents a diagnostic framework that harmonizes time-series representations with pretrained language model embedding spaces. The framework achieves leading performance, reaching 48.75% diagnostic accuracy, with notable improvements on scenarios involving compound failure modes.
arXiv Detail & Related papers (2026-01-08T08:20:44Z)
- Beyond Mimicry: Preference Coherence in LLMs [0.19116784879310025]
We investigate whether large language models exhibit genuine preference structures by testing their responses to AI-specific trade-offs. We find that 23 combinations (47.9%) demonstrate statistically significant relationships between scenario intensity and choice patterns, but only 5 combinations (10.4%) demonstrate meaningful preference coherence through adaptive or threshold-based behavior. The prevalence of unstable transitions (45.8%) and stimulus-specific sensitivities suggests current AI systems lack unified preference structures.
arXiv Detail & Related papers (2025-11-17T17:41:48Z)
- Agentic World Modeling for 6G: Near-Real-Time Generative State-Space Reasoning [70.56067503630486]
We argue that sixth-generation (6G) intelligence is not fluent token prediction but the calibrated capacity to imagine and choose. We show that WM-MS3M cuts mean absolute error (MAE) by 1.69% versus MS3M with 32% fewer parameters and similar latency, and achieves 35-80% lower root mean squared error (RMSE) than attention/hybrid baselines with 2.3-4.1x faster inference.
arXiv Detail & Related papers (2025-11-04T17:22:22Z)
- Subject-Event Ontology Without Global Time: Foundations and Execution Semantics [51.56484100374058]
The formalization includes nine axioms (A1-A9) ensuring the correctness of execution: monotonicity of history (I1), acyclicity of causality (I2), and traceability (I3). The formalization is applicable to distributed systems, microservice architectures, DLT platforms, and multiperspectivity scenarios (conflicting facts from different subjects). Special attention is given to the model-based approach (A9): event validation via schemas, actor authorization, and automatic construction of causal chains (W3) without global time.
arXiv Detail & Related papers (2025-10-20T19:26:44Z)
- Are Large Reasoning Models Interruptible? [77.53059044071107]
Large Reasoning Models (LRMs) excel at complex reasoning but are traditionally evaluated in static, "frozen world" settings. We show that even state-of-the-art LRMs, which achieve high accuracy in static settings, can fail unpredictably when interrupted or exposed to changing context. Our analysis further reveals several novel failure modes, including reasoning leakage, panic, and self-doubt.
arXiv Detail & Related papers (2025-10-13T17:59:35Z)
- Enhanced accuracy through ensembling of randomly initialized auto-regressive models for time-dependent PDEs [0.0]
Autoregressive inference with machine learning models suffers from error accumulation over successive predictions, limiting long-term accuracy. We propose a deep ensemble framework to address this challenge, where multiple ML surrogate models are trained in parallel and aggregated during inference. We validate the framework on three PDE-driven dynamical systems: stress evolution in heterogeneous microstructures, Gray-Scott reaction-diffusion, and a planetary-scale shallow water system.
arXiv Detail & Related papers (2025-07-05T02:25:12Z)
- Model Discovery and Graph Simulation: A Lightweight Gateway to Chaos Engineering [0.0]
Chaos engineering reveals resilience risks but is expensive and operationally risky to run broadly and often. We claim that a simple connectivity-only topological model can provide fast, low-risk availability estimates under fail-stop faults.
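The connectivity-only idea in this entry can be sketched in a few lines. This is purely illustrative and not the paper's actual model: estimate availability as the probability that a target service remains reachable from an entry point when each node independently fail-stops.

```python
import random

def availability_estimate(nodes, edges, entry, target,
                          p_fail=0.1, trials=2000, seed=0):
    """Monte Carlo estimate of P(target reachable from entry) when each
    node independently fail-stops with probability p_fail.
    Connectivity-only: no latency, capacity, or retry modeling."""
    rng = random.Random(seed)
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
    ok = 0
    for _ in range(trials):
        # Sample a fault scenario; the entry point itself is kept alive.
        alive = {n for n in nodes if rng.random() >= p_fail or n == entry}
        # Depth-first search over surviving nodes.
        stack, seen = [entry], {entry}
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        ok += target in seen
    return ok / trials
```

A simulation like this runs in milliseconds on a service graph, which is the sense in which a topological model is a low-risk stand-in for live fault injection.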
arXiv Detail & Related papers (2025-06-12T10:59:28Z)
- Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models [86.88657425848547]
Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. We explicitly align models with three meta-abilities: deduction, induction, and abduction, using automatically generated, self-verifiable tasks. Our three-stage pipeline of individual alignment, parameter-space merging, and domain-specific reinforcement learning boosts performance by over 10% relative to instruction-tuned baselines.
arXiv Detail & Related papers (2025-05-15T17:58:33Z)
- Uncertainty Quantification of Surrogate Models using Conformal Prediction [7.445864392018774]
We formalise a conformal prediction framework that produces statistically valid predictions in a model-agnostic manner, at near-zero computational cost.
The paper provides statistically valid error bars for deterministic models, as well as coverage guarantees for the error bars of probabilistic models.
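As a rough sketch of the split conformal recipe behind such error bars (an illustration of the standard technique, not this paper's own code): on a held-out calibration set, take the appropriately ranked absolute residual as the interval half-width.

```python
import math

def conformal_halfwidth(calibration_residuals, alpha=0.1):
    """Split conformal prediction: return the half-width q such that the
    interval f(x_new) +/- q covers y_new with probability >= 1 - alpha,
    assuming calibration and test points are exchangeable."""
    s = sorted(calibration_residuals)
    n = len(s)
    # Rank of the conformal quantile, with the (n + 1) finite-sample correction.
    k = math.ceil((n + 1) * (1 - alpha))
    return s[min(k, n) - 1]
```

The guarantee is model-agnostic because nothing about the underlying surrogate is used beyond its residuals, which is what makes the procedure near-free computationally.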
arXiv Detail & Related papers (2024-08-19T10:46:19Z)
- Complex Event Forecasting with Prediction Suffix Trees: Extended Technical Report [70.7321040534471]
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events.
There is a lack of methods for forecasting when a pattern might occur before such an occurrence is actually detected by a CER engine.
We present a formal framework that attempts to address the issue of Complex Event Forecasting.
arXiv Detail & Related papers (2021-09-01T09:52:31Z)
- Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.