Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events?
- URL: http://arxiv.org/abs/2511.00808v1
- Date: Sun, 02 Nov 2025 05:21:33 GMT
- Title: Do Math Reasoning LLMs Help Predict the Impact of Public Transit Events?
- Authors: Bowen Fang, Ruijian Zha, Xuan Di
- Abstract summary: We introduce a tolerance-based, shaped reward function that grants partial credit within a continuous error margin, rather than demanding a single correct answer. Our findings show that general-purpose, instruction-tuned LLMs significantly outperform specialized math-reasoning models. This demonstrates that RLVR can be successfully adapted to real-world, noisy forecasting, but requires a verifier design that reflects the continuous nature of the problem.
- Score: 6.428337528749318
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting public transit incident duration from unstructured text alerts is a critical but challenging task. Addressing the domain sparsity of transit operations with standard Supervised Fine-Tuning (SFT) is difficult, as the task involves noisy, continuous labels and lacks reliable expert demonstrations for reasoning. While Reinforcement Learning from Verifiable Rewards (RLVR) excels at tasks with binary correctness, like mathematics, its applicability to noisy, continuous forecasting is an open question. This work, to our knowledge, is the first to bridge the gap between RLVR LLM training and the critical, real-world forecasting challenges in public transit operations. We adapt RLVR to this task by introducing a tolerance-based, shaped reward function that grants partial credit within a continuous error margin, rather than demanding a single correct answer. We systematically evaluate this framework on a curated dataset of NYC MTA service alerts. Our findings show that general-purpose, instruction-tuned LLMs significantly outperform specialized math-reasoning models, which struggle with the ambiguous, real-world text. We empirically demonstrate that the binary reward is unstable and degrades performance, whereas our shaped reward design is critical and allows our model to dominate on the most challenging metrics. While classical regressors are superior at minimizing overall MAE or MSE, our RLVR approach achieved a 35% relative improvement in 5-minute accuracy (Acc@5) over the strongest baseline. This demonstrates that RLVR can be successfully adapted to real-world, noisy forecasting, but requires a verifier design that reflects the continuous nature of the problem.
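The abstract contrasts a binary reward with a tolerance-based, shaped reward and reports results on Acc@5. The sketch below is only an illustration of that contrast, not the authors' implementation: the abstract does not give the functional form, so the exact-match binary reward, the linear decay inside the tolerance window, the 5-minute default tolerance, and the names `binary_reward`, `shaped_reward`, and `acc_at_k` are all assumptions made here.

```python
from typing import Sequence


def binary_reward(pred_minutes: float, true_minutes: float) -> float:
    """Assumed binary verifier: full credit only for an exact match."""
    return 1.0 if pred_minutes == true_minutes else 0.0


def shaped_reward(pred_minutes: float, true_minutes: float,
                  tolerance: float = 5.0) -> float:
    """Assumed shaped verifier: partial credit that decays linearly
    with absolute error and reaches zero at the tolerance boundary."""
    error = abs(pred_minutes - true_minutes)
    return max(0.0, 1.0 - error / tolerance)


def acc_at_k(preds: Sequence[float], targets: Sequence[float],
             k: float = 5.0) -> float:
    """Fraction of predictions whose absolute error is at most k minutes."""
    hits = sum(abs(p - t) <= k for p, t in zip(preds, targets))
    return hits / len(preds)


if __name__ == "__main__":
    # A prediction 2.5 minutes off earns nothing under the binary reward
    # but half credit under the (assumed) linear shaped reward.
    print(binary_reward(32.5, 30.0))
    print(shaped_reward(32.5, 30.0))
    print(acc_at_k([10, 20, 40], [12, 30, 41]))
```

Under these assumptions, a near miss still produces a learning signal, which is one plausible reading of why a shaped reward would train more stably than a binary one on noisy, continuous labels.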
Related papers
- From Verifiable Dot to Reward Chain: Harnessing Verifiable Reference-based Rewards for Reinforcement Learning of Open-ended Generation [52.62655622099456]
We propose reinforcement learning with verifiable reference-based rewards (RLVRR). Instead of checking the final answer, RLVRR extracts an ordered linguistic signal from high-quality references (i.e., reward chain). In this way, RLVRR decomposes rewards into two dimensions: content, which preserves deterministic core concepts, and style, which evaluates adherence to stylistic properties.
arXiv Detail & Related papers (2026-01-26T14:39:58Z) - Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning [51.07663354001582]
Deep neural networks suffer from catastrophic forgetting, where performance on previous tasks degrades after training on a new task. We present a novel approach to address this challenge, focusing on the intersection of memory-based methods and regularization approaches. We formulate a regularization strategy, termed Information Maximization (IM) regularizer, for memory-based continual learning methods.
arXiv Detail & Related papers (2025-12-01T15:56:00Z) - Auditable-choice reframing unlocks RL-based verification for open-ended tasks [23.12421867559344]
Verifiable Multiple-Choice Reformulation (VMR) is a novel training strategy that restructures open-ended data into verifiable multiple-choice formats. Across eight open-ended benchmarks, our VMR-based training delivers an average gain of 5.99 points over the baseline.
arXiv Detail & Related papers (2025-11-04T10:45:52Z) - Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries [23.825984868116716]
We introduce Ariadne, a framework utilizing synthetic mazes for multi-step spatial reasoning. We leverage this controllable environment to train Vision-Language Models (VLMs) using Reinforcement Learning with Verified Rewards (RLVR) in a difficulty-aware curriculum. Surprisingly, post-RLVR training, the VLM achieves over 50% accuracy on a problem set where the base model scored 0%.
arXiv Detail & Related papers (2025-11-01T21:19:41Z) - Grounded in Reality: Learning and Deploying Proactive LLM from Offline Logs [72.08224879435762]
Learn-to-Ask is a simulator-free framework for learning and deploying proactive dialogue agents. Our approach culminates in the successful deployment of LLMs into a live, large-scale online AI service.
arXiv Detail & Related papers (2025-10-29T12:08:07Z) - Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models [33.214586668992965]
Reinforcement learning with verifiable rewards (RLVR) has delivered impressive gains in mathematical and multimodal reasoning. We propose RECAP, a replay strategy with dynamic objective reweighting for general knowledge. Our method is end-to-end and readily applicable to existing RLVR pipelines without training additional models or heavy tuning.
arXiv Detail & Related papers (2025-10-24T19:08:48Z) - VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators [38.880852900641]
Vision-Language-Action (VLA) models enable embodied decision-making but rely heavily on imitation learning. We introduce VLA-RFT, a reinforcement fine-tuning framework that leverages a data-driven world model as a controllable simulator. With fewer than 400 fine-tuning steps, VLA-RFT surpasses strong supervised baselines and achieves greater efficiency than simulator-based RL.
arXiv Detail & Related papers (2025-10-01T01:33:10Z) - Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate [118.37653302885607]
We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs).
MIR is indicative about training data selection, training strategy schedule, and model architecture design to get better pre-training results.
arXiv Detail & Related papers (2024-10-09T17:59:04Z) - Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding [49.973156959947346]
Existing Video Temporal Grounding (VTG) models excel in accuracy but often overlook open-world challenges posed by open-vocabulary queries and untrimmed videos.
We introduce a robust network module that benefits from a two-stage cross-modal alignment task.
It integrates Deep Evidential Regression (DER) to explicitly and thoroughly quantify uncertainty during training.
In response, we develop a simple yet effective Geom-regularizer that enhances the uncertainty learning framework from the ground up.
arXiv Detail & Related papers (2024-08-29T05:32:03Z) - Revisiting the Robustness of the Minimum Error Entropy Criterion: A Transfer Learning Case Study [16.07380451502911]
This paper revisits the robustness of the minimum error entropy criterion to deal with non-Gaussian noises.
We investigate its feasibility and usefulness in real-life transfer learning regression tasks, where distributional shifts are common.
arXiv Detail & Related papers (2023-07-17T15:38:11Z) - FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [54.34189781923818]
FIRE is a framework that adapts to rare events by training a RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function. We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.