Bridging Theory and Practice: A Stochastic Learning-Optimization Model for Resilient Automotive Supply Chains
- URL: http://arxiv.org/abs/2511.06479v1
- Date: Sun, 09 Nov 2025 17:51:51 GMT
- Title: Bridging Theory and Practice: A Stochastic Learning-Optimization Model for Resilient Automotive Supply Chains
- Authors: Muhammad Shahnawaz, Adeel Safder,
- Abstract summary: Supply chain disruptions and volatile demand pose significant challenges to the UK automotive industry, which relies heavily on Just-In-Time (JIT) manufacturing.<n>This paper introduces a novel learning-optimization framework that integrates Bayesian inference with inventory optimization for supply chain management.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supply chain disruptions and volatile demand pose significant challenges to the UK automotive industry, which relies heavily on Just-In-Time (JIT) manufacturing. While qualitative studies highlight the potential of integrating Artificial Intelligence (AI) with traditional optimization, a formal, quantitative demonstration of this synergy is lacking. This paper introduces a novel stochastic learning-optimization framework that integrates Bayesian inference with inventory optimization for supply chain management (SCM). We model a two-echelon inventory system subject to stochastic demand and supply disruptions, comparing a traditional static optimization policy against an adaptive policy where Bayesian learning continuously updates parameter estimates to inform stochastic optimization. Our simulations over 365 periods across three operational scenarios demonstrate that the integrated approach achieves 7.4\% cost reduction in stable environments and 5.7\% improvement during supply disruptions, while revealing important limitations during sudden demand shocks due to the inherent conservatism of Bayesian updating. This work provides mathematical validation for practitioner observations and establishes a formal framework for understanding AI-driven supply chain resilience, while identifying critical boundary conditions for successful implementation.
Related papers
- Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models [52.48582333951919]
We propose a dynamic framework designed to enhance alignment reliability by maximizing the Signal-to-Noise Ratio of policy updates.<n>SAGE (Stability-Aware Gradient Efficiency) integrates a coarse-grained curriculum mechanism that refreshes candidate pools based on model competence.<n> Experiments on multiple mathematical reasoning benchmarks demonstrate that SAGE significantly accelerates convergence and outperforms static baselines.
arXiv Detail & Related papers (2026-02-01T12:56:10Z) - ORPR: An OR-Guided Pretrain-then-Reinforce Learning Model for Inventory Management [9.138155308817215]
"Pretrain-then-Reinforce" approach reconciles AI's adaptive perception with Operations Research's structural rigor.<n>We show that a lightweight, domain-informed model can deliver state-of-the-art performance and robust transferability when guided by structured OR logic.
arXiv Detail & Related papers (2025-12-22T03:39:43Z) - BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping [69.74252624161652]
We propose BAlanced Policy Optimization with Adaptive Clipping (BAPO)<n>BAPO dynamically adjusts clipping bounds to adaptively re-balance positive and negative contributions, preserve entropy, and stabilize RL optimization.<n>On AIME 2024 and AIME 2025 benchmarks, our 7B BAPO model surpasses open-source counterparts such as SkyWork-OR1-7B.
arXiv Detail & Related papers (2025-10-21T12:55:04Z) - OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment [55.59322229889159]
We propose OmniQuality-R, a unified reward modeling framework that transforms multi-task quality reasoning into continuous and interpretable reward signals.<n>We use a reasoning-enhanced reward modeling dataset to form a reliable chain-of-thought dataset for supervised fine-tuning.<n>We evaluate OmniQuality-R on three key IQA tasks: aesthetic quality assessment, technical quality evaluation, and text-image alignment.
arXiv Detail & Related papers (2025-10-12T13:46:28Z) - Stabilizing Policy Gradients for Sample-Efficient Reinforcement Learning in LLM Reasoning [77.92320830700797]
Reinforcement Learning has played a central role in enabling reasoning capabilities of Large Language Models.<n>We propose a tractable computational framework that tracks and leverages curvature information during policy updates.<n>The algorithm, Curvature-Aware Policy Optimization (CAPO), identifies samples that contribute to unstable updates and masks them out.
arXiv Detail & Related papers (2025-10-01T12:29:32Z) - TCPO: Thought-Centric Preference Optimization for Effective Embodied Decision-making [75.29820290660065]
This paper proposes Thought-Centric Preference Optimization ( TCPO) for effective embodied decision-making.<n>It emphasizes the alignment of the model's intermediate reasoning process, mitigating the problem of model degradation.<n>Experiments in the ALFWorld environment demonstrate an average success rate of 26.67%, achieving a 6% improvement over RL4VLM.
arXiv Detail & Related papers (2025-09-10T11:16:21Z) - Deep RL Dual Sourcing Inventory Management with Supply and Capacity Risk Awareness [4.583289433858458]
We study how to efficiently apply reinforcement learning (RL) for solving large-scale optimization problems by leveraging intervention models.<n>We demonstrate our approach on a challenging real-world application, the multi-sourcing multi-period inventory management problem in supply chain optimization.
arXiv Detail & Related papers (2025-07-19T02:44:45Z) - The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility.<n>This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance.<n>We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z) - An Innovative Data-Driven and Adaptive Reinforcement Learning Approach for Context-Aware Prescriptive Process Monitoring [3.4437362489150254]
We present a novel framework called Fine-Tuned Offline Reinforcement Learning Augmented Process Sequence Optimization.<n>FORLAPS aims to identify optimal execution paths in business processes by leveraging reinforcement learning enhanced with a state-dependent reward shaping mechanism.<n>We show that FORLAPS achieves 31% savings in resource time spent and a 23% reduction in process time span.
arXiv Detail & Related papers (2025-01-17T20:31:35Z) - Learning Optimization Proxies for Large-Scale Security-Constrained
Economic Dispatch [11.475805963049808]
Security-Constrained Economic Dispatch (SCED) is a fundamental optimization model for Transmission System Operators (TSO)
This paper proposes to learn an optimization proxy for SCED, i.e., a Machine Learning (ML) model that can predict an optimal solution for SCED in milliseconds.
Numerical experiments are reported on the French transmission system, and demonstrate the approach's ability to produce, within a time frame that is compatible with real-time operations.
arXiv Detail & Related papers (2021-12-27T00:44:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.