Deep Learning for Perishable Inventory Systems with Human Knowledge
- URL: http://arxiv.org/abs/2601.15589v1
- Date: Thu, 22 Jan 2026 02:26:32 GMT
- Title: Deep Learning for Perishable Inventory Systems with Human Knowledge
- Authors: Xuan Liao, Zhenkang Peng, Ying Rong
- Abstract summary: We study a perishable inventory system with random lead times in which both the demand process and the lead time distribution are unknown. We adopt a marginal cost accounting scheme that assigns each order a single lifetime cost and yields a unified loss function for end-to-end learning. We develop two end-to-end variants: a purely black-box approach that outputs order quantities directly (E2E-BB) and a structure-guided approach that embeds the projected inventory level (PIL) policy.
- Score: 0.6920276126310231
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Managing perishable products with limited lifetimes is a fundamental challenge in inventory management, as poor ordering decisions can quickly lead to stockouts or excessive waste. We study a perishable inventory system with random lead times in which both the demand process and the lead time distribution are unknown. We consider a practical setting where orders are placed using limited historical data together with observed covariates and current system states. To improve learning efficiency under limited data, we adopt a marginal cost accounting scheme that assigns each order a single lifetime cost and yields a unified loss function for end-to-end learning. This enables training a deep learning-based policy that maps observed covariates and system states directly to order quantities. We develop two end-to-end variants: a purely black-box approach that outputs order quantities directly (E2E-BB), and a structure-guided approach that embeds the projected inventory level (PIL) policy, capturing inventory effects through explicit computation rather than additional learning (E2E-PIL). We further show that the objective induced by E2E-PIL is homogeneous of degree one, enabling a boosting technique from operational data analytics (ODA) that yields an enhanced policy (E2E-BPIL). Experiments on synthetic and real data establish a robust performance ordering: E2E-BB is dominated by E2E-PIL, which is further improved by E2E-BPIL. Using an excess-risk decomposition, we show that embedding heuristic policy structure reduces effective model complexity and improves learning efficiency with only a modest loss of flexibility. More broadly, our results suggest that deep learning-based decision tools are more effective and robust when guided by human knowledge, highlighting the value of integrating advanced analytics with inventory theory.
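To make the end-to-end idea concrete, here is a minimal sketch of a black-box policy in the spirit of E2E-BB: a network maps observed covariates and system state directly to a nonnegative order quantity and is trained against a cost-based loss. All names (PolicyNet, marginal_cost_loss) and the simple newsvendor-style cost are illustrative assumptions; the paper's marginal cost accounting assigns each order its full lifetime cost under perishability and random lead times, which is more involved than the stand-in below.

```python
# Illustrative sketch only; hypothetical names, not the paper's implementation.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Maps covariates and system state directly to an order quantity (E2E-BB style)."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # order quantities are nonnegative
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def marginal_cost_loss(q, demand, h=1.0, b=5.0):
    """Newsvendor-style stand-in for an order's lifetime cost:
    holding cost h on leftovers, shortage penalty b on unmet demand."""
    return (h * torch.clamp(q - demand, min=0)
            + b * torch.clamp(demand - q, min=0)).mean()

# Training loop on (covariates+state, realized demand) pairs; data is synthetic.
policy = PolicyNet(n_features=8)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
x = torch.randn(256, 8)   # observed covariates and system-state features
d = torch.rand(256) * 10  # realized demands
for _ in range(200):
    opt.zero_grad()
    loss = marginal_cost_loss(policy(x), d)
    loss.backward()
    opt.step()
```

A structure-guided E2E-PIL variant would instead compute the projected inventory level explicitly from the pipeline state and learn only the remaining decision, which is how embedding the heuristic structure reduces effective model complexity in the paper's excess-risk decomposition.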
Related papers
- Active Learning Using Aggregated Acquisition Functions: Accuracy and Sustainability Analysis [14.398823059302279]
Active learning (AL) is a machine learning approach that strategically selects the most informative samples for annotation during training. This strategy not only reduces labeling expenses but also results in energy savings during neural network training. We implement and evaluate various state-of-the-art acquisition functions, analyzing their accuracy and computational costs.
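As a rough illustration of aggregation (the exact acquisition functions and aggregation rule in the paper may differ), one can z-score several per-sample uncertainty scores and average them before selecting a query batch:

```python
# Illustrative aggregation of acquisition functions; all choices are assumptions.
import numpy as np

def entropy_score(probs: np.ndarray) -> np.ndarray:
    """Predictive entropy per unlabeled sample; probs has shape (N, classes)."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def margin_score(probs: np.ndarray) -> np.ndarray:
    """Small top-1/top-2 margin means high uncertainty, so negate the margin."""
    part = np.sort(probs, axis=1)
    return -(part[:, -1] - part[:, -2])

def aggregated_acquisition(probs: np.ndarray, k: int) -> np.ndarray:
    """Z-score each acquisition function, average them, pick the top-k samples."""
    scores = np.stack([entropy_score(probs), margin_score(probs)])
    z = (scores - scores.mean(axis=1, keepdims=True)) / (
        scores.std(axis=1, keepdims=True) + 1e-12)
    return np.argsort(z.mean(axis=0))[-k:]

probs = np.random.dirichlet(np.ones(10), size=1000)  # mock softmax outputs
query_idx = aggregated_acquisition(probs, k=32)      # indices to annotate next
```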
arXiv Detail & Related papers (2026-02-07T08:42:12Z)
- Evolutionary Strategies lead to Catastrophic Forgetting in LLMs [51.91763220981834]
Evolutionary Strategies (ES) have recently re-emerged as a gradient-free alternative to traditional learning algorithms. ES is able to reach performance numbers close to GRPO for math and reasoning tasks with a comparable compute budget. ES is, however, accompanied by significant forgetting of prior abilities, limiting its applicability for training models online.
arXiv Detail & Related papers (2026-01-28T18:59:34Z)
- Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization [72.20212909644017]
Deliberate Practice Policy Optimization (DPPO) is a metacognitive "Metaloop" training framework. DPPO alternates between supervised fine-tuning (competence expansion) and reinforcement learning (skill refinement). Empirically, training a vision-language embodied model with DPPO, referred to as Pelican-VL 1.0, yields a 20.3% performance improvement over the base model. We are open-sourcing both the models and code, providing the first systematic framework that alleviates the data and resource bottleneck.
arXiv Detail & Related papers (2025-11-20T17:58:04Z)
- Retaining by Doing: The Role of On-Policy Data in Mitigating Forgetting [40.80967570661867]
Adapting language models to new tasks via post-training carries the risk of degrading existing capabilities. We compare the forgetting patterns of two widely adopted post-training methods: supervised fine-tuning (SFT) and reinforcement learning (RL). RL leads to less forgetting than SFT while achieving comparable or higher target-task performance.
arXiv Detail & Related papers (2025-10-21T17:59:41Z)
- UniErase: Towards Balanced and Precise Unlearning in Language Models [69.04923022755547]
Large language models (LLMs) require iterative updates to address the problem of outdated information. UniErase is a novel unlearning framework that achieves precise and balanced performance between unlearning knowledge and retaining abilities.
arXiv Detail & Related papers (2025-05-21T15:53:28Z)
- DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [86.76714527437383]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that, under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
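A minimal sketch of the routing mechanism the abstract describes, with hypothetical dimensions and block partitioning: hard sigmoid gates in the forward pass, straight-through gradients in the backward pass.

```python
# Illustrative sketch; dimensions and partitioning are invented here.
import torch
import torch.nn as nn

class STEGate(nn.Module):
    """Hard 0/1 gates in the forward pass, sigmoid gradients in the backward pass."""
    def __init__(self, d_model: int, n_blocks: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_blocks)

    def forward(self, x):
        p = torch.sigmoid(self.router(x))  # per-block activation probability
        hard = (p > 0.5).float()
        return hard + p - p.detach()       # straight-through estimator

class BlockFFN(nn.Module):
    """An FFN partitioned into expert blocks; tokens activate blocks via the gate."""
    def __init__(self, d_model=256, d_ff=1024, n_blocks=4):
        super().__init__()
        self.gate = STEGate(d_model, n_blocks)
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff // n_blocks), nn.GELU(),
                          nn.Linear(d_ff // n_blocks, d_model))
            for _ in range(n_blocks)
        )

    def forward(self, x):  # x: (batch, d_model)
        g = self.gate(x)   # (batch, n_blocks)
        return sum(g[:, i:i + 1] * blk(x) for i, blk in enumerate(self.blocks))

y = BlockFFN()(torch.randn(8, 256))  # smoke test
```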
arXiv Detail & Related papers (2025-02-18T02:37:26Z)
- Offline Behavior Distillation [57.6900189406964]
Massive reinforcement learning (RL) data are typically collected to train policies offline without the need for interactions.
We formulate offline behavior distillation (OBD), which synthesizes limited expert behavioral data from sub-optimal RL data.
We propose two naive OBD objectives, DBC and PBC, which measure distillation performance via the decision difference between policies trained on distilled data and either offline data or a near-expert policy.
arXiv Detail & Related papers (2024-10-30T06:28:09Z)
- Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning [28.059563581973432]
Large Language Models (LLMs) are often exposed to sensitive, private, or copyrighted data during pre-training.
LLM unlearning aims to eliminate the influence of such undesirable data from the pre-trained model.
We propose Negative Preference Optimization (NPO), a simple alignment-inspired method that can efficiently unlearn a target dataset.
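For illustration, one common statement of the NPO objective penalizes the forget-set log-probability ratio between the current model and a frozen reference; the helper names and mock values below are placeholders:

```python
# Sketch of an NPO-style loss, assuming the common formulation
# -(2/beta) * E[log sigmoid(-beta * (logp_theta - logp_ref))].
import torch
import torch.nn.functional as F

def npo_loss(logp_theta: torch.Tensor, logp_ref: torch.Tensor, beta: float = 0.1):
    """logp_theta/logp_ref: summed log-probs of forget-set responses under the
    current model and a frozen reference model, each of shape (batch,)."""
    log_ratio = logp_theta - logp_ref
    return -(2.0 / beta) * F.logsigmoid(-beta * log_ratio).mean()

# Mock values: as logp_theta drops below logp_ref (unlearning), the loss decreases.
loss = npo_loss(torch.tensor([-42.0, -55.0]), torch.tensor([-40.0, -50.0]))
```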
arXiv Detail & Related papers (2024-04-08T21:05:42Z)
- Rethinking Resource Management in Edge Learning: A Joint Pre-training and Fine-tuning Design Paradigm [87.47506806135746]
In some applications, edge learning is shifting its focus from conventional learning from scratch to a new two-stage paradigm of pre-training and fine-tuning.
This paper considers the problem of joint communication and computation resource management in a two-stage edge learning system.
It is shown that the proposed joint resource management over the pre-training and fine-tuning stages effectively balances the system performance trade-off.
arXiv Detail & Related papers (2024-04-01T00:21:11Z)
- E2E-AT: A Unified Framework for Tackling Uncertainty in Task-aware End-to-end Learning [9.741277008050927]
We propose a unified framework that covers the uncertainties emerging in both the input feature space of the machine learning models and the constrained optimization models.
We show that neglecting the uncertainty of the constrained optimization models during training introduces a new source of generalization error.
The framework is formulated as a robust optimization problem and is solved in practice via end-to-end adversarial training (E2E-AT).
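A generic sketch of such an adversarial training loop (input-feature uncertainty only; the paper's formulation also covers uncertainty in the optimization model): a PGD-style inner maximization followed by an outer minimization step. All names and the toy task are assumptions.

```python
# Illustrative min-max training loop; not the paper's exact algorithm.
import torch

def pgd_perturb(model, loss_fn, x, y, eps=0.1, steps=5, lr=0.03):
    """Inner maximization: a worst-case feature perturbation within an eps-ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += lr * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return delta.detach()

# Outer minimization: train the task model on worst-case inputs.
model = torch.nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.MSELoss()
x, y = torch.randn(64, 4), torch.randn(64, 1)
for _ in range(100):
    delta = pgd_perturb(model, loss_fn, x, y)
    opt.zero_grad()
    loss_fn(model(x + delta), y).backward()
    opt.step()
```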
arXiv Detail & Related papers (2023-12-17T02:23:25Z)
- Parcel loss prediction in last-mile delivery: deep and non-deep approaches with insights from Explainable AI [1.104960878651584]
We propose two machine learning approaches, namely Data Balance with Supervised Learning (DBSL) and Deep Hybrid Ensemble Learning (DHEL).
The practical implication of such predictions is their value in aiding e-commerce retailers in optimizing insurance-related decision-making policies.
arXiv Detail & Related papers (2023-10-25T12:46:34Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
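A rough sketch of how such a difficulty ratio can be computed with an off-the-shelf causal LM, assuming IFD compares the answer's loss conditioned on the instruction against the answer's loss alone; the model choice and helper names are placeholders, and the paper's exact scoring may differ:

```python
# Illustrative IFD-style ratio; assumptions noted in the lead-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def answer_loss(prompt: str, answer: str) -> float:
    """Mean cross-entropy over answer tokens, given an optional prompt prefix."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids if prompt else None
    answer_ids = tok(answer, return_tensors="pt").input_ids
    ids = torch.cat([prompt_ids, answer_ids], dim=1) if prompt else answer_ids
    labels = ids.clone()
    if prompt:
        labels[:, : prompt_ids.shape[1]] = -100  # score only the answer tokens
    return model(ids, labels=labels).loss.item()

def ifd(instruction: str, answer: str) -> float:
    """Higher IFD: the instruction helps little, suggesting a harder sample."""
    return answer_loss(instruction, answer) / answer_loss("", answer)

print(ifd("Translate to French: Hello", " Bonjour"))
```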
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- Comparing Deep Reinforcement Learning Algorithms in Two-Echelon Supply Chains [1.4685355149711299]
We analyze and compare the performance of state-of-the-art deep reinforcement learning algorithms for solving the supply chain inventory management problem.
This study provides detailed insight into the design and development of an open-source software library that offers a customizable environment for solving the supply chain inventory management problem.
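For flavor, a toy single-echelon environment in the Gym style is sketched below; the library described in the abstract is richer (two echelons, configurable costs and lead times), so all names and dynamics here are illustrative.

```python
# Toy inventory environment; purely illustrative, not the library's API.
import numpy as np

class InventoryEnv:
    def __init__(self, capacity=100, h=1.0, b=5.0, demand_mean=10.0, seed=0):
        self.capacity, self.h, self.b = capacity, h, b
        self.demand_mean = demand_mean
        self.rng = np.random.default_rng(seed)
        self.level = 0

    def reset(self):
        self.level = self.capacity // 2
        return np.array([self.level], dtype=np.float32)

    def step(self, order: int):
        self.level = min(self.level + order, self.capacity)
        demand = self.rng.poisson(self.demand_mean)
        sold = min(self.level, demand)
        self.level -= sold
        # reward = -(holding cost on leftovers + penalty on lost sales)
        reward = -(self.h * self.level + self.b * (demand - sold))
        return np.array([self.level], dtype=np.float32), reward, False, {}

env = InventoryEnv()
obs = env.reset()
obs, r, done, info = env.step(order=12)
```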
arXiv Detail & Related papers (2022-04-20T16:33:01Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer while fine-tuning the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.