Integrating AI and Ensemble Forecasting: Explainable Materials Planning with Scorecards and Trend Insights for a Large-Scale Manufacturer
- URL: http://arxiv.org/abs/2510.01006v1
- Date: Wed, 01 Oct 2025 15:14:10 GMT
- Title: Integrating AI and Ensemble Forecasting: Explainable Materials Planning with Scorecards and Trend Insights for a Large-Scale Manufacturer
- Authors: Saravanan Venkatachalam
- Abstract summary: This paper presents a practical architecture for after-sales demand forecasting and monitoring. It unifies a revenue- and cluster-aware ensemble of statistical, machine-learning, and deep-learning models. The system closes the loop between forecasting, monitoring, and inventory decisions across more than 90 countries and about 6,000 parts.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents a practical architecture for after-sales demand forecasting and monitoring that unifies a revenue- and cluster-aware ensemble of statistical, machine-learning, and deep-learning models with a role-driven analytics layer for scorecards and trend diagnostics. The framework ingests exogenous signals (installed base, pricing, macro indicators, life cycle, seasonality) and treats COVID-19 as a distinct regime, producing country-part forecasts with calibrated intervals. A Pareto-aware segmentation forecasts high-revenue items individually and pools the long tail via clusters, while horizon-aware ensembling aligns weights with business-relevant losses (e.g., WMAPE). Beyond forecasts, a performance scorecard delivers decision-focused insights: accuracy within tolerance thresholds by revenue share and count, bias decomposition (over- vs under-forecast), geographic and product-family hotspots, and ranked root causes tied to high-impact part-country pairs. A trend module tracks trajectories of MAPE/WMAPE and bias across recent months, flags entities that are improving or deteriorating, detects change points aligned with known regimes, and attributes movements to lifecycle and seasonal factors. LLMs are embedded in the analytics layer to generate role-aware narratives and enforce reporting contracts. They standardize business definitions, automate quality checks and reconciliations, and translate quantitative results into concise, explainable summaries for planners and executives. The system exposes a reproducible workflow -- request specification, model execution, database-backed artifacts, and AI-generated narratives -- so planners can move from "How accurate are we now?" to "Where is accuracy heading and which levers should we pull?", closing the loop between forecasting, monitoring, and inventory decisions across more than 90 countries and about 6,000 parts.
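The abstract's core metric and ensembling machinery can be sketched compactly. The snippet below is an illustrative reconstruction, not the authors' implementation: `wmape`, `ensemble_weights`, and `bias_split` are hypothetical names, and the inverse-error weighting scheme is one plausible way to "align weights with business-relevant losses (e.g., WMAPE)" as the abstract describes.

```python
def wmape(actual, forecast):
    """Weighted MAPE: total absolute error divided by total actual volume.
    Unlike per-point MAPE, it stays stable for intermittent demand near zero."""
    err = sum(abs(a - f) for a, f in zip(actual, forecast))
    total = sum(abs(a) for a in actual)
    return err / total

def ensemble_weights(val_errors):
    """One simple horizon-aware weighting rule: per forecast horizon,
    weight each model by the inverse of its validation WMAPE, normalized
    so the weights sum to 1."""
    inv = {m: 1.0 / max(e, 1e-9) for m, e in val_errors.items()}
    s = sum(inv.values())
    return {m: v / s for m, v in inv.items()}

def bias_split(actual, forecast):
    """Bias decomposition for the scorecard: total over-forecast volume
    vs. total under-forecast volume across a part-country series."""
    over = sum(max(f - a, 0.0) for a, f in zip(actual, forecast))
    under = sum(max(a - f, 0.0) for a, f in zip(actual, forecast))
    return over, under
```

In this sketch, a model with half the validation WMAPE of another receives twice the ensemble weight; the real system would recompute such weights per horizon and per revenue cluster.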
Related papers
- It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks [87.7937890373758]
Time series foundation models (TSFMs) are shifting the forecasting landscape from specific dataset modeling to generalizable task evaluation. We introduce TIME, a next-generation task-centric benchmark comprising 50 fresh datasets and 98 forecasting tasks. We propose a novel pattern-level evaluation perspective that moves beyond traditional dataset-level evaluations based on static meta labels.
arXiv Detail & Related papers (2026-02-12T16:31:01Z) - Bridging Forecast Accuracy and Inventory KPIs: A Simulation-Based Software Framework [4.089848545480847]
We propose a decision-centric simulation framework that enables systematic evaluation of forecasting models in realistic inventory management settings. We show that improvements in accuracy metrics do not necessarily lead to better inventory outcomes, and that models with similar error profiles can induce different cost-service trade-offs. Overall, the framework links demand forecasting and inventory management, shifting evaluation from predictive accuracy toward operational relevance.
arXiv Detail & Related papers (2026-01-29T15:20:33Z) - Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs [66.63911043019294]
Data preparation aims to denoise raw datasets, uncover cross-dataset relationships, and extract valuable insights from them. This paper focuses on the use of LLM techniques to prepare data for diverse downstream tasks. We introduce a task-centric taxonomy that organizes the field into major tasks: data cleaning, standardization, error processing, imputation, data integration, and data enrichment.
arXiv Detail & Related papers (2026-01-22T12:02:45Z) - TimeSeriesScientist: A General-Purpose AI Agent for Time Series Analysis [25.377586527585503]
TimeSeriesScientist (TSci) is a general, domain-agnostic framework for time series forecasting. It reduces forecast error by an average of 10.4% and 38.2% on its two evaluation settings. With transparent natural-language rationales and comprehensive reports, TSci transforms forecasting into a white-box system.
arXiv Detail & Related papers (2025-10-02T00:18:59Z) - A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models [22.683448537572897]
Time series reasoning treats time as a first-class axis and incorporates intermediate evidence directly into the answer. This survey defines the problem and organizes the literature by reasoning topology into three families: direct reasoning in one step, linear chain reasoning with explicit intermediates, and branch-structured reasoning.
arXiv Detail & Related papers (2025-09-15T04:39:50Z) - LPI-RIT at LeWiDi-2025: Improving Distributional Predictions via Metadata and Loss Reweighting with DisCo [6.4877384679152525]
Learning With Disagreements (LeWiDi) 2025 aims to model annotator disagreement through soft label distribution prediction and perspectivist evaluation. We adapt DisCo, a neural architecture that jointly models item-level and annotator-level label distributions, and present detailed analysis and improvements.
arXiv Detail & Related papers (2025-08-11T16:39:09Z) - Comparative Analysis of Modern Machine Learning Models for Retail Sales Forecasting [0.0]
When forecasts underestimate the level of sales, firms experience lost sales, shortages, and damage to the retailer's reputation in its market. This study provides an exhaustive assessment of forecasting models applied to a high-resolution brick-and-mortar retail dataset.
arXiv Detail & Related papers (2025-06-06T10:08:17Z) - Consistency Checks for Language Model Forecasters [54.62507816753479]
We measure the performance of forecasters in terms of the consistency of their predictions on different logically related questions. We build an automated evaluation system that generates a set of base questions, instantiates consistency checks from these questions, elicits predictions from the forecaster, and measures the consistency of the predictions.
arXiv Detail & Related papers (2024-12-24T16:51:35Z) - A Comprehensive Forecasting Framework based on Multi-Stage Hierarchical Forecasting Reconciliation and Adjustment [16.859089765648356]
We propose a novel framework to address the challenges of preserving seasonality, ensuring coherence, and improving accuracy. The proposed framework has been deployed and leveraged by Walmart's ads, sales, and operations teams to track future demand.
arXiv Detail & Related papers (2024-12-19T10:33:19Z) - Identifying and Mitigating Social Bias Knowledge in Language Models [52.52955281662332]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST surpasses state-of-the-art baselines with superior debiasing performance. This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - Eliminating Position Bias of Language Models: A Mechanistic Approach [119.34143323054143]
Position bias has proven to be a prevalent issue of modern language models (LMs). Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning.
arXiv Detail & Related papers (2024-07-01T09:06:57Z) - F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm.
By considering domain similarities through task-specific metadata, our model improved generalization, where the excess risk decreases as the number of training tasks increases.
Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z) - Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs).
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z) - Making forecasting self-learning and adaptive -- Pilot forecasting rack [0.0]
This paper presents our findings from a proactive pilot exercise exploring ways to help retailers improve forecast accuracy for such product categories.
We evaluated opportunities for algorithmic interventions to improve forecast accuracy based on a sample product category, Knitwear.
Our outcomes show a 20% increase in demand forecast accuracy for the Knitwear product category, taking the overall accuracy to 80%.
arXiv Detail & Related papers (2023-06-12T03:26:11Z) - Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of data points.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.