Multimodal Temporal Fusion Transformers Are Good Product Demand
Forecasters
- URL: http://arxiv.org/abs/2307.02578v1
- Date: Wed, 5 Jul 2023 18:23:13 GMT
- Title: Multimodal Temporal Fusion Transformers Are Good Product Demand
Forecasters
- Authors: Maarten Sukel, Stevan Rudinac, Marcel Worring
- Abstract summary: Multimodal demand forecasting aims at predicting product demand utilizing visual, textual, and contextual information.
This paper proposes a method for multimodal product demand forecasting using convolutional, graph-based, and transformer-based architectures.
- Score: 18.52252059555198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal demand forecasting aims at predicting product demand utilizing
visual, textual, and contextual information. This paper proposes a method for
multimodal product demand forecasting using convolutional, graph-based, and
transformer-based architectures. Traditional approaches to demand forecasting
rely on historical demand, product categories, and additional contextual
information such as seasonality and events. However, these approaches have
several shortcomings, such as the cold-start problem, which makes it difficult
to predict demand for a product until sufficient historical data is available,
and their inability to properly deal with category
dynamics. By incorporating multimodal information, such as product images and
textual descriptions, our architecture aims to address the shortcomings of
traditional approaches and outperform them. The experiments conducted on a
large real-world dataset show that the proposed approach effectively predicts
demand for a wide range of products. The multimodal pipeline presented in this
work enhances the accuracy and reliability of the predictions, demonstrating
the potential of leveraging multimodal information in product demand
forecasting.
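As a rough illustration of the kind of late-fusion pipeline the abstract describes, the sketch below concatenates historical demand with image and text embeddings plus contextual flags before a single readout. All names, dimensions, and the linear head are hypothetical stand-ins, not the paper's actual convolutional/graph/transformer architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_modalities(demand_history, image_emb, text_emb, context):
    """Concatenate per-product multimodal features into one vector.

    demand_history: (T,) past demand values; image_emb / text_emb:
    fixed-size embeddings assumed to come from pretrained encoders;
    context: contextual flags such as seasonality or events.
    """
    return np.concatenate([demand_history, image_emb, text_emb, context])

def forecast(fused, weights, bias):
    # A single linear readout as a stand-in for the transformer head.
    return float(fused @ weights + bias)

# Hypothetical product: 8 weeks of history, small toy embeddings.
history = rng.random(8)
img, txt = rng.random(4), rng.random(4)
ctx = np.array([1.0, 0.0])  # e.g. [holiday_week, promo_active]

fused = fuse_modalities(history, img, txt, ctx)
w = rng.random(fused.shape[0]) / fused.shape[0]
pred = forecast(fused, w, 0.1)
print(fused.shape, pred)
```

The point of the fused vector is that a new product with no sales history still contributes image, text, and context features, which is how multimodal inputs address the cold-start shortcoming described above.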
Related papers
- LLMForecaster: Improving Seasonal Event Forecasts with Unstructured Textual Data [63.777637042161544]
This paper introduces a novel forecast post-processor that fine-tunes large language models to incorporate unstructured semantic and contextual information and historical data.
In an industry-scale retail application, we demonstrate that our technique yields statistically significant forecast improvements across several sets of products subject to holiday-driven demand surges.
arXiv Detail & Related papers (2024-12-03T16:18:42Z)
- Context Matters: Leveraging Contextual Features for Time Series Forecasting [2.9687381456164004]
We introduce ContextFormer, a novel plug-and-play method to surgically integrate multimodal contextual information into existing forecasting models.
ContextFormer effectively distills forecast-specific information from rich multimodal contexts, including categorical, continuous, time-varying, and even textual information.
It outperforms SOTA forecasting models by up to 30% on a range of real-world datasets spanning energy, traffic, environmental, and financial domains.
arXiv Detail & Related papers (2024-10-16T15:36:13Z)
- Inter-Series Transformer: Attending to Products in Time Series Forecasting [5.459207333107234]
We develop a new Transformer-based forecasting approach using a shared, multi-task per-time series network.
We provide a case study applying our approach to successfully improve demand prediction for a medical device manufacturing company.
arXiv Detail & Related papers (2024-08-07T16:22:21Z)
- Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development [67.55944651679864]
We present a new sandbox suite tailored for integrated data-model co-development.
This sandbox provides a feedback-driven experimental platform, enabling cost-effective and guided refinement of both data and models.
arXiv Detail & Related papers (2024-07-16T14:40:07Z)
- F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm.
By considering domain similarities through task-specific metadata, our model improves generalization: the excess risk decreases as the number of training tasks increases.
Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
- Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP).
We develop a generic and personalized generative framework that can handle a wide range of personalization needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z)
- Incorporating Pre-trained Model Prompting in Multimodal Stock Volume Movement Prediction [22.949484374773967]
We propose the Prompt-based MUltimodal Stock volumE prediction model (ProMUSE) to process text and time series modalities.
We use pre-trained language models for better comprehension of financial news.
We also propose a novel cross-modality contrastive alignment, retaining the unimodal heads alongside the fusion head, to mitigate the loss of modality-specific information during fusion.
arXiv Detail & Related papers (2023-09-11T16:47:01Z)
- Deep Learning based Forecasting: a case study from the online fashion industry [7.694480564850072]
We describe the data and our modelling approach for this forecasting problem in detail and present empirical results.
arXiv Detail & Related papers (2023-05-23T13:30:35Z)
- Multimodal Neural Network For Demand Forecasting [0.8602553195689513]
We propose a multi-modal sales forecasting network that combines real-life events from news articles with traditional data such as historical sales and holiday information.
We show statistically significant improvements in the SMAPE error metric with an average improvement of 7.37% against the existing state-of-the-art sales forecasting techniques.
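The SMAPE metric cited above can be computed as follows; this uses the common definition with the mean of the absolute actual and forecast values in the denominator, though conventions vary across papers.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric Mean Absolute Percentage Error, in percent.

    Assumes at least one of actual/forecast is nonzero at each step;
    the all-zero case would divide by zero under this convention.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(forecast - actual) / denom)

# Toy sales series: forecasts off by 10 units each week.
print(smape([100, 200, 300], [110, 190, 310]))
```

A perfect forecast gives 0%, and the metric is bounded above by 200%, which is why it is popular for demand series whose scale varies widely across products.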
arXiv Detail & Related papers (2022-10-20T18:06:36Z)
- Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining [108.86502855439774]
We investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval.
We contribute Product1M, one of the largest multi-modal cosmetic datasets for real-world instance-level retrieval.
We propose a novel model named Cross-modal contrAstive Product Transformer for instance-level prodUct REtrieval (CAPTURE).
arXiv Detail & Related papers (2021-07-30T12:11:24Z)
- Pre-training Graph Transformer with Multimodal Side Information for Recommendation [82.4194024706817]
We propose a pre-training strategy to learn item representations by considering both item side information and their relationships.
We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item.
The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction.
arXiv Detail & Related papers (2020-10-23T10:30:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.