Multimodal Temporal Fusion Transformers Are Good Product Demand
Forecasters
- URL: http://arxiv.org/abs/2307.02578v1
- Date: Wed, 5 Jul 2023 18:23:13 GMT
- Authors: Maarten Sukel, Stevan Rudinac, Marcel Worring
- Abstract summary: Multimodal demand forecasting aims at predicting product demand utilizing visual, textual, and contextual information.
This paper proposes a method for multimodal product demand forecasting using convolutional, graph-based, and transformer-based architectures.
- Score: 18.52252059555198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal demand forecasting aims at predicting product demand utilizing
visual, textual, and contextual information. This paper proposes a method for
multimodal product demand forecasting using convolutional, graph-based, and
transformer-based architectures. Traditional approaches to demand forecasting
rely on historical demand, product categories, and additional contextual
information such as seasonality and events. However, these approaches have
several shortcomings, such as the cold-start problem, which makes it difficult
to predict demand for a particular product until sufficient historical data is
available, and an inability to properly handle category
dynamics. By incorporating multimodal information, such as product images and
textual descriptions, our architecture aims to address the shortcomings of
traditional approaches and outperform them. The experiments conducted on a
large real-world dataset show that the proposed approach effectively predicts
demand for a wide range of products. The multimodal pipeline presented in this
work enhances the accuracy and reliability of the predictions, demonstrating
the potential of leveraging multimodal information in product demand
forecasting.
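The abstract does not publish the architecture's details, but the late-fusion idea it describes can be sketched as follows. All dimensions, the untrained linear head, and the variable names are hypothetical stand-ins, not the paper's actual model:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes; the abstract does not specify the real dimensions.
D_IMG, D_TXT, D_CTX = 16, 32, 8   # image / text / temporal-context embeddings
HORIZON = 4                       # forecast the next 4 periods

# Stand-ins for the outputs of a convolutional image encoder, a text
# encoder, and a temporal (e.g. transformer) encoder over historical demand.
img_emb = rng.standard_normal(D_IMG)
txt_emb = rng.standard_normal(D_TXT)
ctx_emb = rng.standard_normal(D_CTX)

# Late fusion: concatenate the per-modality embeddings and project them
# to a demand forecast with a (here untrained) linear head.
W = 0.1 * rng.standard_normal((D_IMG + D_TXT + D_CTX, HORIZON))
b = np.zeros(HORIZON)

fused = np.concatenate([img_emb, txt_emb, ctx_emb])
forecast = fused @ W + b
print(forecast.shape)  # one prediction per future period
```

Because a new product has an image and a description from day one, the image and text branches can supply signal even when the historical-demand branch has nothing, which is how multimodal inputs address the cold-start problem described above.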
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- Context Matters: Leveraging Contextual Features for Time Series Forecasting [2.9687381456164004]
We introduce ContextFormer, a novel plug-and-play method to surgically integrate multimodal contextual information into existing forecasting models.
ContextFormer effectively distills forecast-specific information from rich multimodal contexts, including categorical, continuous, time-varying, and even textual information.
It outperforms SOTA forecasting models by up to 30% on a range of real-world datasets spanning energy, traffic, environmental, and financial domains.
arXiv Detail & Related papers (2024-10-16T15:36:13Z)
- Inter-Series Transformer: Attending to Products in Time Series Forecasting [5.459207333107234]
We develop a new Transformer-based forecasting approach using a shared, multi-task per-time series network.
We provide a case study applying our approach to successfully improve demand prediction for a medical device manufacturing company.
arXiv Detail & Related papers (2024-08-07T16:22:21Z)
- F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm.
By considering domain similarities through task-specific metadata, our model improves generalization, with excess risk decreasing as the number of training tasks increases.
Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
- Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP).
We develop a generic and personalization generative framework, that can handle a wide range of personalized needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z)
- Incorporating Pre-trained Model Prompting in Multimodal Stock Volume Movement Prediction [22.949484374773967]
We propose the Prompt-based MUltimodal Stock volumE prediction model (ProMUSE) to process text and time series modalities.
We use pre-trained language models for better comprehension of financial news.
We also propose a novel cross-modality contrastive alignment, while reserving the unimodal heads beside the fusion head, to mitigate interference between modalities during fusion.
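Cross-modality contrastive alignment is typically implemented as an InfoNCE-style objective that pulls paired embeddings (here, text and time series) together while pushing mismatched pairs apart. The sketch below is a generic one-directional variant, not ProMUSE's actual loss:

```python
import numpy as np

def info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.1) -> float:
    """One-directional InfoNCE: row i of `a` should match row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature              # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.diag(log_probs).mean())      # -log p(correct pair)

rng = np.random.default_rng(1)
text_emb = rng.standard_normal((8, 16))
# Well-aligned pairs (small perturbation) vs. unrelated pairs.
aligned_loss = info_nce(text_emb, text_emb + 0.01 * rng.standard_normal((8, 16)))
random_loss = info_nce(text_emb, rng.standard_normal((8, 16)))
assert aligned_loss < random_loss  # aligned pairs incur lower loss
```

Minimizing this loss drives the text and time-series encoders to place matching pairs close in the shared embedding space.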
arXiv Detail & Related papers (2023-09-11T16:47:01Z)
- Deep Learning based Forecasting: a case study from the online fashion industry [7.694480564850072]
In this case study, we describe the data and our modelling approach for this forecasting problem in detail and present empirical results.
arXiv Detail & Related papers (2023-05-23T13:30:35Z)
- Multimodal Neural Network For Demand Forecasting [0.8602553195689513]
We propose a multi-modal sales forecasting network that combines real-life events from news articles with traditional data such as historical sales and holiday information.
We show statistically significant improvements in the SMAPE error metric with an average improvement of 7.37% against the existing state-of-the-art sales forecasting techniques.
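SMAPE, the metric cited above, has several variants in the literature; a common form, bounded in [0, 200], can be computed as follows:

```python
import numpy as np

def smape(actual, forecast) -> float:
    """Symmetric Mean Absolute Percentage Error.

    This is the common variant 100 * mean(2|F - A| / (|A| + |F|)),
    bounded in [0, 200]; other papers scale or define it differently.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = np.abs(actual) + np.abs(forecast)
    # Treat 0/0 (both values zero) as a perfect prediction.
    ratio = np.divide(2.0 * np.abs(forecast - actual), denom,
                      out=np.zeros_like(denom), where=denom > 0)
    return float(100.0 * ratio.mean())

print(round(smape([100, 200, 0], [110, 180, 0]), 2))  # 6.68
```

Unlike plain MAPE, the symmetric denominator keeps the error finite for zero-demand periods, which is why it is a standard choice for sales forecasting.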
arXiv Detail & Related papers (2022-10-20T18:06:36Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
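The paper's exact definition is not reproduced here, but the core idea (how much accuracy drops when one modality's input is shuffled across samples) can be sketched like this; the toy model, data, and function names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def perceptual_score(model, modalities, labels, mod_idx, n_perm=10):
    """Accuracy drop when modality `mod_idx` is permuted across samples.

    A simplified, hypothetical variant of the perceptual-score idea:
    a near-zero score means the model effectively ignores that modality.
    """
    base_acc = (model(modalities) == labels).mean()
    drops = []
    for _ in range(n_perm):
        shuffled = list(modalities)
        shuffled[mod_idx] = modalities[mod_idx][rng.permutation(len(labels))]
        drops.append(base_acc - (model(shuffled) == labels).mean())
    return float(np.mean(drops))

# Toy "model" that predicts from modality 0 only and ignores modality 1.
X = [rng.integers(0, 2, size=200), rng.integers(0, 2, size=200)]
y = X[0]
model = lambda mods: mods[0]

assert perceptual_score(model, X, y, mod_idx=1) == 0.0   # modality 1 ignored
assert perceptual_score(model, X, y, mod_idx=0) > 0.2    # modality 0 relied on
```

Scoring each modality this way decomposes a model's accuracy into per-modality contributions, which is how the metric exposes models that under-use their visual input.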
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
- Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining [108.86502855439774]
We investigate a more realistic setting that aims to perform weakly-supervised multi-modal instance-level product retrieval.
We contribute Product1M, one of the largest multi-modal cosmetic datasets for real-world instance-level retrieval.
We propose a novel model named Cross-modal contrAstive Product Transformer for instance-level prodUct REtrieval (CAPTURE).
arXiv Detail & Related papers (2021-07-30T12:11:24Z)
- Pre-training Graph Transformer with Multimodal Side Information for Recommendation [82.4194024706817]
We propose a pre-training strategy to learn item representations by considering both item side information and their relationships.
We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item.
The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction.
arXiv Detail & Related papers (2020-10-23T10:30:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.