RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation
- URL: http://arxiv.org/abs/2312.14095v1
- Date: Thu, 21 Dec 2023 18:17:16 GMT
- Title: RetailSynth: Synthetic Data Generation for Retail AI Systems Evaluation
- Authors: Yu Xia, Ali Arian, Sriram Narayanamoorthy, and Joshua Mabry
- Abstract summary: We propose a multi-stage model for simulating customer shopping behavior.
We embedded this model into a working simulation environment -- RetailSynth.
Multiple pricing policies were implemented in the simulator and analyzed for impact on revenue, category penetration, and customer retention.
- Score: 4.182702249796689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Significant research effort has been devoted in recent years to developing
personalized pricing, promotions, and product recommendation algorithms that
can leverage rich customer data to learn and earn. Systematic benchmarking and
evaluation of these causal learning systems remains a critical challenge, due
to the lack of suitable datasets and simulation environments. In this work, we
propose a multi-stage model for simulating customer shopping behavior that
captures important sources of heterogeneity, including price sensitivity and
past experiences. We embedded this model into a working simulation environment
-- RetailSynth. RetailSynth was carefully calibrated on publicly available
grocery data to create realistic synthetic shopping transactions. Multiple
pricing policies were implemented within the simulator and analyzed for impact
on revenue, category penetration, and customer retention. Applied researchers
can use RetailSynth to validate causal demand models for multi-category retail
and to incorporate realistic price sensitivity into emerging benchmarking
suites for personalized pricing, promotions, and product recommendations.
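As a rough illustration of the kind of multi-stage simulation the abstract describes, the sketch below runs a toy two-stage shopper model (store visit, then purchase) and compares two flat pricing policies on revenue and penetration. It is not the RetailSynth model: the function name `simulate`, the two-stage structure, the logistic forms, and every numeric parameter are assumptions chosen only for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(price, n_customers=500, n_weeks=20, seed=0):
    """Toy two-stage shopper simulation; all parameters are illustrative."""
    rng = random.Random(seed)
    # Heterogeneous price sensitivity: one coefficient per customer (assumed range).
    betas = [rng.uniform(0.5, 2.0) for _ in range(n_customers)]
    # "Past experience" state: grows after a purchase, decays otherwise.
    experience = [0.0] * n_customers
    revenue = 0.0
    buyers = set()
    for _ in range(n_weeks):
        for i in range(n_customers):
            # Stage 1: store visit, more likely after good past experiences.
            if rng.random() >= sigmoid(-0.5 + experience[i]):
                experience[i] *= 0.8
                continue
            # Stage 2: purchase, less likely at higher prices.
            if rng.random() < sigmoid(2.0 - betas[i] * price):
                revenue += price
                buyers.add(i)
                experience[i] = min(experience[i] + 0.5, 2.0)
            else:
                experience[i] *= 0.8
    return {"revenue": revenue, "penetration": len(buyers) / n_customers}


if __name__ == "__main__":
    for price in (1.0, 4.0):
        print(price, simulate(price))
```

Running the same seed under different prices gives a simple way to compare policies on shared metrics; a calibrated simulator would replace the assumed coefficients with values fitted to real transaction data.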
Related papers
- Consumer Transactions Simulation through Generative Adversarial Networks [0.07373617024876725]
This paper presents an innovative application of Generative Adversarial Networks (GANs) to generate synthetic retail transaction data.
We diverge from conventional methodologies by integrating SKU data into our GAN architecture and using more sophisticated embedding methods.
Preliminary results demonstrate enhanced realism in simulated transactions measured by comparing generated items with real ones.
arXiv Detail & Related papers (2024-08-07T09:45:24Z)
- IMFL-AIGC: Incentive Mechanism Design for Federated Learning Empowered by Artificial Intelligence Generated Content [15.620004060097155]
Federated learning (FL) has emerged as a promising paradigm that enables clients to collaboratively train a shared global model without uploading their local data.
We propose a data quality-aware incentive mechanism to encourage clients' participation.
Our proposed mechanism exhibits the highest training accuracy and reduces the server's cost by up to 53.34% on real-world datasets.
arXiv Detail & Related papers (2024-06-12T07:47:22Z)
- Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
- Improve Fidelity and Utility of Synthetic Credit Card Transaction Time Series from Data-centric Perspective [10.996626204702189]
We focus on attaining both high fidelity to actual data and optimal utility for machine learning tasks.
We introduce five pre-processing schemas to enhance the training of the Conditional Probabilistic Auto-Regressive Model.
Our attention shifts to training fraud detection models tailored for time-series data, evaluating the utility of the synthetic data.
arXiv Detail & Related papers (2024-01-01T22:34:14Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data [91.52783572568214]
Synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs.
We discuss which fundamental challenges the community needs to overcome for wider relevance and application of synthetic data.
arXiv Detail & Related papers (2023-04-07T16:38:40Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Synthetic Data-Based Simulators for Recommender Systems: A Survey [55.60116686945561]
This survey aims at providing a comprehensive overview of the recent trends in the field of modeling and simulation.
We start with the motivation behind the development of frameworks implementing the simulations -- simulators.
We provide a new consistent classification of existing simulators based on their functionality, approbation, and industrial effectiveness.
arXiv Detail & Related papers (2022-06-22T19:33:21Z)
- Model Distillation for Revenue Optimization: Interpretable Personalized Pricing [8.07517029746865]
We present a customized, prescriptive tree-based algorithm that distills knowledge from a complex black-box machine learning algorithm.
It segments customers with similar valuations and prescribes prices in such a way that maximizes revenue while maintaining interpretability.
arXiv Detail & Related papers (2020-07-03T18:33:23Z)
- A Data-driven Market Simulator for Small Data Environments [0.5872014229110214]
Neural network based data-driven market simulation unveils a new and flexible way of modelling financial time series.
We show how a rough paths perspective combined with a parsimonious Variational Autoencoder framework provides a powerful way for encoding and evaluating financial time series.
arXiv Detail & Related papers (2020-06-21T14:04:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.