Related papers: Dealing with zero-inflated data: achieving SOTA with a two-fold machine learning approach

URL: http://arxiv.org/abs/2310.08088v1
Date: Thu, 12 Oct 2023 07:26:41 GMT
Title: Dealing with zero-inflated data: achieving SOTA with a two-fold machine learning approach
Authors: Jože M. Rožanec, Gašper Petelin, João Costa, Blaž Bertalanič, Gregor Cerar, Marko Guček, Gregor Papa, Dunja Mladenić
Abstract summary: This paper showcases two real-world use cases (home appliances classification and airport shuttle demand prediction) where a hierarchical model applied in the context of zero-inflated data leads to excellent results. It is estimated that the proposed approach is also four times more energy efficient than the SOTA approach against which it was compared.
Score: 0.18846515534317262
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In many cases, a machine learning model must learn to correctly predict a few data points with particular values of interest in a broader range of data where many target values are zero. Zero-inflated data can be found in diverse scenarios, such as lumpy and intermittent demands, power consumption for home appliances being turned on and off, impurities measurement in distillation processes, and even airport shuttle demand prediction. The presence of zeroes affects the models' learning and may result in poor performance. Furthermore, zeroes also distort the metrics used to compute the model's prediction quality. This paper showcases two real-world use cases (home appliances classification and airport shuttle demand prediction) where a hierarchical model applied in the context of zero-inflated data leads to excellent results. In particular, for home appliances classification, the weighted average of Precision, Recall, F1, and AUC ROC was increased by 27%, 34%, 49%, and 27%, respectively. Furthermore, it is estimated that the proposed approach is also four times more energy efficient than the SOTA approach against which it was compared. Two-fold models performed best in all cases when predicting airport shuttle demand, and the difference against other models has been proven to be statistically significant.
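The abstract does not spell out how the hierarchical (two-fold) model is built. A common reading, assumed here, is a two-stage setup: a classifier first decides whether the target is zero, and a regressor trained only on the non-zero rows predicts the magnitude. The sketch below illustrates that idea on toy data using only the Python standard library. The function names (`fit_stump`, `fit_line`, `fit_two_stage`, `predict_two_stage`), the threshold classifier, the least-squares regressor, and the data are all illustrative assumptions, not the authors' implementation.

```python
from statistics import mean

def fit_stump(xs, is_nonzero):
    """Stage 1 (assumed): pick threshold t so that 'x > t' best separates
    zero from non-zero targets on the training data."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = mean(1.0 if (x > t) == nz else 0.0 for x, nz in zip(xs, is_nonzero))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on one feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx

def fit_two_stage(xs, ys):
    """Fit the zero/non-zero classifier, then a regressor on non-zero rows only."""
    t = fit_stump(xs, [y != 0 for y in ys])
    nonzero = [(x, y) for x, y in zip(xs, ys) if y != 0]
    if not nonzero:  # degenerate case: every target is zero
        return t, 0.0, 0.0
    a, b = fit_line([x for x, _ in nonzero], [y for _, y in nonzero])
    return t, a, b

def predict_two_stage(model, x):
    """Predict 0 unless stage 1 says 'non-zero'; then use the stage 2 regressor."""
    t, a, b = model
    return a * x + b if x > t else 0.0

def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(p - y) for y, p in zip(y_true, y_pred))

# Toy zero-inflated data (hypothetical): 70% of the targets are zero.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 0, 0, 0, 16, 18, 20]

model = fit_two_stage(xs, ys)
preds = [predict_two_stage(model, x) for x in xs]

# Baseline for comparison: a single regressor fitted on all rows, zeros included.
a_all, b_all = fit_line(xs, ys)
baseline_preds = [a_all * x + b_all for x in xs]

print("two-stage MAE:", mae(ys, preds))
print("single-regressor MAE:", mae(ys, baseline_preds))
```

On this toy data the single regressor is pulled toward the zeros and misses both regimes, while the two-stage model keeps them separate. In practice each stage would be a proper learned model evaluated on held-out data; the stump and the line are only stand-ins.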

Related papers

Echo State Networks for Time Series Forecasting: Hyperparameter Sweep and Benchmarking [51.56484100374058]
We evaluate whether a fully automatic, purely feedback-driven ESN can serve as a competitive alternative to widely used statistical forecasting methods. Forecast accuracy is measured using MASE and sMAPE and benchmarked against simple benchmarks like drift and seasonal naive and statistical models.
arXiv Detail & Related papers (2026-02-03T16:01:22Z)
DataDecide: How to Predict Best Pretraining Data with Small Experiments [67.95896457895404]
We release models, data, and evaluations in DataDecide -- the most extensive open suite of models over differences in data and scale. We conduct controlled pretraining experiments across 25 corpora with differing sources, deduplication, and filtering up to 100B tokens, model sizes up to 1B parameters, and 3 random seeds.
arXiv Detail & Related papers (2025-04-15T17:02:15Z)
Time-Series Foundation Model for Value-at-Risk [9.090616417812306]
Foundation models, pre-trained on vast and varied datasets, can be used in a zero-shot setting with relatively minimal data. We compare the performance of Google's model, called TimesFM, against conventional parametric and non-parametric models.
arXiv Detail & Related papers (2024-10-15T16:53:44Z)
Using Generative Models to Produce Realistic Populations of the United Kingdom Windstorms [0.0]
This dissertation explores the application of generative models to produce realistic synthetic wind field data. Three models, including standard GANs, WGAN-GP, and U-net diffusion models, were employed to generate wind maps of the UK. The results reveal that while all models are effective in capturing the general spatial characteristics, each model exhibits distinct strengths and weaknesses.
arXiv Detail & Related papers (2024-09-16T19:53:33Z)
F-FOMAML: GNN-Enhanced Meta-Learning for Peak Period Demand Forecasting with Proxy Data [65.6499834212641]
We formulate the demand prediction as a meta-learning problem and develop the Feature-based First-Order Model-Agnostic Meta-Learning (F-FOMAML) algorithm. By considering domain similarities through task-specific metadata, our model improved generalization, where the excess risk decreases as the number of training tasks increases. Compared to existing state-of-the-art models, our method demonstrates a notable improvement in demand prediction accuracy, reducing the Mean Absolute Error by 26.24% on an internal vending machine dataset and by 1.04% on the publicly accessible JD.com dataset.
arXiv Detail & Related papers (2024-06-23T21:28:50Z)
CaFA: Global Weather Forecasting with Factorized Attention on Sphere [7.687215328455751]
We propose a factorized-attention-based model tailored for spherical geometries to mitigate this issue. The deterministic forecasting accuracy of the proposed model on $1.5^\circ$ and 0-7 days' lead time is on par with state-of-the-art purely data-driven machine learning weather prediction models.
arXiv Detail & Related papers (2024-05-12T23:18:14Z)
Air Quality Forecasting Using Machine Learning: A Global perspective with Relevance to Low-Resource Settings [0.0]
Air pollution stands as the fourth leading cause of death globally. This study proposes a novel machine learning approach for accurate air quality prediction using two months of air quality data.
arXiv Detail & Related papers (2024-01-09T05:52:02Z)
Residual Corrective Diffusion Modeling for Km-scale Atmospheric Downscaling [58.456404022536425]
State of the art for physical hazard prediction from weather and climate requires expensive km-scale numerical simulations driven by coarser resolution global inputs. Here, a generative diffusion architecture is explored for downscaling such global inputs to km-scale, as a cost-effective machine learning alternative. The model is trained to predict 2km data from a regional weather model over Taiwan, conditioned on a 25km global reanalysis.
arXiv Detail & Related papers (2023-09-24T19:57:22Z)
A Meta-Learning Approach to Predicting Performance and Data Requirements [163.4412093478316]
We propose an approach to estimate the number of samples required for a model to reach a target performance. We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset. We introduce a novel piecewise power law (PPL) that handles the two data differently.
arXiv Detail & Related papers (2023-03-02T21:48:22Z)
X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim at improving data efficiency for both classification and regression setups in deep learning. To take the power of both worlds, we propose a novel X-model. X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task. The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature. We formulate a novel problem and neural framework Back2Future that aims to refine a given model's predictions in real-time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z)
Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn. We then show that distillation performs strongly for low churn training against a number of recent baselines.
arXiv Detail & Related papers (2021-06-04T18:03:31Z)
A Data-Driven Machine Learning Approach for Consumer Modeling with Load Disaggregation [1.6058099298620423]
We propose a generic class of data-driven semiparametric models derived from consumption data of residential consumers. In the first stage, disaggregation of the load into fixed and shiftable components is accomplished by means of a hybrid algorithm. In the second stage, the model parameters are estimated using an L2-norm, epsilon-insensitive regression approach.
arXiv Detail & Related papers (2020-11-04T13:36:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.