QC-Forest: a Classical-Quantum Algorithm to Provably Speedup Retraining of Random Forest
- URL: http://arxiv.org/abs/2406.12008v3
- Date: Thu, 11 Jul 2024 13:32:59 GMT
- Title: QC-Forest: a Classical-Quantum Algorithm to Provably Speedup Retraining of Random Forest
- Authors: Romina Yalovetzky, Niraj Kumar, Changhao Li, Marco Pistoia
- Abstract summary: Random Forest (RF) is a popular tree-ensemble method for supervised learning, prized for its ease of use and flexibility.
Online RF models must account for new training data to maintain model accuracy.
We propose QC-Forest, a classical-quantum algorithm designed to time-efficiently retrain RF models in the streaming setting.
- Score: 2.6436521007616114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random Forest (RF) is a popular tree-ensemble method for supervised learning, prized for its ease of use and flexibility. Online RF models must account for new training data to maintain model accuracy. This is particularly important in applications where data is periodically and sequentially generated over time in data streams, such as autonomous driving systems and credit card payments. In this setting, periodically retraining the model on all of the accumulated old and new data is beneficial, as it fully captures possible drifts in the data distribution over time. However, this is impractical with state-of-the-art classical algorithms for RF, as their runtime scales linearly with the number of accumulated samples. We propose QC-Forest, a classical-quantum algorithm designed to time-efficiently retrain RF models in the streaming setting for multi-class classification and regression, achieving a runtime poly-logarithmic in the total number of accumulated samples. QC-Forest leverages Des-q, a quantum algorithm for single-tree construction and retraining proposed by Kumar et al., extending it to multi-class classification (the original proposal was limited to binary classes) and introducing an exact classical method to replace an underlying quantum subroutine that incurs a finite error, while maintaining the same poly-logarithmic dependence. Finally, we show that QC-Forest achieves accuracy competitive with state-of-the-art RF methods on widely used benchmark datasets with up to 80,000 samples, while significantly speeding up model retraining.
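To make the retraining bottleneck concrete, below is a minimal classical sketch of the streaming setting the abstract describes: the forest is refit on all accumulated data each period, so cost grows linearly with the total sample count. The dataset, batch sizes, and drift model are synthetic placeholders, not from the paper; this illustrates the baseline QC-Forest targets, not QC-Forest itself.
```python
# Classical streaming baseline: periodic full retrain on old + new data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_all = np.empty((0, 10))
y_all = np.empty((0,), dtype=int)

for period in range(5):
    # A new batch of labeled samples arrives in the stream.
    X_new = rng.normal(size=(1000, 10))
    y_new = (X_new[:, 0] + 0.1 * period > 0).astype(int)  # mild synthetic drift
    X_all = np.vstack([X_all, X_new])
    y_all = np.concatenate([y_all, y_new])

    # Full retrain captures distribution drift, but this fit scales with
    # len(X_all); QC-Forest's claimed runtime is poly-logarithmic in the
    # accumulated size instead.
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_all, y_all)
    print(period, len(X_all), round(model.score(X_all, y_all), 3))
```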
Related papers
- Discrete Randomized Smoothing Meets Quantum Computing [40.54768963869454]
We show how to encode all the perturbations of the input binary data in superposition and use Quantum Amplitude Estimation (QAE) to obtain a quadratic reduction in the number of calls to the model.
In addition, we propose a new binary threat model to allow for an extensive evaluation of our approach on images, graphs, and text.
arXiv Detail & Related papers (2024-08-01T20:21:52Z)
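The quadratic reduction claimed in the entry above can be made concrete with a call-count comparison; a rough sketch, assuming a Hoeffding-style classical baseline and the textbook O(1/eps) scaling of Quantum Amplitude Estimation (the constants are illustrative only):
```python
# Classical Monte Carlo needs O(1/eps^2) model calls to estimate a success
# probability to additive error eps; QAE needs O(1/eps) oracle calls.
import math

def classical_calls(eps: float, delta: float = 0.05) -> int:
    # Hoeffding bound: n >= ln(2/delta) / (2 eps^2) for an eps-accurate mean.
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def qae_calls(eps: float) -> int:
    # QAE reaches additive error eps with O(1/eps) oracle applications.
    return math.ceil(1 / eps)

for eps in (0.1, 0.01, 0.001):
    print(f"eps={eps}: classical ~{classical_calls(eps)}, QAE ~{qae_calls(eps)}")
```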
- Des-q: a quantum algorithm to provably speedup retraining of decision trees [2.7262923206583136]
We introduce Des-q, a novel quantum algorithm to construct and retrain decision trees for regression and binary classification tasks.
We benchmark the simulated version of Des-q against the state-of-the-art classical methods on multiple data sets.
Our algorithm exhibits similar performance to the state-of-the-art decision trees while significantly speeding up the periodic tree retraining.
arXiv Detail & Related papers (2023-09-18T17:56:08Z)
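Des-q is reported to grow trees via clustering-based node splits (with a quantum q-means subroutine doing the clustering). Below is a rough classical analogue under that assumption; the feature weighting and use of sklearn's KMeans are simplifications for illustration, not the paper's exact procedure.
```python
# Clustering-based node split: partition a node's samples into k children
# via k-means on label-correlation-weighted features.
import numpy as np
from sklearn.cluster import KMeans

def split_node(X: np.ndarray, y: np.ndarray, k: int = 2):
    # Weight each feature by |correlation with the label| (a simplification).
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    corr = np.nan_to_num(corr)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X * corr)
    return [(X[labels == c], y[labels == c]) for c in range(k)]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 2] > 0).astype(float)
children = split_node(X, y)
print([len(cy) for _, cy in children])  # sizes of the two child nodes
```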
- BCQQ: Batch-Constraint Quantum Q-Learning with Cyclic Data Re-uploading [2.502222151305252]
Recent advancements in quantum computing suggest that quantum models might require less data for training compared to classical methods.
We propose a batch RL algorithm that utilizes VQC as function approximators within the discrete batch-constraint deep Q-learning algorithm.
We evaluate the efficiency of our algorithm on the OpenAI CartPole environment and compare its performance to the classical neural network-based discrete BCQ.
arXiv Detail & Related papers (2023-04-27T16:43:01Z)
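The core of discrete batch-constrained Q-learning (BCQ), which BCQQ builds on, is an action filter: the greedy action is taken only among actions a behavior-cloning model deems sufficiently likely. A minimal numpy sketch follows; in BCQQ the Q-values would come from a variational quantum circuit, but here they are placeholder arrays, and the threshold value is illustrative.
```python
# Discrete-BCQ action filter: argmax of Q restricted to likely actions.
import numpy as np

def bcq_greedy_action(q_values: np.ndarray, behavior_probs: np.ndarray,
                      tau: float = 0.3) -> int:
    """Pick argmax_a Q(s, a) over actions with p(a|s) >= tau * max_a p(a|s)."""
    allowed = behavior_probs >= tau * behavior_probs.max()
    masked_q = np.where(allowed, q_values, -np.inf)
    return int(np.argmax(masked_q))

q = np.array([1.0, 2.5, 0.3, 2.4])    # placeholder Q-values (VQC outputs in BCQQ)
p = np.array([0.05, 0.02, 0.6, 0.33])  # behavior-cloning probabilities
print(bcq_greedy_action(q, p))  # -> 3: action 1 has higher Q but is filtered out
```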
- Online Evolutionary Neural Architecture Search for Multivariate Non-Stationary Time Series Forecasting [72.89994745876086]
This work presents the Online Neuro-Evolution-based Neural Architecture Search (ONE-NAS) algorithm.
ONE-NAS is a novel neural architecture search method capable of automatically designing and dynamically training recurrent neural networks (RNNs) for online forecasting tasks.
Results demonstrate that ONE-NAS outperforms traditional statistical time series forecasting methods.
arXiv Detail & Related papers (2023-02-20T22:25:47Z)
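A toy sketch of the evolutionary loop behind online neuro-evolution NAS methods such as ONE-NAS: maintain a population of architectures, mutate, evaluate on the most recent data, and keep the best. The "architecture" here is just a hidden-size integer with a placeholder fitness; real ONE-NAS evolves full RNN topologies and trains them on the stream.
```python
# Minimal generational evolutionary search over a 1-parameter "architecture".
import random

random.seed(0)

def evaluate(arch: int, data_window) -> float:
    # Placeholder fitness: a real system would train an RNN on the window
    # and score its forecasting error.
    return -abs(arch - 32) + random.gauss(0, 1)

population = [random.randint(4, 128) for _ in range(8)]
for generation in range(20):
    data_window = None  # the newest chunk of the stream would go here
    scored = sorted(population, key=lambda a: evaluate(a, data_window), reverse=True)
    parents = scored[:4]
    # Mutation: perturb each parent's hidden size.
    children = [max(1, p + random.randint(-8, 8)) for p in parents]
    population = parents + children
print(sorted(population))  # converges toward the (synthetic) optimum of 32
```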
- Effective and Efficient Training for Sequential Recommendation using Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that the models enhanced with our method can achieve performances exceeding or very close to state-of-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z)
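A hedged sketch of recency-based sampling of sequence targets: when building a training example from a user's interaction sequence, the target position is drawn with probability that increases with recency, so recent behavior dominates the objective. The exponential weighting below is one plausible choice, not necessarily the paper's exact scheme.
```python
# Sample a target position with weight increasing toward the sequence end.
import numpy as np

def sample_target_index(seq_len: int, alpha: float = 0.8,
                        rng=np.random.default_rng(0)) -> int:
    """Weight position i by alpha**(seq_len - 1 - i), so the last
    (most recent) position has the largest weight."""
    weights = alpha ** (seq_len - 1 - np.arange(seq_len))
    return int(rng.choice(seq_len, p=weights / weights.sum()))

counts = np.bincount([sample_target_index(10) for _ in range(10_000)], minlength=10)
print(counts)  # heavily skewed toward index 9, the most recent item
```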
- Ensemble Conformalized Quantile Regression for Probabilistic Time Series Forecasting [4.716034416800441]
This paper presents a novel probabilistic forecasting method called ensemble conformalized quantile regression (EnCQR).
EnCQR constructs distribution-free and approximately marginally valid prediction intervals (PIs), is suitable for nonstationary and heteroscedastic time series data, and can be applied on top of any forecasting model.
The results demonstrate that EnCQR outperforms models based only on quantile regression or conformal prediction, and it provides sharper, more informative, and valid PIs.
arXiv Detail & Related papers (2022-02-17T16:54:20Z)
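A minimal sketch of conformalized quantile regression, the building block behind EnCQR: the raw quantile-regression interval is widened by an empirical quantile of calibration conformity scores (Romano et al.'s CQR adjustment), which is what yields distribution-free, approximately marginally valid PIs. The raw quantile predictions below are synthetic stand-ins, and EnCQR's ensemble/bootstrap structure is omitted.
```python
# CQR adjustment: widen [lo, hi] by the (1 - alpha) quantile of scores.
import numpy as np

def conformalize(lo_cal, hi_cal, y_cal, lo_test, hi_test, alpha=0.1):
    # Score = how far the truth falls outside the band on the calibration set.
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = len(y_cal)
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return lo_test - q, hi_test + q

rng = np.random.default_rng(0)
y_cal = rng.normal(size=500)
lo_cal = y_cal - 0.5 + rng.normal(0, 0.3, 500)  # synthetic raw lower quantiles
hi_cal = y_cal + 0.5 + rng.normal(0, 0.3, 500)  # synthetic raw upper quantiles
lo_t, hi_t = conformalize(lo_cal, hi_cal, y_cal, np.array([-1.0]), np.array([1.0]))
print(lo_t, hi_t)  # calibrated interval endpoints
```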
- Online learning of windmill time series using Long Short-term Cognitive Networks [58.675240242609064]
The amount of data generated on windmill farms makes online learning the most viable strategy.
We use Long Short-term Cognitive Networks (LSTCNs) to forecast windmill time series in online settings.
Our approach achieved the lowest forecasting errors compared with a simple RNN, a Long Short-Term Memory network, a Gated Recurrent Unit, and a Hidden Markov Model.
arXiv Detail & Related papers (2021-07-01T13:13:24Z)
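A sketch of the prequential (predict-then-learn) protocol typical of online time-series studies like the LSTCN entry above: each incoming value is first used to measure forecasting error, then to update the model. A trivial last-value forecaster stands in for the LSTCN here; the series is synthetic.
```python
# Prequential evaluation: predict the next value, then learn from it.
import numpy as np

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 40, 2000)) + rng.normal(0, 0.1, 2000)

last_seen = series[0]
errors = []
for t in range(1, len(series)):
    prediction = last_seen            # 1) predict before seeing the truth
    errors.append(abs(prediction - series[t]))
    last_seen = series[t]             # 2) then update with the observed value
print("prequential MAE:", np.mean(errors))
```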
- A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training [16.83036203524611]
This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training.
It relies on the linear conjugate gradient (CG) algorithm to combine the natural gradient (NG) method with local curvature information from Hessian-free (HF) or other second-order methods.
Experiments are reported on the multi-genre broadcast data set for a range of different acoustic model types.
arXiv Detail & Related papers (2021-03-12T22:18:34Z)
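The linear CG core that NGHF-style optimisers rely on: the update direction d solves G d = g, where G is a curvature matrix (Fisher for NG, Gauss-Newton/Hessian for HF) accessed only through matrix-vector products. Below, G is a small explicit SPD matrix for illustration; in real training the matvec would be matrix-free.
```python
# Conjugate gradient: solve G d = g using only matvecs with G.
import numpy as np

def conjugate_gradient(matvec, g, iters=50, tol=1e-10):
    x = np.zeros_like(g)
    r = g - matvec(x)       # residual
    p = r.copy()            # search direction
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 20))
G = A @ A.T + 20 * np.eye(20)                 # stand-in SPD curvature matrix
g = rng.normal(size=20)                       # stand-in gradient
d = conjugate_gradient(lambda v: G @ v, g)
print(np.linalg.norm(G @ d - g))  # ~0: d is the curvature-corrected step
```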
- Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
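A sketch of a smoothness-inducing regulariser in the spirit of SISVAE: penalise the KL divergence between the model's per-timestamp Gaussian and its predecessor, so the fitted mean and variance vary smoothly and isolated spikes surface as anomalies. The exact weighting and placement inside the ELBO follow the paper; this shows only the penalty term, on made-up numbers.
```python
# Smoothness penalty: sum of KLs between adjacent per-timestamp Gaussians.
import numpy as np

def gaussian_kl(mu1, var1, mu2, var2):
    """KL( N(mu1, var1) || N(mu2, var2) ), elementwise."""
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def smoothness_penalty(mu, var):
    # Compare each timestamp's distribution with the previous one.
    return gaussian_kl(mu[1:], var[1:], mu[:-1], var[:-1]).sum()

mu = np.array([0.0, 0.1, 0.1, 3.0, 0.1])  # a spike at t = 3
var = np.full(5, 0.5)
print(smoothness_penalty(mu, var))  # the spike contributes most of the penalty
```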
- AIN: Fast and Accurate Sequence Labeling with Approximate Inference Network [75.44925576268052]
The linear-chain Conditional Random Field (CRF) model is one of the most widely-used neural sequence labeling approaches.
Exact probabilistic inference algorithms are typically applied in training and prediction stages of the CRF model.
We propose to employ a parallelizable approximate variational inference algorithm for the CRF model.
arXiv Detail & Related papers (2020-09-17T12:18:43Z)
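A sketch of mean-field variational inference for a linear-chain CRF, the kind of parallelizable approximation AIN advocates over exact forward-backward: every position's marginal q_i is re-estimated from its unary scores plus the expected pairwise scores of both neighbours, with all positions updated at once. Shapes and the iteration count are illustrative.
```python
# Mean-field updates for a linear-chain CRF (all positions in parallel).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_crf(unary, trans, iters=10):
    """unary: (T, L) emission scores; trans: (L, L) transition scores."""
    q = softmax(unary)
    for _ in range(iters):
        left = np.zeros_like(unary)
        right = np.zeros_like(unary)
        left[1:] = q[:-1] @ trans      # E_{q_{i-1}}[ trans(y_{i-1}, y_i) ]
        right[:-1] = q[1:] @ trans.T   # E_{q_{i+1}}[ trans(y_i, y_{i+1}) ]
        q = softmax(unary + left + right)
    return q

rng = np.random.default_rng(0)
q = mean_field_crf(rng.normal(size=(6, 4)), rng.normal(size=(4, 4)))
print(q.argmax(axis=1))  # approximate MAP label at each position
```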
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance across a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
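A sketch of the tensor-train idea behind Conv-TT-LSTM: the large multi-way weight tensor over several past time steps is never materialised; instead, small "cores" are contracted against one step each, keeping the parameter count low while mixing features across time. Convolutions are dropped here, with plain vectors standing in for feature maps; shapes and ranks are illustrative.
```python
# Tensor-train contraction over a short history of feature vectors.
import numpy as np

rng = np.random.default_rng(0)
T, feat, rank, out = 3, 8, 4, 5   # steps combined, feature dim, TT rank, output dim
cores = [rng.normal(size=(rank, feat, rank)) * 0.1 for _ in range(T)]
head = rng.normal(size=(rank, out)) * 0.1

def tt_combine(history):
    """history: list of T feature vectors, one per past step."""
    state = np.ones(rank)                  # boundary vector of the train
    for core, x in zip(cores, history):
        # Fold this step's features into the running rank-sized state.
        state = np.einsum("r,rfs,f->s", state, core, x)
    return state @ head                    # project to the output features

print(tt_combine([rng.normal(size=feat) for _ in range(T)]))
```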