Dynamic Model Tree for Interpretable Data Stream Learning
- URL: http://arxiv.org/abs/2203.16181v1
- Date: Wed, 30 Mar 2022 10:05:35 GMT
- Title: Dynamic Model Tree for Interpretable Data Stream Learning
- Authors: Johannes Haug, Klaus Broelemann, Gjergji Kasneci
- Abstract summary: In this work, we revisit Model Trees for machine learning in evolving data streams.
Our novel framework, called Dynamic Model Tree, satisfies desirable consistency and minimality properties.
- Score: 14.37676876556672
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data streams are ubiquitous in modern business and society. In practice, data
streams may evolve over time and cannot be stored indefinitely. Effective and
transparent machine learning on data streams is thus often challenging.
Hoeffding Trees have emerged as the state of the art for online predictive
modelling. They are easy to train and provide meaningful convergence guarantees
under a stationary process. Yet, at the same time, Hoeffding Trees often
require heuristic and costly extensions to adjust to distributional change,
which may considerably impair their interpretability. In this work, we revisit
Model Trees for machine learning in evolving data streams. Model Trees are able
to maintain more flexible and locally robust representations of the active data
concept, making them a natural fit for data stream applications. Our novel
framework, called Dynamic Model Tree, satisfies desirable consistency and
minimality properties. In experiments with synthetic and real-world tabular
streaming data sets, we show that the proposed framework can drastically reduce
the number of splits required by existing incremental decision trees. At the
same time, our framework often outperforms state-of-the-art models in terms of
predictive quality -- especially when concept drift is involved. Dynamic Model
Trees are thus a powerful online learning framework that contributes to more
lightweight and interpretable machine learning in data streams.
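The paper specifies the Dynamic Model Tree formally; purely as intuition, the sketch below shows the model-tree idea it builds on: splits whose leaves each hold a linear model updated online, together with the Hoeffding bound that classic Hoeffding Trees use to decide when an observed split gain is statistically reliable. The depth-1 structure and all names here are illustrative assumptions, not the authors' method.

```python
import math
import numpy as np

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    # With probability 1 - delta, the mean of n observations of a quantity
    # with range `value_range` lies within epsilon of its true mean.
    # Hoeffding Trees split once a split's gain lead exceeds this epsilon.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

class LinearModelLeaf:
    """A leaf holding a linear model updated online (SGD on squared loss),
    so it can track the locally active concept without storing raw data."""
    def __init__(self, n_features: int, lr: float = 0.01):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict_one(self, x: np.ndarray) -> float:
        return float(self.w @ x + self.b)

    def learn_one(self, x: np.ndarray, y: float) -> None:
        err = self.predict_one(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err

class ModelTreeStub:
    """Depth-1 'model tree': one split, one linear model per leaf. A full
    method would also grow and prune splits as the stream evolves."""
    def __init__(self, n_features: int, split_feature: int, threshold: float):
        self.split_feature = split_feature
        self.threshold = threshold
        self.leaves = (LinearModelLeaf(n_features), LinearModelLeaf(n_features))

    def _leaf(self, x: np.ndarray) -> LinearModelLeaf:
        return self.leaves[0] if x[self.split_feature] <= self.threshold else self.leaves[1]

    def predict_one(self, x: np.ndarray) -> float:
        return self._leaf(x).predict_one(x)

    def learn_one(self, x: np.ndarray, y: float) -> None:
        self._leaf(x).learn_one(x, y)

tree = ModelTreeStub(n_features=2, split_feature=0, threshold=0.5)
tree.learn_one(np.array([0.2, 1.0]), y=1.0)   # one stream example, then discarded
print(tree.predict_one(np.array([0.2, 1.0])), hoeffding_bound(1.0, 0.05, 200))
```

The appeal for streams is that each example is touched once and then discarded; the leaf models, not a sample buffer, carry the current concept.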
Related papers
- Soft Hoeffding Tree: A Transparent and Differentiable Model on Data Streams [2.6524539020042663]
Stream mining algorithms such as Hoeffding trees grow their structure incrementally as the data stream arrives.
We propose soft Hoeffding trees (SoHoT) as a new differentiable and transparent model for possibly infinite and changing data streams.
arXiv Detail & Related papers (2024-11-07T15:49:53Z)
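To make "differentiable" concrete: the defining ingredient of such models is soft routing, where a sigmoid gate replaces the hard threshold test so that every sample reaches both children with some weight and gradients flow through the routing. The sketch below shows one soft split under these assumptions; it is not the actual SoHoT architecture.

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

class SoftSplitNode:
    """One soft split: g in (0, 1) weights the left child, 1 - g the right.
    Low temperature approaches a hard Hoeffding-tree test; everything stays
    differentiable in the threshold and leaf values. Illustrative only."""
    def __init__(self, feature: int, threshold: float, temperature: float = 1.0):
        self.feature = feature
        self.threshold = threshold
        self.temperature = temperature
        self.left_value = 0.0    # leaf predictions; trainable in a real model
        self.right_value = 1.0

    def forward(self, x: np.ndarray) -> float:
        g = sigmoid((self.threshold - x[self.feature]) / self.temperature)
        return g * self.left_value + (1.0 - g) * self.right_value

node = SoftSplitNode(feature=0, threshold=0.5, temperature=0.1)
print(node.forward(np.array([0.2])), node.forward(np.array([0.9])))  # ~0.05, ~0.98
```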
- Escaping the Forest: Sparse Interpretable Neural Networks for Tabular Data [0.0]
We show that our models, Sparse TABular NET or sTAB-Net with attention mechanisms, are more effective than tree-based models.
They achieve better performance than post-hoc methods like SHAP.
arXiv Detail & Related papers (2024-10-23T10:50:07Z)
- NCART: Neural Classification and Regression Tree for Tabular Data [0.5439020425819]
NCART is a modified version of Residual Networks that replaces fully-connected layers with multiple differentiable oblivious decision trees.
It maintains its interpretability while benefiting from the end-to-end capabilities of neural networks.
The simplicity of the NCART architecture makes it well-suited for datasets of varying sizes.
arXiv Detail & Related papers (2023-07-23T01:27:26Z)
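For background on the building block: an oblivious decision tree shares a single feature and threshold across each depth level, so a depth-d tree is effectively a lookup over 2^d cells, and softening the tests keeps it differentiable end to end. The sketch below is a generic illustration of that idea, not NCART's implementation.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

class ObliviousSoftTree:
    """Depth-d oblivious tree with soft tests: one (feature, threshold) pair
    per level, yielding a probability over the 2**d leaf cells.
    Names and structure are illustrative."""
    def __init__(self, features, thresholds, leaf_values, temperature=1.0):
        self.features = list(features)                # d feature indices
        self.thresholds = np.asarray(thresholds, dtype=float)
        self.leaf_values = np.asarray(leaf_values, dtype=float)  # 2**d values
        self.temperature = temperature

    def forward(self, x: np.ndarray) -> float:
        # Probability of routing "right" at each of the d shared tests.
        p_right = sigmoid((x[self.features] - self.thresholds) / self.temperature)
        leaf_probs = np.ones(1)
        for p in p_right:                             # outer product over levels
            leaf_probs = np.concatenate([leaf_probs * (1 - p), leaf_probs * p])
        return float(leaf_probs @ self.leaf_values)   # expected leaf value

tree = ObliviousSoftTree(features=[0, 1], thresholds=[0.0, 0.0],
                         leaf_values=[0.0, 1.0, 2.0, 3.0], temperature=0.1)
print(tree.forward(np.array([1.0, -1.0])))           # ~1.0: routes right at level 1 only
```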
- Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data.
We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better.
arXiv Detail & Related papers (2023-05-28T05:38:28Z)
- Online learning techniques for prediction of temporal tabular datasets with regime changes [0.0]
We propose a modular machine learning pipeline for ranking predictions on temporal panel datasets.
The modularity of the pipeline allows the use of different models, including Gradient Boosting Decision Trees (GBDTs) and Neural Networks.
Online learning techniques, which require no retraining of models, can be used post-prediction to enhance the results.
arXiv Detail & Related papers (2022-12-30T17:19:00Z)
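As one classic example of a post-prediction online technique of this kind (an assumption for illustration, not necessarily the pipeline's method): exponentially weighted averaging over already-trained models with a fixed-share step. Only the combination weights adapt after each outcome, no model is retrained, and the fixed share lets a discarded expert recover after a regime change.

```python
import numpy as np

ETA, ALPHA = 1.0, 0.05   # weight learning rate; fixed-share mixing rate

def combine(preds: np.ndarray, weights: np.ndarray) -> float:
    """Serve the exponentially weighted average of the experts' predictions."""
    return float(weights @ preds)

def update(weights: np.ndarray, preds: np.ndarray, y_true: float) -> np.ndarray:
    """Hedge step: downweight each expert by its squared error, then mix a
    little uniform mass back in so no expert's weight collapses to zero."""
    weights = weights * np.exp(-ETA * (preds - y_true) ** 2)
    weights = weights / weights.sum()
    return (1 - ALPHA) * weights + ALPHA / len(weights)

rng = np.random.default_rng(1)
weights = np.full(2, 0.5)
for t in range(200):                        # regime change at t = 100
    y = 1.0 if t < 100 else -1.0
    preds = np.array([1.0, -1.0]) + rng.normal(scale=0.1, size=2)
    _ = combine(preds, weights)             # prediction served downstream
    weights = update(weights, preds, y)
print("final weights:", weights)            # mass has moved to expert 1
```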
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches do not supply the procedures and pipelines needed to deploy machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script-language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- How Well Do Sparse Imagenet Models Transfer? [75.98123173154605]
Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" datasets.
In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset.
We show that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities.
arXiv Detail & Related papers (2021-11-26T11:58:51Z)
- Distributionally Robust Recurrent Decoders with Random Network Distillation [93.10261573696788]
We propose a method based on OOD detection with Random Network Distillation to allow an autoregressive language model to disregard OOD context during inference.
We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
arXiv Detail & Related papers (2021-10-25T19:26:29Z)
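For background, Random Network Distillation scores novelty as a trained predictor's error in imitating a frozen, randomly initialized network: the error is small on inputs like those seen in training and large out of distribution. The miniature numpy version below illustrates only that mechanism; the networks here are stand-in assumptions, not the paper's language-modeling setup.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_OUT, LR = 8, 4, 0.05
W_target = rng.normal(size=(D_OUT, D_IN))   # frozen random target network

def target(x: np.ndarray) -> np.ndarray:
    return np.tanh(W_target @ x)            # fixed nonlinear features

W_pred = np.zeros((D_OUT, D_IN))            # linear predictor, trained

def novelty(x: np.ndarray) -> float:
    """RND score: squared error of the predictor against the frozen target."""
    return float(np.sum((W_pred @ x - target(x)) ** 2))

def train_step(x: np.ndarray) -> None:
    global W_pred
    err = W_pred @ x - target(x)            # gradient of the squared error
    W_pred -= LR * np.outer(err, x)

for x in rng.normal(size=(5000, D_IN)):     # fit on in-distribution data only
    train_step(x)

print("in-dist novelty:", novelty(rng.normal(size=D_IN)))           # small
print("OOD novelty    :", novelty(rng.normal(loc=6.0, size=D_IN)))  # large
```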
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1], [3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- Lambda Learner: Fast Incremental Learning on Data Streams [5.543723668681475]
We propose a new framework for training models by incremental updates in response to mini-batches from data streams.
We show that the resulting model of our framework closely estimates a periodically updated model trained on offline data and outperforms it when model updates are time-sensitive.
We present a large-scale deployment on the sponsored content platform for a large social network.
arXiv Detail & Related papers (2020-10-11T04:00:34Z)
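A minimal sketch of the incremental-update idea, using a plain logistic model and illustrative names rather than the paper's actual framework: the weights are nudged on every incoming mini-batch, so the model stays current without ever being retrained from scratch.

```python
import numpy as np

class IncrementalGLM:
    """Logistic regression kept fresh by per-mini-batch gradient updates."""
    def __init__(self, n_features: int, lr: float = 0.1, l2: float = 1e-4):
        self.w = np.zeros(n_features)
        self.lr, self.l2 = lr, l2

    def predict(self, X: np.ndarray) -> np.ndarray:
        return 1.0 / (1.0 + np.exp(-(X @ self.w)))

    def partial_fit(self, X: np.ndarray, y: np.ndarray) -> None:
        # One gradient step on this mini-batch; the batch is then discarded.
        grad = X.T @ (self.predict(X) - y) / len(y) + self.l2 * self.w
        self.w -= self.lr * grad

rng = np.random.default_rng(0)
model, true_w = IncrementalGLM(n_features=5), rng.normal(size=5)
for _ in range(300):                         # stream of 300 mini-batches
    X = rng.normal(size=(32, 5))
    y = (X @ true_w + rng.normal(scale=0.1, size=32) > 0).astype(float)
    model.partial_fit(X, y)
```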
- Neural Additive Models: Interpretable Machine Learning with Neural Nets [77.66871378302774]
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks.
We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models.
NAMs learn a linear combination of neural networks that each attend to a single input feature.
arXiv Detail & Related papers (2020-04-29T01:28:32Z)
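The structure NAMs add is easy to state: the prediction is a bias plus a sum of per-feature subnetworks, so each feature's contribution can be read off (and plotted) on its own. Below is an untrained, minimal sketch of that additive structure; layer sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class FeatureNet:
    """A tiny MLP that sees exactly one input feature (a 'shape function')."""
    def __init__(self, hidden: int = 16):
        self.w1 = rng.normal(scale=0.5, size=hidden)
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.5, size=hidden)

    def __call__(self, x_i: float) -> float:
        h = np.maximum(0.0, self.w1 * x_i + self.b1)   # ReLU hidden layer
        return float(self.w2 @ h)

class NAMSketch:
    """Neural Additive Model, minimally: bias + sum of per-feature nets."""
    def __init__(self, n_features: int):
        self.nets = [FeatureNet() for _ in range(n_features)]
        self.bias = 0.0

    def predict(self, x: np.ndarray) -> float:
        return self.bias + sum(net(xi) for net, xi in zip(self.nets, x))

    def contributions(self, x: np.ndarray) -> list:
        """Per-feature additive contributions: the interpretability hook."""
        return [net(xi) for net, xi in zip(self.nets, x)]

nam = NAMSketch(n_features=3)
x = np.array([0.5, -1.2, 2.0])
print(nam.predict(x), nam.contributions(x))
```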
This list is automatically generated from the titles and abstracts of the papers on this site.