Dynamic Ensemble Size Adjustment for Memory Constrained Mondrian Forest
- URL: http://arxiv.org/abs/2210.05704v1
- Date: Tue, 11 Oct 2022 18:05:58 GMT
- Title: Dynamic Ensemble Size Adjustment for Memory Constrained Mondrian Forest
- Authors: Martin Khannouz and Tristan Glatard
- Abstract summary: In this paper, we show that under memory constraints, increasing the size of a tree-based ensemble classifier can worsen its performance.
We experimentally show the existence of an optimal ensemble size for a memory-bounded Mondrian forest on data streams.
We conclude that our method can achieve up to 95% of the performance of an optimally-sized Mondrian forest for stable datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised learning algorithms generally assume the availability of enough
memory to store data models during the training and test phases. However, this
assumption is unrealistic when data comes in the form of infinite data streams,
or when learning algorithms are deployed on devices with reduced amounts of
memory. Such memory constraints impact the model behavior and assumptions. In
this paper, we show that under memory constraints, increasing the size of a
tree-based ensemble classifier can worsen its performance. In particular, we
experimentally show the existence of an optimal ensemble size for a
memory-bounded Mondrian forest on data streams and we design an algorithm to
guide the forest toward that optimal number by using an estimation of
overfitting. We tested different variations for this algorithm on a variety of
real and simulated datasets, and we conclude that our method can achieve up to
95% of the performance of an optimally-sized Mondrian forest for stable
datasets, and can even outperform it for datasets with concept drifts. All our
methods are implemented in the OrpailleCC open-source library and are ready to
be used on embedded systems and connected objects.
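The reference implementation lives in the C++ OrpailleCC library; as a rough Python sketch of the idea described above (not the authors' algorithm), the controller below grows or shrinks a memory-bounded ensemble of incremental trees based on a crude overfitting estimate, taken here as resubstitution accuracy minus prequential accuracy over a sliding window. The tree interface (predict/learn/node_count), the estimator, the thresholds, and the grow/shrink rule are all illustrative assumptions.
```python
import random
from collections import deque

class DynamicSizeForest:
    """Illustrative controller for a memory-bounded ensemble of incremental
    trees: it adds or removes trees based on a crude overfitting estimate.
    The tree interface (predict/learn/node_count), the estimator, and the
    thresholds are assumptions for illustration, not the OrpailleCC code."""

    def __init__(self, make_tree, node_budget, gap_threshold=0.05,
                 window=500, adjust_every=500):
        self.make_tree = make_tree        # factory returning one incremental tree
        self.node_budget = node_budget    # node budget shared by the whole forest
        self.gap_threshold = gap_threshold
        self.adjust_every = adjust_every
        self.trees = [make_tree()]
        self.seen = deque(maxlen=window)       # recently seen labelled points
        self.preq_hits = deque(maxlen=window)  # test-then-train (prequential) hits
        self.n_seen = 0

    def predict(self, x):
        votes = {}
        for tree in self.trees:
            label = tree.predict(x)
            votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get)

    def partial_fit(self, x, y):
        # Prequential evaluation: test on the point before learning from it.
        self.preq_hits.append(self.predict(x) == y)
        if sum(t.node_count() for t in self.trees) < self.node_budget:
            for tree in self.trees:
                tree.learn(x, y)
        self.seen.append((x, y))
        self.n_seen += 1
        if self.n_seen % self.adjust_every == 0:
            self._adjust_size()

    def _overfitting_gap(self):
        # Resubstitution accuracy (on already-seen points) minus prequential
        # accuracy: a large gap suggests the forest memorizes more than it generalizes.
        if not self.seen or not self.preq_hits:
            return 0.0
        resub = sum(self.predict(x) == y for x, y in self.seen) / len(self.seen)
        preq = sum(self.preq_hits) / len(self.preq_hits)
        return resub - preq

    def _adjust_size(self):
        gap = self._overfitting_gap()
        if gap > self.gap_threshold:
            # High overfitting estimate: under a fixed budget, more (and hence
            # shallower) trees trade variance for bias.
            self.trees.append(self.make_tree())
        elif gap < self.gap_threshold / 2 and len(self.trees) > 1:
            # Low estimate: free budget so the remaining trees can grow deeper.
            self.trees.pop(random.randrange(len(self.trees)))
```
Under a fixed node budget, adding trees makes each tree shallower, so the sketch adds a tree when the overfitting estimate is high and removes one when it is low; the paper's actual adjustment rule and overfitting estimator may differ. Any incremental tree learner exposing the three assumed methods could be plugged in as the factory.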
Related papers
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
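For context on what distribution matching means in this line of work, the toy sketch below (an illustrative assumption, not this paper's improved method) learns a small synthetic set whose mean feature embeddings, under randomly sampled projections, match those of the real data for one class.
```python
import torch

def dm_loss(real_feats: torch.Tensor, syn_feats: torch.Tensor) -> torch.Tensor:
    # Match the mean embeddings of real and synthetic data (an MMD-style moment match).
    return ((real_feats.mean(dim=0) - syn_feats.mean(dim=0)) ** 2).sum()

# Toy usage: condense 1000 real points of one class into 10 synthetic points.
torch.manual_seed(0)
real = torch.randn(1000, 32)                   # stand-in for one class of real data
syn = torch.randn(10, 32, requires_grad=True)  # learnable synthetic points
opt = torch.optim.SGD([syn], lr=0.05)

for step in range(200):
    # A fresh random linear "network" each step stands in for sampling random
    # feature extractors, as distribution-matching condensation does.
    proj = torch.randn(32, 64)
    loss = dm_loss(real @ proj, syn @ proj)
    opt.zero_grad()
    loss.backward()
    opt.step()
```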
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Learning Large Scale Sparse Models [6.428186644949941]
We consider learning sparse models in large scale settings, where the number of samples and the feature dimension can grow as large as millions or billions.
We propose to learn sparse models such as Lasso in an online manner where, in each iteration, only one randomly chosen sample is revealed and used to update a sparse gradient.
Thereby, the memory cost is independent of the sample size and gradient evaluation for one sample is efficient.
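As a generic illustration of that setting rather than the paper's specific algorithm, a proximal stochastic-gradient (soft-thresholding) update for Lasso touches one sample per step and stores only the d-dimensional weight vector:
```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of the L1 norm: shrinks coefficients toward zero.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def online_lasso_step(w, x, y, lr=0.01, lam=0.1):
    """One proximal-SGD update for Lasso from a single (x, y) sample.
    A textbook sketch of online sparse learning, not the paper's exact
    algorithm; memory use is O(d), independent of the stream length."""
    grad = (w @ x - y) * x  # gradient of 0.5 * (w.x - y)^2 for this sample
    return soft_threshold(w - lr * grad, lr * lam)

# Toy stream: the true model uses only 3 of 100 features.
rng = np.random.default_rng(0)
w_true = np.zeros(100)
w_true[:3] = [2.0, -1.5, 0.5]
w = np.zeros(100)
for _ in range(20000):
    x = rng.standard_normal(100)
    y = w_true @ x + 0.01 * rng.standard_normal()
    w = online_lasso_step(w, x, y)
print(np.round(w[:6], 2))  # roughly [2.0, -1.5, 0.5, 0, 0, 0], up to Lasso shrinkage
```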
arXiv Detail & Related papers (2023-01-26T06:29:49Z) - Eigen Memory Tree [27.33148786536804]
This work introduces the Eigen Memory Tree (EMT), a novel online memory model for sequential learning scenarios.
We demonstrate that EMT outperforms existing online memory approaches, and provide a hybridized EMT-parametric algorithm that enjoys drastically improved performance.
Our findings are validated using 206 datasets from the OpenML repository in both bounded and infinite memory budget situations.
arXiv Detail & Related papers (2022-10-25T14:57:41Z) - A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing approaches maintains a memory of exemplars: a subset of past data is saved into a memory bank and replayed when training on future tasks to prevent catastrophic forgetting.
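A minimal sketch of such an exemplar memory, written as a generic rehearsal buffer rather than this paper's transformer-based method (the class budgets and random selection rule are illustrative assumptions):
```python
import random

class ExemplarMemory:
    """Generic rehearsal buffer for class-incremental learning: keep a fixed
    total budget of past examples, split evenly across the classes seen so
    far, and replay them alongside new-task data to mitigate forgetting."""

    def __init__(self, budget):
        self.budget = budget   # total number of exemplars kept
        self.per_class = {}    # class label -> list of stored examples

    def update(self, new_data_by_class):
        # Register the new classes, then rebalance so each class keeps
        # roughly budget // k exemplars.
        for label, examples in new_data_by_class.items():
            self.per_class[label] = list(examples)
        k = max(1, len(self.per_class))
        m = max(1, self.budget // k)
        for label, examples in self.per_class.items():
            # Random selection here; herding (closest-to-mean) is common in practice.
            self.per_class[label] = random.sample(examples, min(m, len(examples)))

    def replay_batch(self, size):
        pool = [ex for exs in self.per_class.values() for ex in exs]
        return random.sample(pool, min(size, len(pool)))
```
Training on a new task would then mix replay_batch() samples into each batch of new-class data.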
arXiv Detail & Related papers (2022-10-10T08:27:28Z) - A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model that learns new classes over time under a limited memory budget.
We show that, when the model size is counted into the total budget and methods are compared at an aligned memory size, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z) - Mondrian Forest for Data Stream Classification Under Memory Constraints [0.0]
We adapt the online Mondrian forest classification algorithm to work with memory constraints on data streams.
We design five out-of-memory strategies to update Mondrian trees with new data points when the memory limit is reached.
We also design trimming mechanisms to make Mondrian trees more robust to concept drifts under memory constraints.
arXiv Detail & Related papers (2022-05-12T15:35:03Z) - Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x inference speed-up while retaining comparable performance.
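For reference, the nearest-neighbor language-model prediction that this line of work accelerates interpolates the base model's next-token distribution with one induced by retrieved datastore entries; the sketch below is a generic illustration (the interpolation weight and distance weighting are assumptions), not this paper's efficiency techniques.
```python
import numpy as np

def knn_lm_interpolate(p_lm, neighbor_labels, neighbor_dists, vocab_size, lam=0.25):
    """Mix the base LM distribution with a distribution induced by retrieved
    datastore neighbors, weighted by a softmax over negative distances."""
    weights = np.exp(-neighbor_dists)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for label, w in zip(neighbor_labels, weights):
        p_knn[label] += w
    return lam * p_knn + (1.0 - lam) * p_lm

# Toy usage with a 5-token vocabulary and 3 retrieved neighbors.
p_lm = np.array([0.1, 0.2, 0.3, 0.2, 0.2])
mixed = knn_lm_interpolate(p_lm, neighbor_labels=[2, 2, 4],
                           neighbor_dists=np.array([0.5, 0.7, 1.2]), vocab_size=5)
print(mixed, mixed.sum())  # still a valid probability distribution
```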
arXiv Detail & Related papers (2021-09-09T12:32:28Z) - StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of kernel ridge regression (KRR) require that all the data be stored in main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
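To see why memory becomes the bottleneck that streaming variants avoid: exact KRR builds and solves against an n-by-n kernel matrix, so memory grows quadratically with the number of samples. A minimal dense-KRR sketch (generic textbook form, not StreaMRAK):
```python
import numpy as np

def krr_fit_predict(X_train, y_train, X_test, lam=1e-2, gamma=1.0):
    """Plain dense kernel ridge regression with an RBF kernel: the n x n kernel
    matrix must be held in memory, which is what streaming variants avoid."""
    def rbf(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    K = rbf(X_train, X_train)  # O(n^2) memory
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return rbf(X_test, X_train) @ alpha

# Toy usage: fit a 1-D sine curve from 200 noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
X_new = np.linspace(-3, 3, 5)[:, None]
print(krr_fit_predict(X, y, X_new))
```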
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z) - Semantically Constrained Memory Allocation (SCMA) for Embedding in Efficient Recommendation Systems [27.419109620575313]
A key challenge for deep learning models is to work with millions of categorical classes or tokens.
We propose a novel formulation of memory shared embedding, where memory is shared in proportion to the overlap in semantic information.
We demonstrate a significant reduction in the memory footprint while maintaining performance.
arXiv Detail & Related papers (2021-02-24T19:55:49Z) - Memory-Efficient Sampling for Minimax Distance Measures [4.873362301533825]
In this paper, we investigate efficient sampling schemes in order to reduce the memory requirement and provide a linear space complexity.
We evaluate the methods on real-world datasets from different domains and analyze the results.
arXiv Detail & Related papers (2020-05-26T11:00:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.