Dynamic Ensemble Size Adjustment for Memory Constrained Mondrian Forest
- URL: http://arxiv.org/abs/2210.05704v1
- Date: Tue, 11 Oct 2022 18:05:58 GMT
- Title: Dynamic Ensemble Size Adjustment for Memory Constrained Mondrian Forest
- Authors: Martin Khannouz and Tristan Glatard
- Abstract summary: In this paper, we show that under memory constraints, increasing the size of a tree-based ensemble classifier can worsen its performance.
We experimentally show the existence of an optimal ensemble size for a memory-bounded Mondrian forest on data streams.
We conclude that our method can achieve up to 95% of the performance of an optimally-sized Mondrian forest for stable datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised learning algorithms generally assume the availability of enough
memory to store data models during the training and test phases. However, this
assumption is unrealistic when data comes in the form of infinite data streams,
or when learning algorithms are deployed on devices with reduced amounts of
memory. Such memory constraints impact the model behavior and assumptions. In
this paper, we show that under memory constraints, increasing the size of a
tree-based ensemble classifier can worsen its performance. In particular, we
experimentally show the existence of an optimal ensemble size for a
memory-bounded Mondrian forest on data streams and we design an algorithm to
guide the forest toward that optimal number by using an estimation of
overfitting. We tested different variations for this algorithm on a variety of
real and simulated datasets, and we conclude that our method can achieve up to
95% of the performance of an optimally-sized Mondrian forest for stable
datasets, and can even outperform it for datasets with concept drifts. All our
methods are implemented in the OrpailleCC open-source library and are ready to
be used on embedded systems and connected objects.
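The reference implementation lives in the C++ OrpailleCC library; as a rough Python sketch of the idea described above (not the authors' algorithm), the controller below grows or shrinks a memory-bounded ensemble of incremental trees based on a crude overfitting estimate, taken here as resubstitution accuracy minus prequential accuracy over a sliding window. The tree interface (predict/learn/node_count), the estimator, the thresholds, and the grow/shrink rule are all illustrative assumptions.
```python
import random
from collections import deque

class DynamicSizeForest:
    """Illustrative controller for a memory-bounded ensemble of incremental
    trees: it adds or removes trees based on a crude overfitting estimate.
    The tree interface (predict/learn/node_count), the estimator, and the
    thresholds are assumptions for illustration, not the OrpailleCC code."""

    def __init__(self, make_tree, node_budget, gap_threshold=0.05,
                 window=500, adjust_every=500):
        self.make_tree = make_tree        # factory returning one incremental tree
        self.node_budget = node_budget    # node budget shared by the whole forest
        self.gap_threshold = gap_threshold
        self.adjust_every = adjust_every
        self.trees = [make_tree()]
        self.seen = deque(maxlen=window)       # recently seen labelled points
        self.preq_hits = deque(maxlen=window)  # test-then-train (prequential) hits
        self.n_seen = 0

    def predict(self, x):
        votes = {}
        for tree in self.trees:
            label = tree.predict(x)
            votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get)

    def partial_fit(self, x, y):
        # Prequential evaluation: test on the point before learning from it.
        self.preq_hits.append(self.predict(x) == y)
        if sum(t.node_count() for t in self.trees) < self.node_budget:
            for tree in self.trees:
                tree.learn(x, y)
        self.seen.append((x, y))
        self.n_seen += 1
        if self.n_seen % self.adjust_every == 0:
            self._adjust_size()

    def _overfitting_gap(self):
        # Resubstitution accuracy (on already-seen points) minus prequential
        # accuracy: a large gap suggests the forest memorizes more than it generalizes.
        if not self.seen or not self.preq_hits:
            return 0.0
        resub = sum(self.predict(x) == y for x, y in self.seen) / len(self.seen)
        preq = sum(self.preq_hits) / len(self.preq_hits)
        return resub - preq

    def _adjust_size(self):
        gap = self._overfitting_gap()
        if gap > self.gap_threshold:
            # High overfitting estimate: under a fixed budget, more (and hence
            # shallower) trees trade variance for bias.
            self.trees.append(self.make_tree())
        elif gap < self.gap_threshold / 2 and len(self.trees) > 1:
            # Low estimate: free budget so the remaining trees can grow deeper.
            self.trees.pop(random.randrange(len(self.trees)))
```
Under a fixed node budget, adding trees makes each tree shallower, so the sketch adds a tree when the overfitting estimate is high and removes one when it is low; the paper's actual adjustment rule and overfitting estimator may differ. Any incremental tree learner exposing the three assumed methods could be plugged in as the factory.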
Related papers
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
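For context on what distribution matching means in this line of work, the toy sketch below (an illustrative assumption, not this paper's improved method) learns a small synthetic set whose mean feature embeddings, under randomly sampled projections, match those of the real data for one class.
```python
import torch

def dm_loss(real_feats: torch.Tensor, syn_feats: torch.Tensor) -> torch.Tensor:
    # Match the mean embeddings of real and synthetic data (an MMD-style moment match).
    return ((real_feats.mean(dim=0) - syn_feats.mean(dim=0)) ** 2).sum()

# Toy usage: condense 1000 real points of one class into 10 synthetic points.
torch.manual_seed(0)
real = torch.randn(1000, 32)                   # stand-in for one class of real data
syn = torch.randn(10, 32, requires_grad=True)  # learnable synthetic points
opt = torch.optim.SGD([syn], lr=0.05)

for step in range(200):
    # A fresh random linear "network" each step stands in for sampling random
    # feature extractors, as distribution-matching condensation does.
    proj = torch.randn(32, 64)
    loss = dm_loss(real @ proj, syn @ proj)
    opt.zero_grad()
    loss.backward()
    opt.step()
```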
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Learning Large Scale Sparse Models [6.428186644949941]
We consider learning sparse models in large scale settings, where the number of samples and the feature dimension can grow as large as millions or billions.
We propose to learn sparse models such as Lasso in an online manner where, in each iteration, only one randomly chosen sample is revealed and used to update a sparse gradient.
Thereby, the memory cost is independent of the sample size and gradient evaluation for one sample is efficient.
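As a generic illustration of that setting rather than the paper's specific algorithm, a proximal stochastic-gradient (soft-thresholding) update for Lasso touches one sample per step and stores only the d-dimensional weight vector:
```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of the L1 norm: shrinks coefficients toward zero.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def online_lasso_step(w, x, y, lr=0.01, lam=0.1):
    """One proximal-SGD update for Lasso from a single (x, y) sample.
    A textbook sketch of online sparse learning, not the paper's exact
    algorithm; memory use is O(d), independent of the stream length."""
    grad = (w @ x - y) * x  # gradient of 0.5 * (w.x - y)^2 for this sample
    return soft_threshold(w - lr * grad, lr * lam)

# Toy stream: the true model uses only 3 of 100 features.
rng = np.random.default_rng(0)
w_true = np.zeros(100)
w_true[:3] = [2.0, -1.5, 0.5]
w = np.zeros(100)
for _ in range(20000):
    x = rng.standard_normal(100)
    y = w_true @ x + 0.01 * rng.standard_normal()
    w = online_lasso_step(w, x, y)
print(np.round(w[:6], 2))  # roughly [2.0, -1.5, 0.5, 0, 0, 0], up to Lasso shrinkage
```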
arXiv Detail & Related papers (2023-01-26T06:29:49Z) - Eigen Memory Tree [27.33148786536804]
This work introduces the Eigen Memory Tree (EMT), a novel online memory model for sequential learning scenarios.
We demonstrate that EMT outperforms existing online memory approaches, and provide a hybridized EMT-parametric algorithm that enjoys drastically improved performance.
Our findings are validated using 206 datasets from the OpenML repository in both bounded and infinite memory budget situations.
arXiv Detail & Related papers (2022-10-25T14:57:41Z) - A Memory Transformer Network for Incremental Learning [64.0410375349852]
We study class-incremental learning, a training setup in which new classes of data are observed over time for the model to learn from.
Despite the straightforward problem formulation, the naive application of classification models to class-incremental learning results in the "catastrophic forgetting" of previously seen classes.
One of the most successful existing approaches maintains a memory of exemplars: a subset of past data is saved into a memory bank and replayed when training on future tasks to prevent catastrophic forgetting.
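A minimal sketch of such an exemplar memory, written as a generic rehearsal buffer rather than this paper's transformer-based method (the class budgets and random selection rule are illustrative assumptions):
```python
import random

class ExemplarMemory:
    """Generic rehearsal buffer for class-incremental learning: keep a fixed
    total budget of past examples, split evenly across the classes seen so
    far, and replay them alongside new-task data to mitigate forgetting."""

    def __init__(self, budget):
        self.budget = budget   # total number of exemplars kept
        self.per_class = {}    # class label -> list of stored examples

    def update(self, new_data_by_class):
        # Register the new classes, then rebalance so each class keeps
        # roughly budget // k exemplars.
        for label, examples in new_data_by_class.items():
            self.per_class[label] = list(examples)
        k = max(1, len(self.per_class))
        m = max(1, self.budget // k)
        for label, examples in self.per_class.items():
            # Random selection here; herding (closest-to-mean) is common in practice.
            self.per_class[label] = random.sample(examples, min(m, len(examples)))

    def replay_batch(self, size):
        pool = [ex for exs in self.per_class.values() for ex in exs]
        return random.sample(pool, min(size, len(pool)))
```
Training on a new task would then mix replay_batch() samples into each batch of new-class data.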
arXiv Detail & Related papers (2022-10-10T08:27:28Z) - A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model that learns new classes over time under a limited memory budget.
We show that, when the model size is counted into the total budget and methods are compared at an aligned memory size, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z) - Mondrian Forest for Data Stream Classification Under Memory Constraints [0.0]
We adapt the online Mondrian forest classification algorithm to work with memory constraints on data streams.
We design five out-of-memory strategies to update Mondrian trees with new data points when the memory limit is reached.
We also design trimming mechanisms to make Mondrian trees more robust to concept drifts under memory constraints.
arXiv Detail & Related papers (2022-05-12T15:35:03Z) - Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore.
We show how to achieve up to a 6x inference speed-up while retaining comparable performance.
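For reference, the nearest-neighbor language-model prediction that this line of work accelerates interpolates the base model's next-token distribution with one induced by retrieved datastore entries; the sketch below is a generic illustration (the interpolation weight and distance weighting are assumptions), not this paper's efficiency techniques.
```python
import numpy as np

def knn_lm_interpolate(p_lm, neighbor_labels, neighbor_dists, vocab_size, lam=0.25):
    """Mix the base LM distribution with a distribution induced by retrieved
    datastore neighbors, weighted by a softmax over negative distances."""
    weights = np.exp(-neighbor_dists)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for label, w in zip(neighbor_labels, weights):
        p_knn[label] += w
    return lam * p_knn + (1.0 - lam) * p_lm

# Toy usage with a 5-token vocabulary and 3 retrieved neighbors.
p_lm = np.array([0.1, 0.2, 0.3, 0.2, 0.2])
mixed = knn_lm_interpolate(p_lm, neighbor_labels=[2, 2, 4],
                           neighbor_dists=np.array([0.5, 0.7, 1.2]), vocab_size=5)
print(mixed, mixed.sum())  # still a valid probability distribution
```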
arXiv Detail & Related papers (2021-09-09T12:32:28Z) - StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of kernel ridge regression (KRR) require that all the data be stored in main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
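To see why memory becomes the bottleneck that streaming variants avoid: exact KRR builds and solves against an n-by-n kernel matrix, so memory grows quadratically with the number of samples. A minimal dense-KRR sketch (generic textbook form, not StreaMRAK):
```python
import numpy as np

def krr_fit_predict(X_train, y_train, X_test, lam=1e-2, gamma=1.0):
    """Plain dense kernel ridge regression with an RBF kernel: the n x n kernel
    matrix must be held in memory, which is what streaming variants avoid."""
    def rbf(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    K = rbf(X_train, X_train)  # O(n^2) memory
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return rbf(X_test, X_train) @ alpha

# Toy usage: fit a 1-D sine curve from 200 noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.standard_normal(200)
X_new = np.linspace(-3, 3, 5)[:, None]
print(krr_fit_predict(X, y, X_new))
```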
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - An Adaptive Framework for Learning Unsupervised Depth Completion [59.17364202590475]
We present a method to infer a dense depth map from a color image and associated sparse depth measurements.
We show that regularization and co-visibility are related via the fitness of the model to data and can be unified into a single framework.
arXiv Detail & Related papers (2021-06-06T02:27:55Z) - Semantically Constrained Memory Allocation (SCMA) for Embedding in Efficient Recommendation Systems [27.419109620575313]
A key challenge for deep learning models is to work with millions of categorical classes or tokens.
We propose a novel formulation of memory shared embedding, where memory is shared in proportion to the overlap in semantic information.
We demonstrate a significant reduction in the memory footprint while maintaining performance.
arXiv Detail & Related papers (2021-02-24T19:55:49Z) - Memory-Efficient Sampling for Minimax Distance Measures [4.873362301533825]
In this paper, we investigate efficient sampling schemes in order to reduce the memory requirement and provide a linear space complexity.
We evaluate the methods on real-world datasets from different domains and analyze the results.
arXiv Detail & Related papers (2020-05-26T11:00:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.