Mondrian Forest for Data Stream Classification Under Memory Constraints
- URL: http://arxiv.org/abs/2205.07871v3
- Date: Fri, 4 Aug 2023 12:54:36 GMT
- Title: Mondrian Forest for Data Stream Classification Under Memory Constraints
- Authors: Martin Khannouz, Tristan Glatard
- Abstract summary: We adapt the online Mondrian forest classification algorithm to work with memory constraints on data streams.
We design five out-of-memory strategies to update Mondrian trees with new data points when the memory limit is reached.
We also design trimming mechanisms to make Mondrian trees more robust to concept drifts under memory constraints.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised learning algorithms generally assume the availability of enough
memory to store their data model during the training and test phases. However,
in the Internet of Things, this assumption is unrealistic when data comes in
the form of infinite data streams, or when learning algorithms are deployed on
devices with reduced amounts of memory. In this paper, we adapt the online
Mondrian forest classification algorithm to work with memory constraints on
data streams. In particular, we design five out-of-memory strategies to update
Mondrian trees with new data points when the memory limit is reached. Moreover,
we design trimming mechanisms to make Mondrian trees more robust to concept
drifts under memory constraints. We evaluate our algorithms on a variety of
real and simulated datasets, and we conclude with recommendations on their use
in different situations: the Extend Node strategy appears as the best
out-of-memory strategy in all configurations, whereas different trimming
mechanisms should be adopted depending on whether a concept drift is expected.
All our methods are implemented in the OrpailleCC open-source library and are
ready to be used on embedded systems and connected objects.
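Below is a minimal, illustrative sketch of the general idea behind such out-of-memory strategies: the model keeps a fixed budget of nodes and, once that budget is reached, new points are absorbed into existing leaves (by extending their region and class counts) instead of allocating new nodes. This is a simplified stand-in, not the paper's Mondrian tree and not the OrpailleCC API; the names `MemoryBoundedTree`, `Leaf`, and `extend` are hypothetical.

```python
# Hypothetical, simplified illustration of a memory-bounded streaming tree.
# Class and method names are illustrative; this is NOT the OrpailleCC API,
# and the node structure is much simpler than a real Mondrian tree.
from collections import Counter


class Leaf:
    """A leaf stores the bounding box of the points it has absorbed and
    per-class counts for majority-vote prediction."""

    def __init__(self, x, label):
        self.lower = list(x)                  # lower corner of the box
        self.upper = list(x)                  # upper corner of the box
        self.counts = Counter({label: 1})

    def extend(self, x, label):
        """Grow the box and the counts in place: no new node is allocated,
        so this update costs no additional memory."""
        for i, v in enumerate(x):
            self.lower[i] = min(self.lower[i], v)
            self.upper[i] = max(self.upper[i], v)
        self.counts[label] += 1

    def predict(self):
        return self.counts.most_common(1)[0][0]


class MemoryBoundedTree:
    """Keeps at most `max_leaves` leaves. Below the budget, new points get
    their own leaf; once the budget is reached, new points only extend the
    closest existing leaf instead of allocating anything."""

    def __init__(self, max_leaves=100):
        self.max_leaves = max_leaves
        self.leaves = []

    def _closest_leaf(self, x):
        # Squared distance from the point to each leaf's bounding box.
        def box_distance(leaf):
            return sum(max(lo - v, 0.0, v - up) ** 2
                       for v, lo, up in zip(x, leaf.lower, leaf.upper))
        return min(self.leaves, key=box_distance)

    def learn_one(self, x, label):
        if len(self.leaves) < self.max_leaves:
            self.leaves.append(Leaf(x, label))          # within budget
        else:
            self._closest_leaf(x).extend(x, label)      # out of memory

    def predict_one(self, x):
        return self._closest_leaf(x).predict() if self.leaves else None
```

The design point this sketch illustrates is that once the budget is hit, every update touches only existing structure, so memory usage stays constant no matter how many points the stream delivers.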
Related papers
- B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO to seamlessly combine eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory results in better inference on longer sequences tested up to 32K tokens.
arXiv Detail & Related papers (2024-07-08T18:41:01Z) - Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with and improve the performance of popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z) - RMM: Reinforced Memory Management for Class-Incremental Learning [102.20140790771265]
Class-Incremental Learning (CIL) trains classifiers under a strict memory budget.
Existing methods use a static and ad hoc strategy for memory allocation, which is often sub-optimal.
We propose a dynamic memory management strategy that is optimized for the incremental phases and different object classes.
arXiv Detail & Related papers (2023-01-14T00:07:47Z) - Eigen Memory Tree [27.33148786536804]
This work introduces the Eigen Memory Tree (EMT), a novel online memory model for sequential learning scenarios.
We demonstrate that EMT outperforms existing online memory approaches, and provide a hybridized EMT-parametric algorithm that enjoys drastically improved performance.
Our findings are validated using 206 datasets from the OpenML repository in both bounded and infinite memory budget situations.
arXiv Detail & Related papers (2022-10-25T14:57:41Z) - Dynamic Ensemble Size Adjustment for Memory Constrained Mondrian Forest [0.0]
In this paper, we show that under memory constraints, increasing the size of a tree-based ensemble classifier can worsen its performance.
We experimentally show the existence of an optimal ensemble size for a memory-bounded Mondrian forest on data streams.
We conclude that our method can achieve up to 95% of the performance of an optimally-sized Mondrian forest for stable datasets.
arXiv Detail & Related papers (2022-10-11T18:05:58Z) - A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model under a limited memory budget.
We show that when counting the model size into the total budget and comparing methods with aligned memory size, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z) - Pin the Memory: Learning to Generalize Semantic Segmentation [68.367763672095]
We present a novel memory-guided domain generalization method for semantic segmentation based on a meta-learning framework.
Our method abstracts the conceptual knowledge of semantic classes into a categorical memory that remains constant across domains.
arXiv Detail & Related papers (2022-04-07T17:34:01Z) - Shrub Ensembles for Online Classification [7.057937612386993]
Decision Tree (DT) ensembles provide excellent performance while adapting to changes in the data, but they are not resource efficient.
We propose a novel memory-efficient online classification ensemble called shrub ensembles for resource-constrained systems.
Our algorithm trains small to medium-sized decision trees on small windows and uses gradient descent to learn the ensemble weights of these shrubs (a simplified sketch of this weight-learning step appears after this list).
arXiv Detail & Related papers (2021-12-07T14:22:43Z) - StreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Semantically Constrained Memory Allocation (SCMA) for Embedding in Efficient Recommendation Systems [27.419109620575313]
A key challenge for deep learning models is to work with millions of categorical classes or tokens.
We propose a novel formulation of memory shared embedding, where memory is shared in proportion to the overlap in semantic information.
We demonstrate a significant reduction in the memory footprint while maintaining performance.
arXiv Detail & Related papers (2021-02-24T19:55:49Z) - Neural Storage: A New Paradigm of Elastic Memory [4.307341575886927]
Storage and retrieval of data in a computer memory play a major role in system performance.
We introduce Neural Storage (NS), a brain-inspired learning memory paradigm that organizes the memory as a flexible neural memory network.
NS achieves an order of magnitude improvement in memory access performance for two representative applications.
arXiv Detail & Related papers (2021-01-07T19:19:25Z)
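The Shrub Ensembles entry above mentions learning ensemble weights by gradient descent. The sketch below shows only that weight-learning step, under the assumption that each member already outputs class probabilities; the `WeightedEnsemble` class, its parameters, and the softmax parameterization of the weights are illustrative choices, not taken from that paper's implementation.

```python
# Hypothetical sketch of learning ensemble weights by online gradient descent,
# in the spirit of the Shrub Ensembles summary above. Names and parameters are
# illustrative and not taken from the paper's implementation.
import numpy as np


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


class WeightedEnsemble:
    """Combines the class-probability predictions of fixed ensemble members
    with trainable weights, updated by gradient descent on cross-entropy."""

    def __init__(self, n_members, lr=0.1):
        self.theta = np.zeros(n_members)   # unconstrained parameters
        self.lr = lr

    def predict_proba(self, member_probas):
        # member_probas: (n_members, n_classes) array of member predictions.
        w = softmax(self.theta)            # convex combination weights
        return w @ member_probas

    def update(self, member_probas, label):
        """One gradient step on the loss -log p(label)."""
        w = softmax(self.theta)
        p = w @ member_probas
        # Gradient of the loss with respect to the combination weights w.
        grad_w = -member_probas[:, label] / max(p[label], 1e-12)
        # Chain rule through the softmax parameterization of w.
        grad_theta = w * (grad_w - np.dot(w, grad_w))
        self.theta -= self.lr * grad_theta
```

Training one small tree per window and then calling `update` with the members' predictions for each new labelled point would reproduce, at a very high level, the weight-adaptation loop the summary describes.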