Simplifying Hyperparameter Tuning in Online Machine Learning -- The spotRiverGUI
- URL: http://arxiv.org/abs/2402.11594v1
- Date: Sun, 18 Feb 2024 14:12:15 GMT
- Title: Simplifying Hyperparameter Tuning in Online Machine Learning -- The spotRiverGUI
- Authors: Thomas Bartz-Beielstein
- Abstract summary: Online Machine Learning (OML) is an alternative to Batch Machine Learning (BML).
OML is able to process data in a sequential manner, which is especially useful for data streams.
- Score: 0.5439020425819
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Batch Machine Learning (BML) reaches its limits when dealing with very large amounts of streaming data, especially with respect to available memory, handling drift in data streams, and processing new, unknown data. Online Machine Learning (OML) overcomes these limitations by processing data sequentially, which makes it especially well suited for data streams. The `river` package is a Python OML library that provides a variety of online learning algorithms for classification, regression, clustering, anomaly detection, and more. The `spotRiver` package provides a framework for hyperparameter tuning of OML models, and the `spotRiverGUI` is a graphical user interface for `spotRiver`. The `spotRiverGUI` relieves the user of the burden of manually searching for the optimal hyperparameter setting: once the data is provided, users can conveniently compare different OML algorithms from the `river` package and tune the selected algorithms efficiently.
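To make the kind of comparison described above concrete, the following is a minimal sketch that uses the `river` package directly: two online classifiers are evaluated on a built-in data stream with progressive (test-then-train) validation. The dataset and models are chosen only for illustration; searching over their hyperparameters is what `spotRiver` and the `spotRiverGUI` automate.

```python
# A minimal sketch (not part of spotRiver): progressive validation of two
# river models on a built-in data stream. Dataset and models are illustrative.
from river import datasets, evaluate, linear_model, metrics, preprocessing, tree

models = {
    "logistic_regression": preprocessing.StandardScaler() | linear_model.LogisticRegression(),
    "hoeffding_tree": tree.HoeffdingTreeClassifier(),
}

for name, model in models.items():
    # Each example is used for prediction first, then for learning (test-then-train).
    metric = evaluate.progressive_val_score(
        dataset=datasets.Phishing(),   # small binary classification stream
        model=model,
        metric=metrics.Accuracy(),
    )
    print(name, metric)
```

Instead of fixing values such as the tree's split parameters or the logistic regression's learning rate by hand, the tuner searches over them.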
Related papers
- Practical programming research of Linear DML model based on the simplest Python code: From the standpoint of novice researchers [0.0]
This paper presents linear DML models for causal inference using the simplest Python code on a Jupyter notebook based on an Anaconda platform.
The results show that current library APIs are not yet sufficient to enable novice Python users to build qualified, high-quality DML models.
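The paper's own code is not reproduced here, but the core "linear DML" (partialling-out with cross-fitting) idea can be sketched with plain scikit-learn. The synthetic data, the nuisance learners, and the true effect of 2.0 below are assumptions made purely for illustration.

```python
# A minimal sketch of the partialling-out ("linear DML") idea with scikit-learn;
# data, learners, and the true effect are illustrative, not taken from the paper.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                   # confounders
T = X[:, 0] + rng.normal(size=n)              # treatment depends on X
Y = 2.0 * T + X[:, 1] + rng.normal(size=n)    # true treatment effect = 2.0

# Cross-fitted nuisance estimates of E[Y|X] and E[T|X]
y_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, Y, cv=5)
t_hat = cross_val_predict(RandomForestRegressor(n_estimators=100), X, T, cv=5)

# Final stage: regress the outcome residual on the treatment residual
final_stage = LinearRegression().fit((T - t_hat).reshape(-1, 1), Y - y_hat)
print("estimated effect:", final_stage.coef_[0])   # should be close to 2.0
```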
arXiv Detail & Related papers (2025-02-22T10:07:54Z)
- Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages [10.418542753869433]
Low-resource languages (LRLs) face significant challenges in natural language processing (NLP) due to limited data.
Current state-of-the-art large language models (LLMs) still struggle with LRLs.
Small multilingual models (mLMs) such as mBERT and XLM-R offer greater promise because their capacity is a better fit for small training data sizes.
arXiv Detail & Related papers (2025-02-14T13:10:39Z)
- LML-DAP: Language Model Learning a Dataset for Data-Augmented Prediction [0.0]
This paper introduces a new approach for classification tasks using Large Language Models (LLMs) in an explainable method.
The classification is performed by LLMs in a way similar to how humans manually explore and understand the data before deciding on a label.
The system scored an accuracy above 90% in some test cases, confirming the effectiveness and potential of the system to outperform Machine Learning models in various scenarios.
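The exact prompting scheme is not given in this summary; the following is only a rough sketch of the general "data-augmented prediction" idea, in which labeled rows from the dataset are placed into the prompt before an LLM is asked for a label. The columns, rows, and prompt wording are hypothetical, and no LLM call is made here.

```python
# A hypothetical sketch of data-augmented prediction: put labeled examples from
# the dataset into a prompt and ask an LLM for the label of a new row.
# The columns, values, and wording are invented for illustration only.
import pandas as pd

train = pd.DataFrame({
    "age": [25, 47, 33, 61],
    "income": [32_000, 88_000, 54_000, 41_000],
    "label": ["declined", "approved", "approved", "declined"],
})
query = {"age": 39, "income": 72_000}

examples = "\n".join(
    f"- age={row.age}, income={row.income} -> {row.label}" for row in train.itertuples()
)
prompt = (
    "You are given labeled examples from a tabular dataset:\n"
    f"{examples}\n"
    f"Based on these examples, classify: age={query['age']}, income={query['income']}.\n"
    "Answer with a single label and a short explanation."
)
print(prompt)   # this prompt would then be sent to an LLM of choice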
arXiv Detail & Related papers (2024-09-27T17:58:50Z)
- forester: A Tree-Based AutoML Tool in R [0.0]
The forester is an open-source AutoML package implemented in R for training high-quality tree-based models.
It fully supports binary and multiclass classification and regression, and partially supports survival analysis tasks.
With just a few functions, the user can detect data quality issues, prepare the preprocessing pipeline, train and tune tree-based models, evaluate the results, and create a report for further analysis.
arXiv Detail & Related papers (2024-09-07T10:39:10Z)
- Online Adaptation of Language Models with a Memory of Amortized Contexts [82.02369596879817]
Memory of Amortized Contexts (MAC) is an efficient and effective online adaptation framework for large language models.
We show how MAC can be combined with, and improve the performance of, popular alternatives such as retrieval-augmented generation.
arXiv Detail & Related papers (2024-03-07T08:34:57Z)
- Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
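How LexLIP, STAIR, or the proposed method implement this transformation is not described here. As a generic sketch only, one common way to turn a dense embedding into a sparse lexical vector is to project it onto a vocabulary-sized space and keep the strongest non-negative weights; the dimensions and the random projection below are placeholders for a learned mapping.

```python
# A generic sketch (not the paper's model): map a dense embedding to a sparse
# lexical vector via a vocabulary-sized projection, ReLU, and top-k pruning.
import numpy as np

rng = np.random.default_rng(0)
dim, vocab_size, top_k = 768, 30_522, 64      # placeholder sizes (BERT-like vocabulary)

dense = rng.normal(size=dim)                             # embedding from a frozen dense model
projection = rng.normal(size=(dim, vocab_size)) * 0.02   # would be learned in practice

weights = np.maximum(dense @ projection, 0.0)   # non-negative term weights
keep = np.argsort(weights)[-top_k:]             # indices of the strongest terms
sparse = np.zeros(vocab_size)
sparse[keep] = weights[keep]

print("non-zero terms:", np.count_nonzero(sparse), "of", vocab_size)
```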
arXiv Detail & Related papers (2024-02-27T14:21:56Z)
- Scaling Sparse Fine-Tuning to Large Language Models [67.59697720719672]
Large Language Models (LLMs) are difficult to fully fine-tune due to their sheer number of parameters.
We propose SpIEL, a novel sparse fine-tuning method that maintains an array of parameter indices and the deltas of these parameters relative to their pretrained values.
We show that SpIEL is superior to popular parameter-efficient fine-tuning methods like LoRA in terms of performance and comparable in terms of run time.
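The summary only says that SpIEL keeps an array of parameter indices together with their deltas; a toy illustration of that storage scheme (not SpIEL's update, growth, or pruning rules) might look as follows.

```python
# Toy illustration of sparse fine-tuning storage: only a small set of parameter
# indices and their deltas relative to the pretrained values are kept.
# This shows the bookkeeping idea only, not the SpIEL algorithm itself.
import numpy as np

pretrained = np.random.default_rng(0).normal(size=1_000_000)   # flat parameter vector

# Sparse fine-tuning state: which parameters changed, and by how much.
indices = np.array([12, 4_096, 777_216])
deltas = np.array([0.031, -0.284, 0.105])

def finetuned_params(base: np.ndarray) -> np.ndarray:
    """Materialize the fine-tuned parameters from the (indices, deltas) arrays."""
    params = base.copy()
    params[indices] += deltas
    return params

params = finetuned_params(pretrained)
print("parameters touched:", indices.size, "out of", pretrained.size)
```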
arXiv Detail & Related papers (2024-01-29T18:43:49Z)
- NumS: Scalable Array Programming for the Cloud [82.827921577004]
We present NumS, an array programming library which optimizes NumPy-like expressions on task-based distributed systems.
This is achieved through a novel scheduler called Load Simulated Hierarchical Scheduling (LSHS).
We show that LSHS enhances performance on Ray by decreasing network load by a factor of 2x, requiring 4x less memory, and reducing execution time by 10x on the logistic regression problem.
arXiv Detail & Related papers (2022-06-28T20:13:40Z)
- STREAMLINE: A Simple, Transparent, End-To-End Automated Machine Learning Pipeline Facilitating Data Analysis and Algorithm Comparison [0.49034553215430216]
STREAMLINE is a simple, transparent, end-to-end AutoML pipeline.
It is specifically designed to compare performance between datasets, ML algorithms, and other AutoML tools.
arXiv Detail & Related papers (2022-06-23T22:40:58Z)
- SubStrat: A Subset-Based Strategy for Faster AutoML [5.833272638548153]
SubStrat is an AutoML optimization strategy that tackles the data size rather than the configuration space.
It wraps existing AutoML tools; instead of executing them directly on the entire dataset, SubStrat uses a genetic-based algorithm to find a small, representative data subset.
It then runs the AutoML tool on that small subset and finally refines the resulting pipeline by executing a restricted, much shorter AutoML process on the full dataset.
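SubStrat's genetic subset search and its AutoML integration are not reproduced here; the sketch below only mimics the overall flow (tune on a small subset, then run a short, restricted refinement on the full data), using a random subset and scikit-learn's grid search as stand-ins.

```python
# A rough stand-in for the SubStrat flow: tune on a small data subset first,
# then refine a narrowed search on the full dataset. The random subset replaces
# SubStrat's genetic subset search, and GridSearchCV replaces the AutoML tool.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)

# Step 1: pick a small subset (stand-in for the genetic subset search).
rng = np.random.default_rng(0)
idx = rng.choice(len(X), size=1_000, replace=False)

# Step 2: run the (stand-in) AutoML search on the subset only.
wide_grid = {"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10, 20]}
subset_search = GridSearchCV(RandomForestClassifier(random_state=0), wide_grid, cv=3)
subset_search.fit(X[idx], y[idx])

# Step 3: refine with a much smaller search around the winner on the full data.
best = subset_search.best_params_
narrow_grid = {"n_estimators": [best["n_estimators"]], "max_depth": [best["max_depth"]]}
final_search = GridSearchCV(RandomForestClassifier(random_state=0), narrow_grid, cv=3)
final_search.fit(X, y)
print(final_search.best_params_, round(final_search.best_score_, 3))
```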
arXiv Detail & Related papers (2022-06-07T07:44:06Z)
- Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning [56.17603785248675]
Model-agnostic meta-learning (MAML) has become a popular research area.
Existing MAML algorithms rely on the 'episode' idea by sampling a few tasks and data points to update the meta-model at each iteration.
This paper proposes memory-based algorithms for MAML that converge with vanishing error.
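The memory-based algorithms themselves are not described in this summary. For context, the episode-based baseline they build on can be sketched as a minimal first-order MAML loop on toy linear-regression tasks; the model, task family, and step sizes below are purely illustrative.

```python
# A minimal first-order MAML sketch on toy linear-regression tasks
# (the episode-based baseline; not the paper's memory-based method).
import numpy as np

rng = np.random.default_rng(0)
meta_params = np.zeros(2)            # [slope, intercept] of a tiny linear model
inner_lr, outer_lr = 0.05, 0.01

def sample_task():
    """Each task is a noisy line y = a*x + b with task-specific a and b."""
    a, b = rng.uniform(-2, 2, size=2)
    x = rng.uniform(-3, 3, size=40)
    y = a * x + b + 0.1 * rng.normal(size=40)
    return (x[:20], y[:20]), (x[20:], y[20:])    # support set, query set

def grad(params, x, y):
    """Gradient of mean squared error for y_hat = params[0]*x + params[1]."""
    err = params[0] * x + params[1] - y
    return np.array([np.mean(2 * err * x), np.mean(2 * err)])

for step in range(2_000):
    (xs, ys), (xq, yq) = sample_task()
    adapted = meta_params - inner_lr * grad(meta_params, xs, ys)    # inner adaptation step
    meta_params = meta_params - outer_lr * grad(adapted, xq, yq)    # first-order meta-update

print("learned meta-initialization:", meta_params)
```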
arXiv Detail & Related papers (2021-06-09T08:47:58Z)
- Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python [77.33905890197269]
We describe a new library which implements a unified pathwise coordinate optimization for a variety of sparse learning problems.
The library is coded in C++ and has user-friendly R and Python wrappers.
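Picasso's own API is not shown in this summary. As a stand-in, the pathwise idea (solving the sparse problem along a decreasing grid of regularization strengths, warm-starting each solve from the previous one) can be illustrated with scikit-learn's coordinate-descent lasso path.

```python
# Illustration of pathwise sparse learning with scikit-learn's coordinate-descent
# lasso path (a stand-in; this does not use the Picasso library itself).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

X, y = make_regression(n_samples=200, n_features=50, n_informative=5, noise=1.0,
                       random_state=0)

# Solve the lasso along a decreasing grid of regularization strengths (alphas);
# each solution warm-starts the next, which is what makes the whole path cheap.
alphas, coefs, _ = lasso_path(X, y, n_alphas=20)

for alpha, beta in zip(alphas, coefs.T):
    print(f"alpha={alpha:8.3f}  non-zero coefficients={np.count_nonzero(beta)}")
```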
arXiv Detail & Related papers (2020-06-27T02:39:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.