mlf-core: a framework for deterministic machine learning
- URL: http://arxiv.org/abs/2104.07651v1
- Date: Thu, 15 Apr 2021 17:58:03 GMT
- Title: mlf-core: a framework for deterministic machine learning
- Authors: Lukas Heumos, Philipp Ehmele, Kevin Menden, Luis Kuhn Cuellar, Edmund
Miller, Steffen Lemke, Gisela Gabernet and Sven Nahnsen
- Abstract summary: Major machine learning libraries default to non-deterministic algorithms based on atomic operations.
To overcome this shortcoming, various machine learning libraries released deterministic counterparts to the non-deterministic algorithms.
We developed a new software solution, the mlf-core ecosystem, which helps machine learning projects meet and maintain these requirements.
- Score: 0.08795040582681389
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Machine learning has shown extensive growth in recent years. However,
previously existing studies highlighted a reproducibility crisis in machine
learning. The reasons for irreproducibility are manifold. Major machine
learning libraries default to non-deterministic algorithms based on atomic
operations. Fixing all random seeds alone is not sufficient for
deterministic machine learning. To overcome this shortcoming, various machine
learning libraries released deterministic counterparts to the non-deterministic
algorithms. We evaluated the effect of these algorithms on determinism and
runtime. Based on these results, we formulated a set of requirements for
reproducible machine learning and developed a new software solution, the
mlf-core ecosystem, which helps machine learning projects meet and maintain these
requirements. We applied mlf-core to develop fully reproducible models in
various biomedical fields including a single cell autoencoder with TensorFlow,
a PyTorch-based U-Net model for liver-tumor segmentation in CT scans, and a
liver cancer classifier based on gene expression profiles with XGBoost.
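As the abstract stresses, fixing random seeds is necessary but not sufficient for deterministic training. A minimal sketch of the seed-fixing half in plain Python (the helper name is ours; framework-specific determinism flags are mentioned only in comments and must be looked up per library):

```python
import os
import random

def set_global_seeds(seed: int) -> None:
    """Fix the RNG seeds that are under the program's direct control.

    Per the paper, this alone does NOT make training deterministic:
    GPU kernels that rely on atomic operations must also be replaced
    by their deterministic counterparts via each framework's own flags.
    """
    # Affects hash randomization only for *subprocesses*; the current
    # interpreter must be started with this variable already set.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    # Framework-specific calls would be added here, e.g. seeding NumPy,
    # PyTorch, or TensorFlow and enabling their deterministic kernels.

set_global_seeds(42)
first = [random.random() for _ in range(3)]
set_global_seeds(42)
second = [random.random() for _ in range(3)]
assert first == second  # the seeded stdlib RNG stream is reproducible
```

The seeded stream is bitwise reproducible, yet a GPU reduction over the same data can still differ between runs, which is exactly the gap the deterministic algorithm variants close.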
Related papers
- The Impact of Ontology on the Prediction of Cardiovascular Disease Compared to Machine Learning Algorithms [0.0]
This paper compares and reviews the most prominent machine learning algorithms, as well as ontology-based machine learning classification.
The findings are assessed using performance measures generated from the confusion matrix, such as F-Measure, Accuracy, Recall, and Precision.
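The four measures named above all derive directly from the counts in a binary confusion matrix; a short self-contained sketch (the function name and the example counts are illustrative, not from the paper):

```python
def confusion_matrix_metrics(tp: int, fp: int, fn: int, tn: int):
    """Compute Accuracy, Precision, Recall, and F-Measure
    from a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# e.g. 40 true positives, 10 false positives, 20 false negatives, 30 true negatives
acc, prec, rec, f1 = confusion_matrix_metrics(40, 10, 20, 30)
```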
arXiv Detail & Related papers (2024-05-30T18:40:27Z) - Gradients of Functions of Large Matrices [18.361820028457718]
We show how to differentiate workhorses of numerical linear algebra efficiently.
We derive previously unknown adjoint systems for Lanczos and Arnoldi iterations, implement them in JAX, and show that the resulting code can compete with Diffrax.
All this is achieved without any problem-specific code optimisation.
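The adjoint systems for Lanczos and Arnoldi are beyond a short sketch, but the underlying idea, that matrix functions have well-defined gradients, can be illustrated with a classic identity: for the log-determinant, d log det(A) / dA = A^{-T}. A minimal NumPy check against finite differences (small symmetric positive-definite test matrix of our choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)  # symmetric positive definite, so det(A) > 0

def logdet(M):
    # slogdet is numerically safer than log(det(M))
    sign, value = np.linalg.slogdet(M)
    return value

analytic = np.linalg.inv(A).T  # d log det(A) / dA = A^{-T}

# central finite differences, one matrix entry at a time
eps = 1e-6
numeric = np.zeros_like(A)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(A)
        E[i, j] = eps
        numeric[i, j] = (logdet(A + E) - logdet(A - E)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```

A JAX implementation, as in the paper, would obtain the same gradient from `jax.grad` without writing the adjoint by hand; the point of the paper is doing this efficiently for iterative solvers.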
arXiv Detail & Related papers (2024-05-27T15:39:45Z) - Using Machine Learning To Identify Software Weaknesses From Software
Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
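At its core, latent semantic analysis is a truncated SVD of a term-document matrix, with documents then compared in the resulting latent space. A toy NumPy sketch (vocabulary and counts invented for illustration; not the PROMISE_exp data):

```python
import numpy as np

# rows = terms, columns = requirement documents (toy counts)
vocab = ["buffer", "overflow", "input", "validate", "password"]
X = np.array([
    [2, 0, 1, 0],  # buffer
    [2, 0, 0, 0],  # overflow
    [0, 1, 2, 1],  # input
    [0, 1, 1, 0],  # validate
    [0, 0, 0, 3],  # password
], dtype=float)

# LSA: keep only the top-k singular triplets of the term-document matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
docs_latent = (np.diag(s[:k]) @ Vt[:k]).T  # each document as a k-dim vector
```

Nearest-neighbour matching in this latent space is one plausible way such keyword vectors could be mapped onto CWE categories, which is roughly the step the study automates.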
arXiv Detail & Related papers (2023-08-10T13:19:10Z) - Automating In-Network Machine Learning [2.857025628729502]
Planter is an open-source framework for mapping trained machine learning models to programmable devices.
We show that Planter-based in-network machine learning algorithms can run at line rate, have a negligible effect on latency, coexist with standard switching functionality, and have no or minor accuracy trade-offs.
arXiv Detail & Related papers (2022-05-18T09:42:22Z) - MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines a baseline's imputations by simultaneously modeling the missingness-generating mechanism.
We conduct extensive experiments on synthetic data and a variety of publicly available datasets to show that MIRACLE consistently improves imputation.
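MIRACLE's causal machinery is beyond a short example, but the baseline imputation it iteratively refines can be as simple as column-mean filling. A minimal NumPy sketch with invented data:

```python
import numpy as np

# toy data matrix with missing entries marked as NaN
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, np.nan, 6.0],
    [5.0, 4.0, 9.0],
])

col_means = np.nanmean(X, axis=0)             # mean of observed entries per column
filled = np.where(np.isnan(X), col_means, X)  # broadcast means into the gaps
```

A causally-aware method then replaces these naive fills with values consistent with a learned model of why the entries are missing.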
arXiv Detail & Related papers (2021-11-04T22:38:18Z) - Ten Quick Tips for Deep Learning in Biology [116.78436313026478]
Machine learning is concerned with the development and applications of algorithms that can recognize patterns in data and use them for predictive modeling.
Deep learning has become its own subfield of machine learning.
In the context of biological research, deep learning has been increasingly used to derive novel insights from high-dimensional biological data.
arXiv Detail & Related papers (2021-05-29T21:02:44Z) - Reservoir Stack Machines [77.12475691708838]
Memory-augmented neural networks equip a recurrent neural network with an explicit memory to support tasks that require information storage.
We introduce the reservoir stack machine, a model which can provably recognize all deterministic context-free languages.
Our results show that the reservoir stack machine achieves zero error, even on test sequences longer than the training data.
arXiv Detail & Related papers (2021-05-04T16:50:40Z) - Soft Genetic Programming Binary Classifiers [0.0]
"Soft" genetic programming (SGP) has been developed, which allows the logical operator tree to be more flexible and find dependencies in datasets.
This article discusses a method for constructing binary classifiers using the SGP technique.
arXiv Detail & Related papers (2021-01-21T17:43:11Z) - Predictive Coding Approximates Backprop along Arbitrary Computation
Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z) - Towards automated kernel selection in machine learning systems: A SYCL
case study [0.0]
We present initial results using machine learning to select kernels in a case study deploying high performance SYCL kernels in libraries.
By combining auto-tuning and machine learning these kernel selection processes can be deployed with little developer effort to achieve high performance on new hardware.
arXiv Detail & Related papers (2020-03-15T11:23:36Z) - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch [76.83052807776276]
We show that it is possible to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks.
We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space.
We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction in the field.
arXiv Detail & Related papers (2020-03-06T19:00:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.