mlf-core: a framework for deterministic machine learning
- URL: http://arxiv.org/abs/2104.07651v1
- Date: Thu, 15 Apr 2021 17:58:03 GMT
- Title: mlf-core: a framework for deterministic machine learning
- Authors: Lukas Heumos, Philipp Ehmele, Kevin Menden, Luis Kuhn Cuellar, Edmund
Miller, Steffen Lemke, Gisela Gabernet and Sven Nahnsen
- Abstract summary: Major machine learning libraries default to non-deterministic algorithms based on atomic operations.
To overcome this shortcoming, various machine learning libraries released deterministic counterparts to the non-deterministic algorithms.
We developed a new software solution, the mlf-core ecosystem, which helps machine learning projects meet and maintain these requirements.
- Score: 0.08795040582681389
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Machine learning has shown extensive growth in recent years. However,
previously existing studies highlighted a reproducibility crisis in machine
learning. The reasons for irreproducibility are manifold. Major machine
learning libraries default to non-deterministic algorithms based on atomic
operations. Fixing all random seeds alone is not sufficient for
deterministic machine learning. To overcome this shortcoming, various machine
learning libraries released deterministic counterparts to the non-deterministic
algorithms. We evaluated the effect of these algorithms on determinism and
runtime. Based on these results, we formulated a set of requirements for
reproducible machine learning and developed a new software solution, the
mlf-core ecosystem, which helps machine learning projects meet and maintain these
requirements. We applied mlf-core to develop fully reproducible models in
various biomedical fields including a single cell autoencoder with TensorFlow,
a PyTorch-based U-Net model for liver-tumor segmentation in CT scans, and a
liver cancer classifier based on gene expression profiles with XGBoost.
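As the abstract stresses, fixing random seeds is necessary but not sufficient for deterministic training. A minimal sketch of the seed-fixing half in plain Python (the helper name is ours; framework-specific determinism flags are mentioned only in comments and must be looked up per library):

```python
import os
import random

def set_global_seeds(seed: int) -> None:
    """Fix the RNG seeds that are under the program's direct control.

    Per the paper, this alone does NOT make training deterministic:
    GPU kernels that rely on atomic operations must also be replaced
    by their deterministic counterparts via each framework's own flags.
    """
    # Affects hash randomization only for *subprocesses*; the current
    # interpreter must be started with this variable already set.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    # Framework-specific calls would be added here, e.g. seeding NumPy,
    # PyTorch, or TensorFlow and enabling their deterministic kernels.

set_global_seeds(42)
first = [random.random() for _ in range(3)]
set_global_seeds(42)
second = [random.random() for _ in range(3)]
assert first == second  # the seeded stdlib RNG stream is reproducible
```

The seeded stream is bitwise reproducible, yet a GPU reduction over the same data can still differ between runs, which is exactly the gap the deterministic algorithm variants close.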
Related papers
- The Impact of Ontology on the Prediction of Cardiovascular Disease Compared to Machine Learning Algorithms [0.0]
This paper compares and reviews the most prominent machine learning algorithms, as well as ontology-based machine learning classification.
The findings are assessed using performance measures generated from the confusion matrix, such as F-Measure, Accuracy, Recall, and Precision.
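The four measures named above all derive directly from the counts in a binary confusion matrix; a short self-contained sketch (the function name and the example counts are illustrative, not from the paper):

```python
def confusion_matrix_metrics(tp: int, fp: int, fn: int, tn: int):
    """Compute Accuracy, Precision, Recall, and F-Measure
    from a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure

# e.g. 40 true positives, 10 false positives, 20 false negatives, 30 true negatives
acc, prec, rec, f1 = confusion_matrix_metrics(40, 10, 20, 30)
```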
arXiv Detail & Related papers (2024-05-30T18:40:27Z) - Gradients of Functions of Large Matrices [18.361820028457718]
We show how to differentiate workhorses of numerical linear algebra efficiently.
We derive previously unknown adjoint systems for Lanczos and Arnoldi iterations, implement them in JAX, and show that the resulting code can compete with Diffrax.
All this is achieved without any problem-specific code optimisation.
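The adjoint systems for Lanczos and Arnoldi are beyond a short sketch, but the underlying idea, that matrix functions have well-defined gradients, can be illustrated with a classic identity: for the log-determinant, d log det(A) / dA = A^{-T}. A minimal NumPy check against finite differences (small symmetric positive-definite test matrix of our choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)  # symmetric positive definite, so det(A) > 0

def logdet(M):
    # slogdet is numerically safer than log(det(M))
    sign, value = np.linalg.slogdet(M)
    return value

analytic = np.linalg.inv(A).T  # d log det(A) / dA = A^{-T}

# central finite differences, one matrix entry at a time
eps = 1e-6
numeric = np.zeros_like(A)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(A)
        E[i, j] = eps
        numeric[i, j] = (logdet(A + E) - logdet(A - E)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```

A JAX implementation, as in the paper, would obtain the same gradient from `jax.grad` without writing the adjoint by hand; the point of the paper is doing this efficiently for iterative solvers.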
arXiv Detail & Related papers (2024-05-27T15:39:45Z) - Using Machine Learning To Identify Software Weaknesses From Software
Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
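At its core, latent semantic analysis is a truncated SVD of a term-document matrix, with documents then compared in the resulting latent space. A toy NumPy sketch (vocabulary and counts invented for illustration; not the PROMISE_exp data):

```python
import numpy as np

# rows = terms, columns = requirement documents (toy counts)
vocab = ["buffer", "overflow", "input", "validate", "password"]
X = np.array([
    [2, 0, 1, 0],  # buffer
    [2, 0, 0, 0],  # overflow
    [0, 1, 2, 1],  # input
    [0, 1, 1, 0],  # validate
    [0, 0, 0, 3],  # password
], dtype=float)

# LSA: keep only the top-k singular triplets of the term-document matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
docs_latent = (np.diag(s[:k]) @ Vt[:k]).T  # each document as a k-dim vector
```

Nearest-neighbour matching in this latent space is one plausible way such keyword vectors could be mapped onto CWE categories, which is roughly the step the study automates.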
arXiv Detail & Related papers (2023-08-10T13:19:10Z) - Automating In-Network Machine Learning [2.857025628729502]
Planter is an open-source framework for mapping trained machine learning models to programmable devices.
We show that Planter-based in-network machine learning algorithms can run at line rate, have a negligible effect on latency, coexist with standard switching functionality, and have no or minor accuracy trade-offs.
arXiv Detail & Related papers (2022-05-18T09:42:22Z) - MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines a baseline's imputations by simultaneously modeling the missingness-generating mechanism.
We conduct extensive experiments on synthetic data and a variety of publicly available datasets to show that MIRACLE consistently improves imputation.
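MIRACLE's causal machinery is beyond a short example, but the baseline imputation it iteratively refines can be as simple as column-mean filling. A minimal NumPy sketch with invented data:

```python
import numpy as np

# toy data matrix with missing entries marked as NaN
X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, np.nan, 6.0],
    [5.0, 4.0, 9.0],
])

col_means = np.nanmean(X, axis=0)             # mean of observed entries per column
filled = np.where(np.isnan(X), col_means, X)  # broadcast means into the gaps
```

A causally-aware method then replaces these naive fills with values consistent with a learned model of why the entries are missing.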
arXiv Detail & Related papers (2021-11-04T22:38:18Z) - Ten Quick Tips for Deep Learning in Biology [116.78436313026478]
Machine learning is concerned with the development and applications of algorithms that can recognize patterns in data and use them for predictive modeling.
Deep learning has become its own subfield of machine learning.
In the context of biological research, deep learning has been increasingly used to derive novel insights from high-dimensional biological data.
arXiv Detail & Related papers (2021-05-29T21:02:44Z) - Reservoir Stack Machines [77.12475691708838]
Memory-augmented neural networks equip a recurrent neural network with an explicit memory to support tasks that require information storage.
We introduce the reservoir stack machine, a model which can provably recognize all deterministic context-free languages.
Our results show that the reservoir stack machine achieves zero error, even on test sequences longer than the training data.
arXiv Detail & Related papers (2021-05-04T16:50:40Z) - Soft Genetic Programming Binary Classifiers [0.0]
"Soft" genetic programming (SGP) has been developed, which allows the logical operator tree to be more flexible and find dependencies in datasets.
This article discusses a method for constructing binary classifiers using the SGP technique.
arXiv Detail & Related papers (2021-01-21T17:43:11Z) - Predictive Coding Approximates Backprop along Arbitrary Computation
Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv Detail & Related papers (2020-06-07T15:35:47Z) - Towards automated kernel selection in machine learning systems: A SYCL
case study [0.0]
We present initial results using machine learning to select kernels in a case study deploying high performance SYCL kernels in libraries.
By combining auto-tuning and machine learning these kernel selection processes can be deployed with little developer effort to achieve high performance on new hardware.
arXiv Detail & Related papers (2020-03-15T11:23:36Z) - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch [76.83052807776276]
We show that it is possible to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks.
We demonstrate this by introducing a novel framework that significantly reduces human bias through a generic search space.
We believe these preliminary successes in discovering machine learning algorithms from scratch indicate a promising new direction in the field.
arXiv Detail & Related papers (2020-03-06T19:00:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.