Julearn: an easy-to-use library for leakage-free evaluation and
inspection of ML models
- URL: http://arxiv.org/abs/2310.12568v1
- Date: Thu, 19 Oct 2023 08:21:12 GMT
- Title: Julearn: an easy-to-use library for leakage-free evaluation and
inspection of ML models
- Authors: Sami Hamdan, Shammi More, Leonard Sasse, Vera Komeyer, Kaustubh R.
Patil and Federico Raimondo (for the Alzheimer's Disease Neuroimaging
Initiative)
- Abstract summary: We present the rationale behind julearn's design, its core features, and showcase three examples of previously-published research projects.
Julearn aims to simplify the entry into the machine learning world by providing an easy-to-use environment with built-in guards against some of the most common ML pitfalls.
- Score: 0.23301643766310373
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The fast-paced development of machine learning (ML) methods coupled with its
increasing adoption in research poses challenges for researchers without
extensive training in ML. In neuroscience, for example, ML can help understand
brain-behavior relationships, diagnose diseases, and develop biomarkers using
various data sources like magnetic resonance imaging and
electroencephalography. The primary objective of ML is to build models that can
make accurate predictions on unseen data. Researchers aim to prove the
existence of such generalizable models by evaluating performance using
techniques such as cross-validation (CV), which uses systematic subsampling to
estimate the generalization performance. Choosing a CV scheme and evaluating an
ML pipeline can be challenging and, if used improperly, can lead to
overestimated results and incorrect interpretations.
We created julearn, an open-source Python library that allows researchers to
design and evaluate complex ML pipelines without encountering common
pitfalls. In this manuscript, we present the rationale behind julearn's design,
its core features, and showcase three examples of previously-published research
projects that can be easily implemented using this novel library. Julearn aims
to simplify the entry into the ML world by providing an easy-to-use environment
with built-in guards against some of the most common ML pitfalls. With its
design, unique features, and simple interface, it serves as a useful Python-based
library for research projects.
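To make the leakage pitfall described in the abstract concrete, the sketch below uses plain scikit-learn (not julearn's own API, which this summary does not detail) and synthetic toy data. Fitting a feature-selection step on the full dataset before cross-validation inflates the estimated accuracy well above chance, whereas refitting it inside each training fold via a pipeline does not. Variable names and parameter choices here are illustrative assumptions, not the paper's examples.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2000))   # pure noise features
y = rng.integers(0, 2, size=100)       # labels unrelated to the features

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Leaky evaluation: feature selection sees the labels of the future test
# folds, so the CV estimate drifts above chance even though X is noise.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky_scores = cross_val_score(SVC(), X_leaky, y, cv=cv)

# Leakage-free evaluation: the selector is refit on the training portion of
# every fold, keeping the estimate near the chance level of ~0.5.
pipe = make_pipeline(SelectKBest(f_classif, k=20), SVC())
clean_scores = cross_val_score(pipe, X, y, cv=cv)

print(f"leaky CV accuracy:        {leaky_scores.mean():.2f}")
print(f"leakage-free CV accuracy: {clean_scores.mean():.2f}")
```

Making the second, leakage-free pattern the default behaviour is the kind of guard against common ML pitfalls that julearn advertises.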
Related papers
- MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal practitioner overhead while ensuring a high level of reproducibility.
arXiv Detail & Related papers (2024-02-21T14:22:20Z)
- DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language model workflows.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- Learn to Unlearn: A Survey on Machine Unlearning [29.077334665555316]
This article presents a review of recent machine unlearning techniques, verification mechanisms, and potential attacks.
We highlight emerging challenges and prospective research directions.
We aim for this paper to provide valuable resources for integrating privacy, equity, and resilience into ML systems.
arXiv Detail & Related papers (2023-05-12T14:28:02Z)
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages [116.74407069443895]
We unify encoder and decoder-based models into a single prefix-LM.
For learning methods, we explore the claim of a "free lunch" hypothesis.
For data distributions, the effect of a mixture distribution and multi-epoch training of programming and natural languages on model performance is explored.
arXiv Detail & Related papers (2023-05-03T17:55:25Z)
- The Integration of Machine Learning into Automated Test Generation: A Systematic Mapping Study [15.016047591601094]
We characterize emerging research, examining testing practices, researcher goals, ML techniques applied, evaluation, and challenges.
ML generates input for system, GUI, unit, and performance testing, or improves the performance of existing generation methods.
arXiv Detail & Related papers (2022-06-21T09:26:25Z)
- PyRelationAL: A Library for Active Learning Research and Development [0.11545092788508224]
PyRelationAL is an open source library for active learning (AL) research.
It provides access to benchmark datasets and AL task configurations based on existing literature.
We perform experiments on the PyRelationAL collection of benchmark datasets and showcase the considerable economies that AL can provide.
arXiv Detail & Related papers (2022-05-23T08:21:21Z)
- What Makes Good Contrastive Learning on Small-Scale Wearable-based Tasks? [59.51457877578138]
We study contrastive learning on the wearable-based activity recognition task.
This paper presents an open-source PyTorch library, CL-HAR, which can serve as a practical tool for researchers.
arXiv Detail & Related papers (2022-02-12T06:10:15Z)
- A Rigorous Machine Learning Analysis Pipeline for Biomedical Binary Classification: Application in Pancreatic Cancer Nested Case-control Studies with Implications for Bias Assessments [2.9726886415710276]
We have laid out and assembled a complete, rigorous ML analysis pipeline focused on binary classification.
This 'automated' but customizable pipeline includes a) exploratory analysis, b) data cleaning and transformation, c) feature selection, d) model training with 9 established ML algorithms.
We apply this pipeline to an epidemiological investigation of established and newly identified risk factors for cancer to evaluate how different sources of bias might be handled by ML algorithms.
arXiv Detail & Related papers (2020-08-28T19:58:05Z)
- Machine Learning Pipelines: Provenance, Reproducibility and FAIR Data Principles [0.0]
We describe our goals and initial steps in supporting the end-to-end of machine learning pipelines.
We investigate which factors beyond the availability of source code and datasets influence the reproducibility of ML experiments.
We propose ways to apply FAIR data practices to ML experiments.
arXiv Detail & Related papers (2020-06-22T10:17:34Z)
- Bayesian active learning for production, a systematic study and a reusable library [85.32971950095742]
In this paper, we analyse the main drawbacks of current active learning techniques.
We do a systematic study on the effects of the most common issues of real-world datasets on the deep active learning process.
We derive two techniques that can speed up the active learning loop: partial uncertainty sampling and a larger query size.
arXiv Detail & Related papers (2020-06-17T14:51:11Z)