Methodology to Create Analysis-Naive Holdout Records as well as Train
and Test Records for Machine Learning Analyses in Healthcare
- URL: http://arxiv.org/abs/2205.03987v1
- Date: Mon, 9 May 2022 00:51:08 GMT
- Title: Methodology to Create Analysis-Naive Holdout Records as well as Train
and Test Records for Machine Learning Analyses in Healthcare
- Authors: Michele Bennett, Mehdi Nekouei, Armand Prieditis, Rajesh Mehta, Ewa
Kleczyk, Karin Hayes
- Abstract summary: The purpose of the holdout sample is to preserve data for research studies that will be analysis-naive and randomly selected from the full dataset.
The methodology suggested for creating holdouts is a modification of k-fold cross validation, which takes into account randomization and efficiently allows a three-way split.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is common for researchers to hold out data from a study pool for
external validation as well as for future research, and the same is true for
those conducting machine learning modeling research. For this discussion, the
purpose of the holdout sample is to preserve data for research studies that
will be analysis-naive and randomly selected from the full dataset.
Analysis-naive records are those that are not used for testing or training
machine learning (ML) models and that do not participate in any aspect of the
current machine learning study. The methodology suggested for creating holdouts
is a modification of k-fold cross validation, which takes into account
randomization and efficiently allows a three-way split (holdout, test, and
training) as part of the method without forcing. The paper also provides a
working example using a set of automated functions in Python and some scenarios
for applicability in healthcare.
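The modified k-fold idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' published functions: it assumes folds are formed by random shuffling, with a configurable number of folds set aside as the analysis-naive holdout, another as the test set, and the remainder kept for training.

```python
import random

def three_way_split(n_records, n_folds=5, holdout_folds=1, test_folds=1, seed=42):
    """Assign each record index to one of three disjoint sets.

    A k-fold-style partition is built from shuffled indices; the first
    `holdout_folds` folds form the analysis-naive holdout, the next
    `test_folds` folds form the test set, and the rest is training data.
    """
    rng = random.Random(seed)
    indices = list(range(n_records))
    rng.shuffle(indices)
    # Deal shuffled indices into n_folds roughly equal folds.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    holdout = [i for fold in folds[:holdout_folds] for i in fold]
    test = [i for fold in folds[holdout_folds:holdout_folds + test_folds] for i in fold]
    train = [i for fold in folds[holdout_folds + test_folds:] for i in fold]
    return holdout, test, train

holdout, test, train = three_way_split(100)
# With the defaults, 1/5 of the records land in the holdout, 1/5 in the
# test set, and 3/5 in training; the three sets are disjoint and together
# cover every record.
```

Because the split is driven by a single seeded shuffle, the holdout is both random and reproducible, and the holdout records never touch the current study's training or testing.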
Related papers
- Unlocking Unlabeled Data: Ensemble Learning with the Hui-Walter
Paradigm for Performance Estimation in Online and Static Settings [0.0]
We adapt the Hui-Walter paradigm, a method traditionally applied in epidemiology and medicine, to the field of machine learning.
We estimate key performance metrics such as false positive rate, false negative rate, and priors in scenarios where no ground truth is available.
arXiv Detail & Related papers (2024-01-17T17:46:10Z)
- Machine Unlearning for Causal Inference [0.6621714555125157]
It is important to enable the model to forget some of its learned/captured information about a given user (machine unlearning).
This paper introduces the concept of machine unlearning for causal inference, particularly propensity score matching and treatment effect estimation.
The dataset used in the study is the Lalonde dataset, a widely used dataset for evaluating the effectiveness of job training programs.
arXiv Detail & Related papers (2023-08-24T17:27:01Z)
- A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [143.14128737978342]
Test-time adaptation, an emerging paradigm, has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions.
Recent progress in this paradigm highlights the significant benefits of utilizing unlabeled data for training self-adapted models prior to inference.
arXiv Detail & Related papers (2023-03-27T16:32:21Z)
- Example-Based Explainable AI and its Application for Remote Sensing
Image Classification [0.0]
We show an example of an instance in a training dataset that is similar to the input data to be inferred.
Using a remote sensing image dataset from the Sentinel-2 satellite, the concept was successfully demonstrated.
arXiv Detail & Related papers (2023-02-03T03:48:43Z)
- ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework named ALBench for evaluating active learning in object detection.
Developed on an automatic deep model training system, this ALBench framework is easy-to-use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z)
- Learning to Generalize across Domains on Single Test Samples [126.9447368941314]
We learn to generalize across domains on single test samples.
We formulate the adaptation to the single test sample as a variational Bayesian inference problem.
Our model achieves at least comparable -- and often better -- performance than state-of-the-art methods on multiple benchmarks for domain generalization.
arXiv Detail & Related papers (2022-02-16T13:21:04Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Systematic Training and Testing for Machine Learning Using Combinatorial
Interaction Testing [0.0]
This paper demonstrates the systematic use of coverage for selecting and characterizing test and training sets for machine learning models.
The paper addresses prior criticism of coverage and provides a rebuttal which advocates the use of coverage metrics in machine learning applications.
arXiv Detail & Related papers (2022-01-28T21:33:31Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z) - pyBKT: An Accessible Python Library of Bayesian Knowledge Tracing Models [0.0]
We introduce pyBKT, a library of model extensions for knowledge tracing.
The library provides data generation, fitting, prediction, and cross-validation routines.
pyBKT is open source and open license for the purpose of making knowledge tracing more accessible to communities of research and practice.
arXiv Detail & Related papers (2021-05-02T03:08:53Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.