Methodology to Create Analysis-Naive Holdout Records as well as Train
and Test Records for Machine Learning Analyses in Healthcare
- URL: http://arxiv.org/abs/2205.03987v1
- Date: Mon, 9 May 2022 00:51:08 GMT
- Title: Methodology to Create Analysis-Naive Holdout Records as well as Train
and Test Records for Machine Learning Analyses in Healthcare
- Authors: Michele Bennett, Mehdi Nekouei, Armand Prieditis, Rajesh Mehta, Ewa
Kleczyk, Karin Hayes
- Abstract summary: The purpose of the holdout sample is to preserve data for research studies that will be analysis-naive and randomly selected from the full dataset.
The methodology suggested for creating holdouts is a modification of k-fold cross validation, which takes into account randomization and efficiently allows a three-way split.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is common for researchers to hold out data from a study pool to be
used for external validation as well as for future research, and the same is
true for those conducting machine learning modeling research. For this
discussion, the purpose of the holdout sample is to preserve data for research
studies that will be analysis-naive and randomly selected from the full
dataset. Analysis-naive records are records that are not used for testing or
training machine learning (ML) models and that do not participate in any
aspect of the current machine learning study. The methodology suggested for
creating holdouts is a modification of k-fold cross validation, which takes
randomization into account and efficiently allows a three-way split (holdout,
test, and training) as part of the method itself, without forcing an
additional splitting step. The paper also provides a working example using a
set of automated functions in Python and some scenarios for applicability in
healthcare.
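The paper's automated Python functions are not reproduced in this summary, so the following is only a minimal sketch of the idea under stated assumptions: records are shuffled once, partitioned into k folds, and the folds are then assigned three ways rather than two. The function name `three_way_split`, the `test_folds` parameter, the synthetic patient data, and the use of scikit-learn's `KFold` are illustrative choices, not details from the paper.

```python
# Illustrative sketch of a k-fold-style three-way split
# (holdout / test / train); not the authors' published functions.
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def three_way_split(records: pd.DataFrame, n_folds: int = 5,
                    test_folds: int = 1, seed: int = 42):
    """Partition records into analysis-naive holdout, test, and training sets.

    Records are shuffled and split into n_folds folds; the first fold becomes
    the analysis-naive holdout (untouched by the current study), the next
    test_folds folds become the test set, and the remainder is for training.
    """
    if test_folds >= n_folds - 1:
        raise ValueError("need at least one fold left over for training")
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    folds = [idx for _, idx in kf.split(records)]
    holdout_idx = folds[0]
    test_idx = np.concatenate(folds[1:1 + test_folds])
    train_idx = np.concatenate(folds[1 + test_folds:])
    return (records.iloc[holdout_idx],
            records.iloc[test_idx],
            records.iloc[train_idx])

# Example: 1,000 synthetic patient records -> 20% holdout, 20% test, 60% train
rng = np.random.default_rng(0)
df = pd.DataFrame({"patient_id": range(1000),
                   "age": rng.integers(18, 90, size=1000)})
holdout, test, train = three_way_split(df, n_folds=5)
print(len(holdout), len(test), len(train))  # 200 200 600
```

Because the split is randomized once and the holdout fold is set aside before any modeling begins, those records remain analysis-naive for future studies.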
Related papers
- regAL: Python Package for Active Learning of Regression Problems [0.0]
We present our Python package regAL, which allows users to evaluate different active learning strategies for regression problems.
arXiv Detail & Related papers (2024-10-23T14:34:36Z)
- BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a test-time adaptation framework that bridges training-required and training-free methods.
We maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples.
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
arXiv Detail & Related papers (2024-10-20T15:58:43Z)
- Online Performance Estimation with Unlabeled Data: A Bayesian Application of the Hui-Walter Paradigm [0.0]
We adapt the Hui-Walter paradigm, a method traditionally applied in epidemiology and medicine, to the field of machine learning.
We estimate key performance metrics such as false positive rate, false negative rate, and priors in scenarios where no ground truth is available.
We extend this paradigm for handling online data, opening up new possibilities for dynamic data environments.
arXiv Detail & Related papers (2024-01-17T17:46:10Z)
- Machine Unlearning for Causal Inference [0.6621714555125157]
It is important to enable a model to forget some of the information it has learned or captured about a given user (machine unlearning).
This paper introduces the concept of machine unlearning for causal inference, particularly propensity score matching and treatment effect estimation.
The dataset used in the study is the Lalonde dataset, a widely used dataset for evaluating the effectiveness of job training programs.
arXiv Detail & Related papers (2023-08-24T17:27:01Z)
- A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts [143.14128737978342]
Test-time adaptation, an emerging paradigm, has the potential to adapt a pre-trained model to unlabeled data during testing, before making predictions.
Recent progress in this paradigm highlights the significant benefits of utilizing unlabeled data for training self-adapted models prior to inference.
arXiv Detail & Related papers (2023-03-27T16:32:21Z)
- Example-Based Explainable AI and its Application for Remote Sensing Image Classification [0.0]
We show, as an explanation, an instance from the training dataset that is similar to the input data to be inferred.
The concept was successfully demonstrated using a remote sensing image dataset from the Sentinel-2 satellite.
arXiv Detail & Related papers (2023-02-03T03:48:43Z)
- ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework named ALBench for evaluating active learning in object detection.
Developed on an automatic deep model training system, this ALBench framework is easy-to-use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Systematic Training and Testing for Machine Learning Using Combinatorial Interaction Testing [0.0]
This paper demonstrates the systematic use of coverage for selecting and characterizing test and training sets for machine learning models.
The paper addresses prior criticism of coverage and provides a rebuttal which advocates the use of coverage metrics in machine learning applications.
arXiv Detail & Related papers (2022-01-28T21:33:31Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We conduct a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)