SOAK: Same/Other/All K-fold cross-validation for estimating similarity of patterns in data subsets
- URL: http://arxiv.org/abs/2410.08643v1
- Date: Fri, 11 Oct 2024 09:10:39 GMT
- Title: SOAK: Same/Other/All K-fold cross-validation for estimating similarity of patterns in data subsets
- Authors: Toby Dylan Hocking, Gabrielle Thibault, Cameron Scott Bodine, Paul Nelson Arellano, Alexander F Shenkin, Olivia Jasmine Lindly,
- Abstract summary: We propose SOAK, Same/Other/All K-fold cross-validation, a new method that can answer both questions: whether training on the data gathered so far yields accurate predictions on a qualitatively different test subset, and whether data subsets are similar enough that combining them during training is beneficial.
SOAK systematically compares models which are trained on different subsets of data, and then used for prediction on a fixed test subset, to estimate the similarity of learnable/predictable patterns in data subsets.
- Score: 39.12222516332026
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many real-world applications of machine learning, we are interested in knowing whether it is possible to train on the data that we have gathered so far and obtain accurate predictions on a new test data subset that is qualitatively different in some respect (time period, geographic region, etc.). Another question is whether data subsets are similar enough that it is beneficial to combine subsets during model training. We propose SOAK, Same/Other/All K-fold cross-validation, a new method which can be used to answer both questions. SOAK systematically compares models which are trained on different subsets of data, and then used for prediction on a fixed test subset, to estimate the similarity of learnable/predictable patterns in data subsets. We show results of using SOAK on 6 new real data sets (with geographic/temporal subsets, to check if predictions are accurate on new subsets), 3 image pair data sets (subsets are different image types, to check that we get smaller prediction error on similar images), and 11 benchmark data sets with predefined train/test splits (to check similarity of predefined splits).
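Based only on the abstract above, the following is a minimal sketch of how a SOAK-style Same/Other/All comparison could be run with scikit-learn. The function name soak_cv, the logistic-regression learner, the within-subset fold assignment, and the choice to exclude the test fold from the Other/All training data are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal SOAK-style cross-validation sketch (assumptions noted above).
# X: 2-D feature array, y: class labels, subset: subset label per observation
# (e.g. time period or geographic region); assumes each subset has >= n_splits rows.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

def soak_cv(X, y, subset, n_splits=3, seed=1):
    """Estimate Same/Other/All prediction accuracy for every (test subset, fold)."""
    X, y, subset = np.asarray(X), np.asarray(y), np.asarray(subset)
    # Assign every observation to one of K folds, separately within each subset,
    # so the same test folds are reused for all three training strategies.
    fold = np.empty(len(y), dtype=int)
    for s in np.unique(subset):
        idx = np.where(subset == s)[0]
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
        for k, (_, test_pos) in enumerate(kf.split(idx)):
            fold[idx[test_pos]] = k
    results = []
    for s in np.unique(subset):              # fixed test subset
        for k in range(n_splits):            # fixed test fold within that subset
            is_test = (subset == s) & (fold == k)
            for train_on, in_train_subsets in [
                ("same", subset == s),
                ("other", subset != s),
                ("all", np.ones(len(y), dtype=bool)),
            ]:
                is_train = in_train_subsets & (fold != k)  # never train on the test fold
                model = LogisticRegression(max_iter=1000)
                model.fit(X[is_train], y[is_train])
                acc = accuracy_score(y[is_test], model.predict(X[is_test]))
                results.append(dict(test_subset=s, fold=k, train_on=train_on, accuracy=acc))
    return results
```

For a given test subset, comparing the Same, Other, and All accuracies then indicates whether patterns learned from other subsets transfer to it, and whether pooling all subsets during training is beneficial.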
Related papers
- PARSAC: Accelerating Robust Multi-Model Fitting with Parallel Sample Consensus [26.366299016589256]
We present a real-time method for robust estimation of multiple instances of geometric models from noisy data.
A neural network segments the input data into clusters representing potential model instances.
We demonstrate state-of-the-art performance on these as well as multiple established datasets, with inference times as small as five milliseconds per image.
arXiv Detail & Related papers (2024-01-26T14:54:56Z)
- The SVHN Dataset Is Deceptive for Probabilistic Generative Models Due to a Distribution Mismatch [12.542073306638988]
The Street View House Numbers dataset is a popular benchmark dataset in deep learning.
We warn that the official training and test sets of the SVHN dataset are not drawn from the same distribution.
We propose to mix and re-split the official training and test sets when SVHN is used for tasks other than classification (see the sketch below).
arXiv Detail & Related papers (2023-10-30T15:38:31Z)
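Following the SVHN entry above, a minimal sketch, assuming PyTorch/torchvision, of the suggested remedy: pool the official train and test splits and draw a fresh random split. The seed, the re-split sizes, and the use of torchvision's SVHN loader are illustrative assumptions, not the authors' procedure.

```python
# Pool the official SVHN splits, then draw a fresh random train/test split.
import torch
from torch.utils.data import ConcatDataset, random_split
from torchvision.datasets import SVHN

train = SVHN(root="data", split="train", download=True)
test = SVHN(root="data", split="test", download=True)

pooled = ConcatDataset([train, test])            # mix the official splits
n_test = len(test)                               # keep the original test-set size (illustrative)
new_train, new_test = random_split(
    pooled, [len(pooled) - n_test, n_test],
    generator=torch.Generator().manual_seed(0))  # reproducible re-split
```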
- Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distributions.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z)
- Using Mixed-Effect Models to Learn Bayesian Networks from Related Data Sets [0.04297070083645048]
We provide an analogous solution for learning a Bayesian network from continuous data using mixed-effects models.
We study its structural, parametric, predictive and classification accuracy.
The improvement is marked for low sample sizes and for unbalanced data sets.
arXiv Detail & Related papers (2022-06-08T08:32:32Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Learning with Instance Bundles for Reading Comprehension [61.823444215188296]
We introduce new supervision techniques that compare question-answer scores across multiple related instances.
Specifically, we normalize these scores across various neighborhoods of closely contrasting questions and/or answers.
We empirically demonstrate the effectiveness of training with instance bundles on two datasets.
arXiv Detail & Related papers (2021-04-18T06:17:54Z)
- A Bayesian Hierarchical Score for Structure Learning from Related Data Sets [0.7240563090941907]
We propose a new Bayesian Dirichlet score, which we call Bayesian Hierarchical Dirichlet (BHD).
BHD is based on a hierarchical model that pools information across data sets to learn a single encompassing network structure.
We find that BHD outperforms the Bayesian Dirichlet equivalent uniform (BDeu) score in terms of reconstruction accuracy as measured by the Structural Hamming distance.
arXiv Detail & Related papers (2020-08-04T16:41:05Z)
- SimEx: Express Prediction of Inter-dataset Similarity by a Fleet of Autoencoders [13.55607978839719]
Knowing the similarity between sets of data has a number of positive implications for training an effective model.
We present SimEx, a new method for early prediction of inter-dataset similarity using a set of pretrained autoencoders.
Our method achieves more than 10x speed-up in predicting inter-dataset similarity compared to common similarity-estimating practices.
arXiv Detail & Related papers (2020-01-14T16:52:50Z)
- Cross-dataset Training for Class Increasing Object Detection [52.34737978720484]
We present a conceptually simple, flexible and general framework for cross-dataset training in object detection.
By cross-dataset training, existing datasets can be utilized to detect the merged object classes with a single model.
While using cross-dataset training, we only need to label the new classes on the new dataset.
arXiv Detail & Related papers (2020-01-14T04:40:47Z)