A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data
- URL: http://arxiv.org/abs/2407.02112v1
- Date: Tue, 2 Jul 2024 09:54:39 GMT
- Title: A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data
- Authors: Andrej Tschalzev, Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt,
- Abstract summary: This paper demonstrates that model-centric evaluations are biased, as real-world modeling pipelines often require dataset-specific preprocessing and feature engineering.
We select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset.
After dataset-specific feature engineering, model rankings change considerably, performance differences decrease, and the importance of model selection reduces.
- Score: 9.57464542357693
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tabular data is prevalent in real-world machine learning applications, and new models for supervised learning of tabular data are frequently proposed. Comparative studies assessing the performance of models typically consist of model-centric evaluation setups with overly standardized data preprocessing. This paper demonstrates that such model-centric evaluations are biased, as real-world modeling pipelines often require dataset-specific preprocessing and feature engineering. Therefore, we propose a data-centric evaluation framework. We select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset. We conduct experiments with different preprocessing pipelines and hyperparameter optimization (HPO) regimes to quantify the impact of model selection, HPO, feature engineering, and test-time adaptation. Our main findings are: 1. After dataset-specific feature engineering, model rankings change considerably, performance differences decrease, and the importance of model selection reduces. 2. Recent models, despite their measurable progress, still significantly benefit from manual feature engineering. This holds true for both tree-based models and neural networks. 3. While tabular data is typically considered static, samples are often collected over time, and adapting to distribution shifts can be important even in supposedly static data. These insights suggest that research efforts should be directed toward a data-centric perspective, acknowledging that tabular data requires feature engineering and often exhibits temporal characteristics.
Related papers
- Unlearning Information Bottleneck: Machine Unlearning of Systematic Patterns and Biases [6.936871609178494]
We present Unlearning Information Bottleneck (UIB), a novel information-theoretic framework designed to enhance the process of machine unlearning.
By proposing a variational upper bound, we recalibrate the model parameters through a dynamic prior that integrates changes in data distribution with an affordable computational cost.
Our experiments across various datasets, models, and unlearning methods demonstrate that our approach effectively removes systematic patterns and biases while maintaining the performance of models post-unlearning.
arXiv Detail & Related papers (2024-05-22T21:54:05Z) - Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z) - Federated Learning with Projected Trajectory Regularization [65.6266768678291]
Federated learning enables joint training of machine learning models from distributed clients without sharing their local data.
One key challenge in federated learning is to handle non-identically distributed data across the clients.
We propose a novel federated learning framework with projected trajectory regularization (FedPTR) for tackling the data issue.
arXiv Detail & Related papers (2023-12-22T02:12:08Z) - Inductive biases in deep learning models for weather prediction [17.061163980363492]
We review and analyse the inductive biases of state-of-the-art deep learning-based weather prediction models.
We identify the most important inductive biases and highlight potential avenues towards more efficient and probabilistic DLWP models.
arXiv Detail & Related papers (2023-04-06T14:15:46Z) - Synthetic Model Combination: An Instance-wise Approach to Unsupervised
Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Give access to a set of expert models and their predictions alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z) - A Case for Dataset Specific Profiling [0.9023847175654603]
Data-driven science is an emerging paradigm where scientific discoveries depend on the execution of computational AI models against rich, discipline-specific datasets.
With modern machine learning frameworks, anyone can develop and execute computational models that reveal concepts hidden in the data that could enable scientific applications.
For important and widely used datasets, computing the performance of every computational model that can run against a dataset is cost prohibitive in terms of cloud resources.
arXiv Detail & Related papers (2022-08-01T18:38:05Z) - Identifying the Context Shift between Test Benchmarks and Production
Data [1.2259552039796024]
There exists a performance gap between machine learning models' accuracy on dataset benchmarks and real-world production data.
We outline two methods for identifying changes in context that lead to distribution shifts and model prediction errors.
We present two case-studies to highlight the implicit assumptions underlying applied machine learning models that tend to lead to errors.
arXiv Detail & Related papers (2022-07-03T14:54:54Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z) - Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual
Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z) - The Effectiveness of Discretization in Forecasting: An Empirical Study
on Neural Time Series Models [15.281725756608981]
We investigate the effect of data input and output transformations on the predictive performance of neural forecasting architectures.
We find that binning almost always improves performance compared to using normalized real-valued inputs.
arXiv Detail & Related papers (2020-05-20T15:09:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.