Data Complexity-aware Deep Model Performance Forecasting
- URL: http://arxiv.org/abs/2601.01383v1
- Date: Sun, 04 Jan 2026 05:31:04 GMT
- Title: Data Complexity-aware Deep Model Performance Forecasting
- Authors: Yen-Chia Chen, Hsing-Kuo Pao, Hanjuan Huang
- Abstract summary: We propose a lightweight, two-stage framework that can estimate model performance before training. The setup allows the framework to generalize across datasets and model types. We find that some of the underlying features used for prediction can offer practical guidance for model selection.
- Score: 0.5735035463793009
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Deep learning models are widely used across computer vision and other domains. When inducing a model, selecting the right architecture for a given dataset often relies on a repetitive trial-and-error procedure. This procedure is time-consuming, resource-intensive, and difficult to automate. While previous work has explored performance prediction using partial training or complex simulations, these methods often require significant computational overhead or lack generalizability. In this work, we propose an alternative approach: a lightweight, two-stage framework that estimates model performance before training, given an understanding of the dataset and the deep model structures under consideration. The first stage predicts a baseline from measurable properties of the dataset, while the second stage adjusts the estimate using additional information about the model's architecture and hyperparameters. This setup allows the framework to generalize across datasets and model types. Moreover, we find that some of the underlying features used for prediction, such as dataset variance, can offer practical guidance for model selection and can serve as early indicators of data quality. As a result, the framework can be used not only to forecast model performance, but also to guide architecture choices, inform necessary preprocessing procedures, and detect potentially problematic datasets before training begins.
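The abstract does not fix an implementation, but the two-stage structure is concrete enough to sketch. Below is a minimal illustration, assuming one regressor per stage; dataset variance is the only meta-feature the abstract names, so the remaining features and all identifiers are hypothetical:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def dataset_meta_features(X, y):
    """Cheap, measurable dataset properties (stage-1 inputs). Only the
    variance feature is named in the abstract; the rest are assumptions."""
    return np.array([
        X.var(),                   # overall dataset variance
        float(X.shape[0]),         # number of samples
        float(X.shape[1]),         # input dimensionality
        float(len(np.unique(y))),  # number of classes
    ])

class TwoStageForecaster:
    """Hypothetical two-stage estimator: stage 1 maps dataset meta-features
    to a baseline score; stage 2 learns a correction from model descriptors
    such as depth, width, or learning rate."""

    def __init__(self):
        self.stage1 = GradientBoostingRegressor()
        self.stage2 = GradientBoostingRegressor()

    def fit(self, meta, arch, scores):
        # meta: (n, d_meta) meta-features; arch: (n, d_arch) model descriptors;
        # scores: (n,) observed post-training performance used for supervision.
        self.stage1.fit(meta, scores)                  # dataset-only baseline
        residual = scores - self.stage1.predict(meta)  # what the baseline misses
        self.stage2.fit(np.hstack([meta, arch]), residual)
        return self

    def predict(self, meta, arch):
        return self.stage1.predict(meta) + self.stage2.predict(np.hstack([meta, arch]))
```

Training such a forecaster presupposes a corpus of (dataset, model, score) triples; at inference time both stages are cheap to evaluate, which is the point of the approach.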
Related papers
- A Data-Centric Perspective on Evaluating Machine Learning Models for Tabular Data [9.57464542357693]
This paper demonstrates that model-centric evaluations are biased, as real-world modeling pipelines often require dataset-specific preprocessing and feature engineering. We select 10 relevant datasets from Kaggle competitions and implement expert-level preprocessing pipelines for each dataset. After dataset-specific feature engineering, model rankings change considerably, performance differences decrease, and the importance of model selection diminishes.
arXiv Detail & Related papers (2024-07-02T09:54:39Z)
- Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts. Existing approaches require re-training models on different data subsets, which is computationally intensive. This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z)
- Generative Pretrained Hierarchical Transformer for Time Series Forecasting [3.739587363053192]
We propose a novel generative pretrained hierarchical transformer architecture for forecasting, named GPHT.
We conduct extensive experiments on eight datasets with mainstream self-supervised pretraining models and supervised models.
The results demonstrate that GPHT surpasses the baseline models across various fine-tuning and zero/few-shot learning settings on the traditional long-term forecasting task.
arXiv Detail & Related papers (2024-02-26T11:54:54Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show that our pre-trained method is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
- Revealing the Underlying Patterns: Investigating Dataset Similarity, Performance, and Generalization [0.0]
Supervised deep learning models require a significant amount of labeled data to achieve acceptable performance on a specific task.
We establish image-image, dataset-dataset, and image-dataset distances to gain insights into the model's behavior.
arXiv Detail & Related papers (2023-08-07T13:35:53Z)
- Where to start? Analyzing the potential value of intermediate models [16.32982010228009]
We perform a systematic analysis of the intertraining scheme over a wide range of English classification tasks.
Surprisingly, our analysis suggests that the potential intertraining gain can be analyzed independently for the target dataset.
We leverage our analysis to propose a practical and efficient approach to determine if and how to select a base model in real-world settings.
arXiv Detail & Related papers (2022-10-31T19:24:02Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularizer for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- A Case for Dataset Specific Profiling [0.9023847175654603]
Data-driven science is an emerging paradigm where scientific discoveries depend on the execution of computational AI models against rich, discipline-specific datasets.
With modern machine learning frameworks, anyone can develop and execute computational models that reveal concepts hidden in the data that could enable scientific applications.
For important and widely used datasets, computing the performance of every computational model that can run against a dataset is cost prohibitive in terms of cloud resources.
arXiv Detail & Related papers (2022-08-01T18:38:05Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models (a minimal sketch of this idea follows this entry).
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
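The HyperImpute entry above describes a general recipe: iterate over columns, regress each column with missing values on the others, and pick each column's model automatically. A minimal sketch of that recipe built from scikit-learn pieces follows; the candidate set, round count, and function name are invented for illustration, and this is not the HyperImpute library's actual API:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

def iterative_impute(X, n_rounds=5, candidates=(Ridge, KNeighborsRegressor)):
    """Iterative imputation with automatic per-column model selection.
    Illustrative sketch only, assuming enough observed rows per column
    for 3-fold cross-validation."""
    X = X.copy()
    missing = np.isnan(X)
    # start from a simple column-mean fill
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):
        X[missing[:, j], j] = col_means[j]
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            if not missing[:, j].any():
                continue
            obs = ~missing[:, j]
            features = np.delete(X, j, axis=1)  # all other columns as inputs
            # automatic model selection: keep the candidate with the best CV score
            best = max(
                (m() for m in candidates),
                key=lambda m: cross_val_score(m, features[obs], X[obs, j], cv=3).mean(),
            )
            best.fit(features[obs], X[obs, j])
            X[missing[:, j], j] = best.predict(features[missing[:, j]])
    return X
```

Repeating the column sweep lets later rounds refine imputations made from cruder earlier fills, which is the core of any iterative-imputation scheme.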
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.