PreNeT: Leveraging Computational Features to Predict Deep Neural Network Training Time
- URL: http://arxiv.org/abs/2412.15519v2
- Date: Fri, 27 Dec 2024 02:48:04 GMT
- Title: PreNeT: Leveraging Computational Features to Predict Deep Neural Network Training Time
- Authors: Alireza Pourali, Arian Boukani, Hamzeh Khazaei
- Abstract summary: This paper introduces PreNeT, a novel predictive framework designed to address the challenge of selecting optimal training configurations and hardware infrastructure.
A key feature of PreNeT is its capacity to accurately predict training duration on previously unexamined hardware infrastructures.
Experimental results demonstrate that PreNeT achieves up to 72% improvement in prediction accuracy compared to contemporary state-of-the-art frameworks.
- Score: 2.3622884172290255
- Abstract: Training deep learning models, particularly Transformer-based architectures such as Large Language Models (LLMs), demands substantial computational resources and extended training periods. While optimal configuration and infrastructure selection can significantly reduce associated costs, this optimization requires preliminary analysis tools. This paper introduces PreNeT, a novel predictive framework designed to address this optimization challenge. PreNeT facilitates training optimization by integrating comprehensive computational metrics, including layer-specific parameters, arithmetic operations and memory utilization. A key feature of PreNeT is its capacity to accurately predict training duration on previously unexamined hardware infrastructures, including novel accelerator architectures. This framework employs a sophisticated approach to capture and analyze the distinct characteristics of various neural network layers, thereby enhancing existing prediction methodologies. Through proactive implementation of PreNeT, researchers and practitioners can determine optimal configurations, parameter settings, and hardware specifications to maximize cost-efficiency and minimize training duration. Experimental results demonstrate that PreNeT achieves up to 72% improvement in prediction accuracy compared to contemporary state-of-the-art frameworks.
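The paper's implementation is not reproduced here, but the core idea it describes, summarizing each layer by computational features (parameter counts, arithmetic operations, activation memory), combining them with a hardware descriptor, and regressing measured step time, can be sketched roughly as follows. The feature choices, the ridge regressor, and all numbers below are illustrative assumptions, not PreNeT's actual design.

```python
# Illustrative sketch (not the authors' code): summarize each layer by
# hand-computed features -- parameters, forward FLOPs, and activation
# memory -- then fit a simple regressor that maps aggregated features
# (plus a hardware descriptor) to measured seconds per training step.
import numpy as np
from sklearn.linear_model import Ridge

def dense_layer_features(in_dim, out_dim, batch):
    params = in_dim * out_dim + out_dim            # weights + bias
    flops = 2 * batch * in_dim * out_dim           # multiply-accumulates
    act_mem = batch * out_dim * 4                  # fp32 activations, bytes
    return np.array([params, flops, act_mem], dtype=float)

def model_features(layer_dims, batch):
    # Sum per-layer features over an MLP described by its layer widths.
    feats = [dense_layer_features(i, o, batch)
             for i, o in zip(layer_dims[:-1], layer_dims[1:])]
    return np.sum(feats, axis=0)

# Hypothetical profiling data: (layer widths, batch, peak TFLOP/s, mem BW GB/s)
# paired with a placeholder measured step time in seconds.
configs = [([784, 512, 10], 64, 15.7, 900),
           ([784, 2048, 2048, 10], 128, 15.7, 900),
           ([784, 4096, 4096, 1000], 256, 19.5, 1555)]
measured_step_time = [0.004, 0.021, 0.090]   # placeholder measurements

X = np.stack([np.concatenate([model_features(dims, b), [tflops, bw]])
              for dims, b, tflops, bw in configs])
y = np.array(measured_step_time)

predictor = Ridge(alpha=1.0).fit(np.log1p(X), y)

# Predict step time for an unseen configuration on the same hardware class.
query = np.concatenate([model_features([784, 1024, 1024, 10], 128), [15.7, 900]])
print("predicted seconds/step:", predictor.predict(np.log1p(query)[None, :])[0])
```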
Related papers
- On Accelerating Edge AI: Optimizing Resource-Constrained Environments [1.7355861031903428]
Resource-constrained edge deployments demand AI solutions that balance high performance with stringent compute, memory, and energy limitations.
We present a comprehensive overview of the primary strategies for accelerating deep learning models under such constraints.
arXiv Detail & Related papers (2025-01-25T01:37:03Z)
- Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models [68.23649978697027]
Forecast-PEFT is a fine-tuning strategy that freezes the majority of the model's parameters, focusing adjustments on newly introduced prompts and adapters.
Our experiments show that Forecast-PEFT outperforms traditional full fine-tuning methods in motion prediction tasks.
Forecast-FT further improves prediction performance, evidencing up to a 9.6% enhancement over conventional baseline methods.
arXiv Detail & Related papers (2024-07-28T19:18:59Z)
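The freeze-the-backbone, train-only-adapters pattern that this summary describes is a standard parameter-efficient fine-tuning recipe; a minimal PyTorch sketch of that general pattern (not Forecast-PEFT's actual prompt and adapter modules) might look like this:

```python
# Generic sketch of the freeze-backbone / train-adapters pattern the summary
# describes (illustrative only; Forecast-PEFT's actual prompt and adapter
# modules are defined in the paper, not here).
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Small residual bottleneck inserted after a frozen block."""
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)
    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

backbone = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 128))
adapter = Adapter(128)
head = nn.Linear(128, 60)           # stand-in for a new prediction head

# Freeze every pretrained weight; only adapter and head stay trainable.
for p in backbone.parameters():
    p.requires_grad = False

trainable = list(adapter.parameters()) + list(head.parameters())
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

x, target = torch.randn(8, 32), torch.randn(8, 60)
pred = head(adapter(backbone(x)))
loss = nn.functional.mse_loss(pred, target)
loss.backward()
optimizer.step()
print("trainable params:", sum(p.numel() for p in trainable))
```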
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find performance on MAD (mechanistic architecture design) synthetic tasks to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
- Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis into the real batch setting, NIO is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z)
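As a rough illustration of a sample-wise gradient-cosine quantity at initialization (the paper's exact GradCosine definition and its norm-constrained maximization are more involved), assuming PyTorch:

```python
# Rough sketch: measure how well per-sample gradients agree at the initial
# weights; the paper links higher agreement to better initial states.
import torch
import torch.nn as nn

def flat_grad(model, loss):
    grads = torch.autograd.grad(loss, model.parameters())
    return torch.cat([g.reshape(-1) for g in grads])

model = nn.Sequential(nn.Linear(20, 64), nn.Tanh(), nn.Linear(64, 1))
criterion = nn.MSELoss()
x, y = torch.randn(4, 20), torch.randn(4, 1)

# One gradient vector per sample, taken at the current (initial) weights.
per_sample = [flat_grad(model, criterion(model(x[i:i+1]), y[i:i+1]))
              for i in range(x.shape[0])]

# Average pairwise cosine similarity across sample gradients.
cos = nn.functional.cosine_similarity
pairs = [(i, j) for i in range(len(per_sample)) for j in range(i + 1, len(per_sample))]
grad_cosine = torch.stack([cos(per_sample[i], per_sample[j], dim=0)
                           for i, j in pairs]).mean()
print("mean pairwise gradient cosine at init:", grad_cosine.item())
```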
- A Graph Deep Learning Framework for High-Level Synthesis Design Space Exploration [11.154086943903696]
High-Level Synthesis (HLS) is a solution for fast prototyping of application-specific hardware.
We propose, for the first time in the literature, graph neural networks that jointly predict the acceleration performance and hardware costs of HLS designs.
We show that our approach achieves prediction accuracy comparable with that of commonly used simulators.
arXiv Detail & Related papers (2021-11-29T18:17:45Z)
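A toy sketch of a graph network with two regression heads, one for performance and one for hardware cost; the model, node features, and HLS graph construction below are placeholders, not the paper's architecture:

```python
# Toy two-head graph network: one scalar output for predicted performance,
# one for predicted hardware cost (illustrative only).
import torch
import torch.nn as nn

class TwoHeadGNN(nn.Module):
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.msg = nn.Linear(in_dim, hidden)
        self.upd = nn.Linear(hidden, hidden)
        self.perf_head = nn.Linear(hidden, 1)   # e.g. predicted latency
        self.cost_head = nn.Linear(hidden, 1)   # e.g. predicted resource usage

    def forward(self, x, adj):
        # One round of mean-aggregated message passing over the operation graph.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h = torch.relu(self.msg(x))
        h = torch.relu(self.upd(adj @ h / deg))
        g = h.mean(dim=0)                       # graph-level readout
        return self.perf_head(g), self.cost_head(g)

# A tiny synthetic "operation graph": 5 nodes with 8 features each.
x = torch.randn(5, 8)
adj = (torch.rand(5, 5) > 0.5).float()
model = TwoHeadGNN(in_dim=8)
perf, cost = model(x, adj)
print(perf.item(), cost.item())
```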
- RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving [74.61723678821049]
We propose NOn-uniform Successive Halving (NOSH), a hierarchical scheduling algorithm that terminates the training of underperforming architectures early to avoid wasting budget.
We formulate predictor-based architecture search as learning to rank with pairwise comparisons.
The resulting method, RANK-NOSH, reduces the search budget by 5x while achieving competitive or even better performance than previous state-of-the-art predictor-based methods on various spaces and datasets.
arXiv Detail & Related papers (2021-08-18T07:45:21Z)
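The pairwise learning-to-rank formulation mentioned above can be illustrated with a margin ranking loss over architecture encodings; the encoding, predictor, and training loop below are placeholder assumptions, not RANK-NOSH itself:

```python
# Sketch: train a predictor only to order architectures correctly, using
# pairwise comparisons rather than regressing absolute accuracy values.
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
rank_loss = nn.MarginRankingLoss(margin=0.1)

# Synthetic stand-ins: 32 architecture encodings with known accuracies.
encodings, accuracy = torch.randn(32, 16), torch.rand(32)

for _ in range(100):
    i, j = torch.randint(0, 32, (2,))
    if accuracy[i] == accuracy[j]:
        continue
    s_i, s_j = predictor(encodings[i]), predictor(encodings[j])
    # target = +1 if architecture i should be ranked above j, else -1.
    target = torch.tensor([1.0]) if accuracy[i] > accuracy[j] else torch.tensor([-1.0])
    loss = rank_loss(s_i, s_j, target)
    opt.zero_grad(); loss.backward(); opt.step()

# At search time only the relative order of predicted scores matters.
print(predictor(encodings[:3]).squeeze(-1))
```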
- TSO: Curriculum Generation using continuous optimization [0.0]
We present a simple and efficient technique based on continuous optimization.
An encoder network maps (embeds) a training strategy sequence into a continuous space.
A predictor network takes the continuous representation of a strategy as input and predicts the accuracy for a fixed network architecture.
arXiv Detail & Related papers (2021-06-16T06:32:21Z)
- How Powerful are Performance Predictors in Neural Architecture Search? [43.86743225322636]
We give the first large-scale study of performance predictors by analyzing 31 techniques.
We show that certain families of predictors can be combined to achieve even better predictive power.
arXiv Detail & Related papers (2021-04-02T17:57:16Z)
- Genetically Optimized Prediction of Remaining Useful Life [4.115847582689283]
We implement LSTM and GRU models and compare the obtained results with a proposed genetically trained neural network.
We hope to improve the consistency of the predictions by adding another layer of optimization using Genetic Algorithms.
These models and the proposed architecture are tested on the NASA Turbofan Jet Engine dataset.
arXiv Detail & Related papers (2021-02-17T16:09:23Z)
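A minimal sketch of "genetically training" a small network's weights with an evolutionary loop on synthetic data; the paper's actual GA operators, encoding, and the NASA turbofan features are not reproduced here:

```python
# Tiny (mu + lambda)-style evolutionary loop over the flattened weights of a
# one-hidden-layer network, minimizing MSE on synthetic stand-in data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 14))                               # stand-in sensor features
y = X @ rng.normal(size=14) + 0.1 * rng.normal(size=64)     # stand-in RUL target

def unpack(theta):                          # theta -> (W1, b1, w2, b2)
    W1, b1 = theta[:14*8].reshape(14, 8), theta[14*8:14*8+8]
    w2, b2 = theta[14*8+8:14*8+16], theta[-1]
    return W1, b1, w2, b2

def mse(theta):
    W1, b1, w2, b2 = unpack(theta)
    h = np.tanh(X @ W1 + b1)
    return float(np.mean((h @ w2 + b2 - y) ** 2))

dim = 14*8 + 8 + 8 + 1
pop = rng.normal(scale=0.5, size=(40, dim))
for gen in range(200):
    fitness = np.array([mse(p) for p in pop])
    parents = pop[np.argsort(fitness)[:10]]                    # selection
    children = parents[rng.integers(0, 10, size=30)] \
               + rng.normal(scale=0.05, size=(30, dim))        # mutation
    pop = np.vstack([parents, children])
print("best MSE after evolution:", min(mse(p) for p in pop))
```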
- An AI-Assisted Design Method for Topology Optimization Without Pre-Optimized Training Data [68.8204255655161]
An AI-assisted design method based on topology optimization is presented, which is able to obtain optimized designs in a direct way.
Designs are provided by an artificial neural network, the predictor, on the basis of boundary conditions and degree of filling as input data.
arXiv Detail & Related papers (2020-12-11T14:33:27Z)
- FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 makes up a family of state-of-the-art compact neural networks that outperform both automatically and manually-designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z)
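A hedged sketch of a predictor that scores an architecture encoding and a training-recipe encoding jointly and then guides a simple mutation-based search; the real FBNetV3 predictor, search space, and constraints differ:

```python
# Illustrative joint predictor: score (architecture, recipe) pairs together,
# then use the scores to steer a tiny mutation-based search loop.
import torch
import torch.nn as nn

class JointPredictor(nn.Module):
    def __init__(self, arch_dim=12, recipe_dim=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(arch_dim + recipe_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, arch, recipe):
        return self.net(torch.cat([arch, recipe], dim=-1)).squeeze(-1)

predictor = JointPredictor()

# Evolutionary-style loop: mutate the best (architecture, recipe) pair found
# so far and keep whichever candidate the predictor scores highest.
with torch.no_grad():
    best_arch, best_recipe = torch.rand(12), torch.rand(4)
    best_score = predictor(best_arch, best_recipe)
    for _ in range(200):
        arch = (best_arch + 0.1 * torch.randn(12)).clamp(0, 1)
        recipe = (best_recipe + 0.1 * torch.randn(4)).clamp(0, 1)
        score = predictor(arch, recipe)
        if score > best_score:
            best_arch, best_recipe, best_score = arch, recipe, score
print("predicted accuracy of best pair:", best_score.item())
```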