Related papers: Enhancing Deployment-Time Predictive Model Robustness for Code Analysis and Optimization

Enhancing Deployment-Time Predictive Model Robustness for Code Analysis and Optimization

URL: http://arxiv.org/abs/2501.00298v1
Date: Tue, 31 Dec 2024 06:17:03 GMT
Title: Enhancing Deployment-Time Predictive Model Robustness for Code Analysis and Optimization
Authors: Huanting Wang, Patrick Lenihan, Zheng Wang,
Abstract summary: We introduce Prom, an open-source library to enhance the robustness and performance of predictive models.<n>Prom achieves this by using statistical assessments to identify test samples prone to mispredictions.<n>Our evaluation demonstrates that Prom can successfully identify an average of 96% (up to 100%) of mispredictions.
Score: 4.374023944113174
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Supervised machine learning techniques have shown promising results in code analysis and optimization problems. However, a learning-based solution can be brittle because minor changes in hardware or application workloads -- such as facing a new CPU architecture or code pattern -- may jeopardize decision accuracy, ultimately undermining model robustness. We introduce Prom, an open-source library to enhance the robustness and performance of predictive models against such changes during deployment. Prom achieves this by using statistical assessments to identify test samples prone to mispredictions and using feedback on these samples to improve a deployed model. We showcase Prom by applying it to 13 representative machine learning models across 5 code analysis and optimization tasks. Our extensive evaluation demonstrates that Prom can successfully identify an average of 96% (up to 100%) of mispredictions. By relabeling up to 5% of the Prom-identified samples through incremental learning, Prom can help a deployed model achieve a performance comparable to that attained during its model training phase.

Related papers

StepFun-Prover Preview: Let's Think and Verify Step by Step [14.896796588073725]
We present StepFun-Prover Preview, a large language model designed for formal theorem proving through tool-integrated reasoning.<n>Our approach enables the model to emulate human-like problem-solving strategies by iteratively refining proofs based on real-time environment feedback.<n>On the miniF2F-test benchmark, StepFun-Prover achieves a pass@1 success rate of $70.0%$.
arXiv Detail & Related papers (2025-07-27T09:38:32Z)
Fake Runs, Real Fixes -- Analyzing xPU Performance Through Simulation [4.573673188291683]
We present xPU-Shark, a fine-grained methodology for analyzing ML models at the machine-code level. xPU-Shark captures traces from production deployments running on accelerators and replays them in a modified microarchitecture simulator. We optimize a common communication collective by up to 15% and reduce token generation latency by up to 4.1%.
arXiv Detail & Related papers (2025-03-18T23:15:02Z)
Enhancing Sample Selection by Cutting Mislabeled Easy Examples [62.13094877228772]
We show that mislabeled examples correctly predicted by the model early in the training process are particularly harmful to model performance. We propose Early Cutting, which employs the model's later training state to re-select the confident subset identified early in training.
arXiv Detail & Related papers (2025-02-12T09:12:45Z)
Dividable Configuration Performance Learning [4.949726352498762]
We propose a model-agnostic and sparsity-robust framework for predicting configuration performance, dubbed DaL. DaL is based on the new paradigm of dividable learning that builds a model via "divide-and-learn"
arXiv Detail & Related papers (2024-09-11T21:23:23Z)
Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences [6.067007470552307]
We propose a methodology for finding sequences of machine learning models that are stable across retraining iterations. We develop a mixed-integer optimization formulation that is guaranteed to recover optimal models. Our method shows stronger stability than greedily trained models with a small, controllable sacrifice in predictive power.
arXiv Detail & Related papers (2024-03-28T22:45:38Z)
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results. We develop a method that avoids introducing external resources, relying instead on perturbations to the input. Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
arXiv Detail & Related papers (2024-03-04T16:21:54Z)
Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits! [51.668411293817464]
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines. Academic research is often restrained to public datasets on the order of ten thousand samples. We devise an approach to generate a benchmark of difficulty from a pool of available samples.
arXiv Detail & Related papers (2023-12-25T21:25:55Z)
A positive feedback method based on F-measure value for Salient Object Detection [1.9249287163937976]
This paper proposes a positive feedback method based on F-measure value for salient object detection (SOD) Our proposed method takes an image to be detected and inputs it into several existing models to obtain their respective prediction maps. Experimental results on five publicly available datasets show that our proposed positive feedback method outperforms the latest 12 methods in five evaluation metrics for saliency map prediction.
arXiv Detail & Related papers (2023-04-28T04:05:13Z)
Statistical Hardware Design With Multi-model Active Learning [1.7596501992526474]
We propose a model-based active learning approach to solve the problem of designing efficient hardware. Our proposed method provides hardware models that are sufficiently accurate to perform design space exploration as well as performance prediction simultaneously.
arXiv Detail & Related papers (2023-03-14T16:37:38Z)
MEMO: Test Time Robustness via Adaptation and Augmentation [131.28104376280197]
We study the problem of test time robustification, i.e., using the test input to improve model robustness. Recent prior works have proposed methods for test time adaptation, however, they each introduce additional assumptions. We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable.
arXiv Detail & Related papers (2021-10-18T17:55:11Z)
Towards More Fine-grained and Reliable NLP Performance Prediction [85.78131503006193]
We make two contributions to improving performance prediction for NLP tasks. First, we examine performance predictors for holistic measures of accuracy like F1 or BLEU. Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration.
arXiv Detail & Related papers (2021-02-10T15:23:20Z)
Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms. We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance. We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples. We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries. We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.