Learning to Learn to Predict Performance Regressions in Production at
Meta
- URL: http://arxiv.org/abs/2208.04351v2
- Date: Mon, 22 May 2023 08:55:27 GMT
- Title: Learning to Learn to Predict Performance Regressions in Production at
Meta
- Authors: Moritz Beller, Hongyu Li, Vivek Nair, Vijayaraghavan Murali, Imad
Ahmad, J\"urgen Cito, Drew Carlson, Ari Aye, Wes Dyer
- Abstract summary: This article gives an account of the experiences we gained when researching and deploying an ML-based regression prediction pipeline at Meta.
Our investigation shows the inherent difficulty of the performance prediction problem, which is characterized by a large imbalance of benign to regression-inducing changes.
Our results also call into question the general applicability of Transformer-based architectures for performance prediction.
- Score: 11.45540873578889
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Catching and attributing code change-induced performance regressions in
production is hard; predicting them beforehand, even harder. A primer on
automatically learning to predict performance regressions in software, this
article gives an account of the experiences we gained when researching and
deploying an ML-based regression prediction pipeline at Meta. In this paper, we
report on a comparative study with four ML models of increasing complexity,
from (1) code-opaque, via (2) Bag of Words and (3) an off-the-shelf
Transformer-based model, to (4) a bespoke Transformer-based model, coined
SuperPerforator. Our investigation shows the inherent difficulty of the
performance prediction problem, which is characterized by a large imbalance of
benign to regression-inducing changes. Our results also call into question the general
applicability of Transformer-based architectures for performance prediction: an
off-the-shelf CodeBERT-based approach had surprisingly poor performance; our
highly customized SuperPerforator architecture initially achieved prediction
performance that was just on par with simpler Bag of Words models, and only
outperformed them on downstream use cases. This ability of SuperPerforator to
transfer to an application with few learning examples afforded an opportunity
to deploy it in practice at Meta: it can act as a pre-filter to sort out
changes that are unlikely to introduce a regression, shrinking the space of
changes in which a regression has to be searched by up to 43%, a 45x improvement over a random
baseline. To gain further insight into SuperPerforator, we explored it via a
series of experiments computing counterfactual explanations. These highlight
which parts of a code change the model deems important, thereby validating the
learned black-box model.
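As a point of reference only, here is a minimal sketch of the kind of pipeline the abstract describes: a Bag of Words baseline over raw diff text, class weighting against the heavy benign-to-regressing imbalance, a probability threshold used as a pre-filter, and a crude counterfactual probe that drops one hunk at a time. This is not Meta's SuperPerforator nor the authors' actual code; every function name, the threshold value, and the hunk-splitting heuristic are illustrative assumptions.

# Minimal sketch, not Meta's implementation: a Bag of Words baseline that
# scores code changes for regression risk and acts as a pre-filter.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def build_bow_prefilter(diff_texts, labels):
    """Train a Bag of Words classifier on raw diff text.
    diff_texts: list[str] -- unified diffs of code changes (hypothetical input)
    labels:     list[int] -- 1 = introduced a regression, 0 = benign
    """
    model = make_pipeline(
        # Count identifier/keyword occurrences in the diff.
        CountVectorizer(token_pattern=r"[A-Za-z_][A-Za-z0-9_]*", max_features=50000),
        # class_weight="balanced" counters the benign-vs-regressing imbalance.
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    )
    model.fit(diff_texts, labels)
    return model

def prefilter(model, candidate_diffs, threshold=0.05):
    """Keep only changes whose predicted regression probability exceeds the
    threshold, shrinking the space in which a regression has to be searched."""
    probs = model.predict_proba(candidate_diffs)[:, 1]
    return [d for d, p in zip(candidate_diffs, probs) if p >= threshold]

def counterfactual_probe(model, diff_text):
    """Crude counterfactual explanation: remove one blank-line-separated hunk
    at a time and report how much the predicted regression risk drops."""
    base = model.predict_proba([diff_text])[0, 1]
    hunks = diff_text.split("\n\n")
    deltas = []
    for i in range(len(hunks)):
        reduced = "\n\n".join(h for j, h in enumerate(hunks) if j != i)
        deltas.append((hunks[i], base - model.predict_proba([reduced])[0, 1]))
    # Hunks whose removal lowers the risk most are the ones the model deems important.
    return sorted(deltas, key=lambda t: -t[1])

The balanced class weights and the deliberately low pre-filter threshold reflect the asymmetry the abstract points to: discarding a truly regressing change is far costlier than passing a benign one on for further inspection.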
Related papers
- TrackFormers: In Search of Transformer-Based Particle Tracking for the High-Luminosity LHC Era [2.9052912091435923]
High-Energy Physics experiments are facing a multi-fold data increase with every new iteration.
One such step in need of an overhaul is the task of particle track reconstruction, a.k.a. tracking.
A Machine Learning-assisted solution is expected to provide significant improvements.
arXiv Detail & Related papers (2024-07-09T18:47:25Z) - Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis [63.66763657191476]
We show that efficient numerical training and inference algorithms based on low-rank computation achieve impressive performance for learning Transformer-based adaptation.
We analyze how magnitude-based pruning affects generalization while improving adaptation.
We conclude that proper magnitude-based pruning has only a slight effect on the testing performance.
arXiv Detail & Related papers (2024-06-24T23:00:58Z) - Target Variable Engineering [0.0]
We compare the predictive performance of regression models trained to predict numeric targets vs. classifiers trained to predict their binarized counterparts.
We find that regression requires significantly more computational effort to converge upon the optimal performance.
arXiv Detail & Related papers (2023-10-13T23:12:21Z) - Uncovering mesa-optimization algorithms in Transformers [61.06055590704677]
Some autoregressive models can learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so.
We show that standard next-token prediction error minimization gives rise to a subsidiary learning algorithm that adjusts the model as new inputs are revealed.
Our findings explain in-context learning as a product of autoregressive loss minimization and inform the design of new optimization-based Transformer layers.
arXiv Detail & Related papers (2023-09-11T22:42:50Z) - Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show, for the first time, that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches.
arXiv Detail & Related papers (2023-05-26T00:43:02Z) - Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators called WTA-CRS, for matrix products with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones. (A generic column-row sampling sketch appears after this list.)
arXiv Detail & Related papers (2023-05-24T15:52:08Z) - On the Generalization Ability of Retrieval-Enhanced Transformers [1.0552465253379135]
Off-loading memory from trainable weights to a retrieval database can significantly improve language modeling.
It has been suggested that at least some of this performance gain is due to non-trivial generalization based on both model weights and retrieval.
We find that the performance gains from retrieval largely originate from overlapping tokens between the database and the test data.
arXiv Detail & Related papers (2023-02-23T16:11:04Z) - What learning algorithm is in-context learning? Investigations with linear models [87.91612418166464]
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly.
We show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression.
We present preliminary evidence that in-context learners share algorithmic features with these predictors.
arXiv Detail & Related papers (2022-11-28T18:59:51Z) - Regression Bugs Are In Your Model! Measuring, Reducing and Analyzing Regressions In NLP Model Updates [68.09049111171862]
This work focuses on quantifying, reducing, and analyzing regression errors in NLP model updates.
We formulate regression-free model updates as a constrained optimization problem.
We empirically analyze how model ensembling reduces regressions.
arXiv Detail & Related papers (2021-05-07T03:33:00Z) - A Locally Adaptive Interpretable Regression [7.4267694612331905]
Linear regression is one of the most interpretable prediction models.
In this work, we introduce a locally adaptive interpretable regression (LoAIR).
Our model achieves comparable or better predictive performance than the other state-of-the-art baselines.
arXiv Detail & Related papers (2020-05-07T09:26:14Z)
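For the WTA-CRS entry above, a minimal sketch of the generic column-row sampling idea that such estimators build on: sample outer products of matching columns of A and rows of B, reweight by the inverse sampling probability, and the expectation equals the exact product. This uses uniform sampling and is not the WTA-CRS estimator itself; the function name and example shapes are illustrative.

# Generic unbiased column-row sampling estimator of a matrix product A @ B.
# This illustrates the textbook idea only, not the WTA-CRS method.
import numpy as np

def sampled_matmul(A, B, num_samples, seed=None):
    """Unbiased estimate of A @ B from `num_samples` sampled column-row outer products.
    Columns of A and the matching rows of B are drawn uniformly; each term is
    reweighted by 1 / (probability * num_samples) so the expectation is exact.
    """
    rng = np.random.default_rng(seed)
    k = A.shape[1]
    idx = rng.integers(0, k, size=num_samples)
    scale = k / num_samples  # uniform probability per index is 1 / k
    return scale * (A[:, idx] @ B[idx, :])

# Usage: the relative error shrinks as num_samples grows.
A = np.random.randn(8, 512)
B = np.random.randn(512, 4)
approx = sampled_matmul(A, B, num_samples=4096, seed=0)
exact = A @ B
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))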