Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples
- URL: http://arxiv.org/abs/2405.12920v1
- Date: Tue, 21 May 2024 16:42:02 GMT
- Title: Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples
- Authors: Tim Menzies, Andre Lustosa
- Abstract summary: This paper proposes a new challenge problem for software analytics.
In a process we shall call "software review", a panel of SMEs (subject matter experts) reviews examples of software behavior to recommend how to improve that software's operation.
To support this review process, we explore methods that train a predictive model to guess if some oracle will like/dislike the next example.
In 31 case studies, we show that such predictive models can be built using as few as 12 to 30 labels.
- Score: 11.166755101891402
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a new challenge problem for software analytics. In a process we shall call "software review", a panel of SMEs (subject matter experts) reviews examples of software behavior to recommend how to improve that software's operation. SME time is usually extremely limited, so, ideally, this panel can complete this optimization task after looking at just a small number of very informative examples. To support this review process, we explore methods that train a predictive model to guess if some oracle will like/dislike the next example. Such a predictive model can work with the SMEs to guide them in their exploration of all the examples. Also, after the panelists leave, that model can be used as an oracle in place of the panel (to handle new examples while the panelists are busy elsewhere). In 31 case studies (ranging from high-level decisions about software processes to low-level decisions about how to configure video encoding software), we show that such predictive models can be built using as few as 12 to 30 labels. To the best of our knowledge, this paper's success with only a handful of examples (and no large language model) is unprecedented. In accordance with the principles of open science, we offer all our code and data at https://github.com/timm/ez/tree/Stable-EMSE-paper so that others can repeat/refute/improve these results.
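To make the label-efficiency claim concrete, here is a minimal sketch of the kind of review loop the abstract describes: a cheap model is refit after each new label and proposes the next most informative example to the oracle. This is an illustrative uncertainty-sampling loop with assumed data and a stand-in oracle, not the authors' actual algorithm (their code is at https://github.com/timm/ez/tree/Stable-EMSE-paper):

```python
# A minimal sketch (not the paper's actual algorithm) of a label-efficient
# review loop: fit a cheap model, ask the oracle about the most uncertain
# example, repeat until the 12-30 label budget is spent. Data and oracle
# here are illustrative assumptions.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))            # unlabeled pool of examples
oracle = lambda x: int(x[:2].sum() > 0)  # hypothetical SME stand-in

labeled, y = [], {}
while len(set(y.values())) < 2:          # warm start: one of each class
    i = int(rng.integers(len(X)))
    if i not in y:
        y[i] = oracle(X[i])
        labeled.append(i)

BUDGET = 30                              # "as few as 12 to 30 labels"
model = GaussianNB()
while len(labeled) < BUDGET:
    model.fit(X[labeled], [y[i] for i in labeled])
    probs = model.predict_proba(X)[:, 1]
    pool = [i for i in range(len(X)) if i not in y]
    nxt = min(pool, key=lambda i: abs(probs[i] - 0.5))  # most uncertain
    y[nxt] = oracle(X[nxt])
    labeled.append(nxt)

# after the panel leaves, the fitted model stands in as the oracle
print("labels used:", len(labeled))
print("pool accuracy:", model.score(X, [oracle(x) for x in X]))
```

Swapping the acquisition rule (e.g., diversity sampling instead of uncertainty sampling) changes which example is shown next, but not the overall shape of the loop.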
Related papers
- Fine-tuning for Better Few Shot Prompting: An Empirical Comparison for Short Answer Grading [0.5825410941577593]
Fine-tuning methods have historically required large-scale compute clusters inaccessible to most users.
New closed-model approaches such as OpenAI's fine-tuning service promise results with as few as 100 examples.
We evaluate both of these fine-tuning methods, measuring their interaction with few-shot prompting for automated short answer grading.
arXiv Detail & Related papers (2025-08-06T03:52:55Z)
- Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering [51.7496756448709]
Language models (LMs) perform well on coding benchmarks but struggle with real-world software engineering tasks.
Existing approaches rely on supervised fine-tuning with high-quality data, which is expensive to curate at scale.
We propose Evolutionary Test-Time Scaling (EvoScale), a sample-efficient method that treats generation as an evolutionary process.
arXiv Detail & Related papers (2025-05-29T16:15:36Z)
- Enhancing Sample Selection by Cutting Mislabeled Easy Examples [62.13094877228772]
We show that mislabeled examples correctly predicted by the model early in the training process are particularly harmful to model performance.
We propose Early Cutting, which employs the model's later training state to re-select the confident subset identified early in training.
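As an illustration of that re-selection step, here is a toy sketch; the thresholds, names, and inputs are assumptions, not the paper's exact recipe:

```python
# Toy sketch of the re-selection idea: keep the small-loss subset chosen
# early in training, then use a LATER checkpoint to cut members of that
# subset it now strongly contradicts (candidate mislabeled-easy examples).
import numpy as np

def early_cutting(early_probs, late_probs, keep=0.6, floor=0.1):
    # early_probs/late_probs: each example's probability of its assigned
    # label under an early vs. a later training checkpoint
    n = len(early_probs)
    confident = np.argsort(-early_probs)[: int(keep * n)]  # small-loss set
    return np.array([i for i in confident if late_probs[i] > floor])

rng = np.random.default_rng(1)
kept = early_cutting(rng.uniform(size=100), rng.uniform(size=100))
print(len(kept), "examples kept")
```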
arXiv Detail & Related papers (2025-02-12T09:12:45Z)
- Can Large Language Models Improve SE Active Learning via Warm-Starts? [11.166755101891402]
"Active learners" use models learned from tiny samples of the data to find the next most informative example to label.
This paper explores the use of Large Language Models (LLMs) for creating warm-starts.
For 49 SE tasks, LLM-generated warm starts significantly improved the performance of low- and medium-dimensional tasks.
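A hypothetical warm-start for the loop sketched earlier would simply seed the learner with LLM-guessed labels instead of random ones; `llm_guess` below is a stand-in, not the paper's prompt or model:

```python
import numpy as np

def llm_guess(example) -> int:
    # stand-in for a prompt-based zero-shot labeler; the heuristic below
    # is an assumption used only to keep the sketch runnable
    return int(example.sum() > 0)

X = np.random.default_rng(2).normal(size=(500, 5))
warm = {i: llm_guess(X[i]) for i in range(4)}  # guessed (not SME) labels
print("warm-start labels:", warm)
```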
arXiv Detail & Related papers (2024-12-30T19:58:13Z)
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs [76.43407125275202]
o1-like models can emulate human-like long-time thinking during inference.
This paper presents the first comprehensive study on the prevalent issue of overthinking in these models.
We propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.
arXiv Detail & Related papers (2024-12-30T18:55:12Z)
- Demystifying Language Model Forgetting with Low-rank Example Associations [38.93348195407474]
Large language models (LLMs) suffer from forgetting of upstream data when fine-tuned.
We empirically analyze forgetting that occurs in $N$ upstream examples of language modeling or instruction-tuning after fine-tuning.
arXiv Detail & Related papers (2024-06-20T06:46:23Z)
- Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners.
We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting.
Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
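To make the boosting claim concrete, here is a toy AdaBoost loop in which the weak-learner slot is filled by a stand-in; `llm_stump` is a plain decision stump standing in for a prompted LLM, not a real API call:

```python
import numpy as np

def llm_stump(X, y, w):
    # stand-in for prompting an LLM on the re-weighted sample; here a
    # classic decision stump fitted to the current example weights
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = np.where(X[:, j] > t, 1, -1)
            err = w[pred != y].sum()
            if best is None or err < best[0]:
                best = (err, j, t)
    err, j, t = best
    return (lambda Z, j=j, t=t: np.where(Z[:, j] > t, 1, -1)), err

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

w = np.full(len(X), 1.0 / len(X))
ensemble = []
for _ in range(10):                                   # boosting rounds
    h, err = llm_stump(X, y, w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-9))  # learner weight
    w *= np.exp(-alpha * y * h(X)); w /= w.sum()      # re-weight examples
    ensemble.append((alpha, h))

F = sum(a * h(X) for a, h in ensemble)
print("train accuracy:", (np.sign(F) == y).mean())
```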
arXiv Detail & Related papers (2023-06-25T02:39:19Z)
- Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard [47.73060223236792]
BEIR is a benchmark dataset for evaluation of information retrieval models across 18 different domain/task combinations.
Our work addresses two shortcomings that prevent the benchmark from achieving its full potential.
arXiv Detail & Related papers (2023-06-13T00:26:18Z)
- RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning [53.52699766206808]
We propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning.
We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines.
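Schematically, sequential example selection can be framed as an episode in which a policy scores the remaining candidates given the query and the examples picked so far; everything below (the scorer, the state update) is a stand-in, not RetICL's actual model:

```python
import numpy as np

rng = np.random.default_rng(4)
pool = rng.normal(size=(50, 8))     # candidate example embeddings
query = rng.normal(size=8)

state, chosen = query.copy(), []
for _ in range(3):                  # select 3 in-context examples in order
    scores = pool @ state           # stand-in for a learned policy network
    scores[chosen] = -np.inf        # do not pick the same example twice
    a = int(np.argmax(scores))
    chosen.append(a)
    state = state + pool[a]         # state tracks the selection history
print("selected example ids:", chosen)
# training would update the policy from end-task reward (answer correctness)
```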
arXiv Detail & Related papers (2023-05-23T20:15:56Z)
- Toward a Theory of Causation for Interpreting Neural Code Models [49.906221295459275]
This paper introduces $do_code$, a post hoc interpretability method specific to Neural Code Models (NCMs).
$do_code$ is based upon causal inference to enable language-oriented explanations.
Results show that our studied NCMs are sensitive to changes in code syntax.
arXiv Detail & Related papers (2023-02-07T22:56:58Z)
- Learning from Very Little Data: On the Value of Landscape Analysis for Predicting Software Project Health [13.19204187502255]
This paper only explores the application of niSNEAK to project health. That said, we see nothing in principle that prevents the application of this technique to a wider range of problems.
arXiv Detail & Related papers (2023-01-16T19:27:16Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
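A schematic of that training-time sampling step, with a stand-in sampler and executor (neither is the paper's actual component):

```python
import random

def model_sample(spec, k=8):
    # assumption: draw k candidate programs from the current model
    return [f"prog_{random.randrange(1000)}" for _ in range(k)]

def run_tests(prog, tests):
    # stand-in executor: fraction of test cases the program passes
    return random.choice([0.0, 0.5, 1.0])

def collect_programs(spec, tests, partial=0.5):
    keep = []
    for prog in model_sample(spec):
        score = run_tests(prog, tests)
        if score >= partial:               # fully or partially correct
            keep.append((prog, score))
    return keep                            # fed back as extra training targets

print(collect_programs("sum two numbers", tests=["..."]))
```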
arXiv Detail & Related papers (2022-05-28T03:31:07Z)
- When in Doubt, Summon the Titans: Efficient Inference with Large Models [80.2673230098021]
We propose a two-stage framework based on distillation that realizes the modelling benefits of large models.
We use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples.
Our proposed use of distillation to only handle easy instances allows for a more aggressive trade-off in the student size, thereby reducing the amortized cost of inference.
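A minimal easy/hard cascade in the spirit of that two-stage framework; the confidence threshold and the toy models are illustrative assumptions:

```python
def cascade_predict(x, student, teacher, threshold=0.9):
    label, confidence = student(x)
    if confidence >= threshold:   # the small student keeps "easy" examples
        return label
    return teacher(x)[0]          # defer hard examples to the large teacher

# toy stand-ins returning (prediction, confidence) pairs
student = lambda x: (x > 0, 0.95 if abs(x) > 1 else 0.6)
teacher = lambda x: (x > 0, 0.99)
print([cascade_predict(v, student, teacher) for v in (-2.0, 0.1, 3.0)])
```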
arXiv Detail & Related papers (2021-10-19T22:56:49Z)
- Bayes DistNet -- A Robust Neural Network for Algorithm Runtime Distribution Predictions [1.8275108630751844]
Randomized algorithms are used in many state-of-the-art solvers for constraint satisfaction problems (CSP) and Boolean satisfiability (SAT) problems.
Previous state-of-the-art methods directly try to predict a fixed parametric distribution that the input instance follows.
This new model achieves robust predictive performance in the low-observation setting and also handles censored observations.
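For context, the standard way to score a predicted runtime distribution against censored runs (we assume an objective of this general form; the paper's exact loss may differ) is a censored log-likelihood, where a run observed to finish at time $t_i$ contributes its density and a run cut off at budget $c_j$ contributes its survival probability:

$\log L(\theta) = \sum_{i \in \text{uncensored}} \log f_\theta(t_i) + \sum_{j \in \text{censored}} \log\big(1 - F_\theta(c_j)\big)$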
arXiv Detail & Related papers (2020-12-14T01:15:39Z)
- Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z)
- Extending the statistical software package Engine for Likelihood-Free Inference [0.0]
This dissertation focuses on the implementation of the Robust optimisation Monte Carlo (ROMC) method in the software package Engine for Likelihood-Free Inference (ELFI).
Our implementation provides a robust and efficient solution to a practitioner who wants to perform inference on a simulator-based model.
arXiv Detail & Related papers (2020-11-08T13:22:37Z)