Insight Into SEER
- URL: http://arxiv.org/abs/2311.01164v2
- Date: Mon, 4 Dec 2023 23:22:43 GMT
- Title: Insight Into SEER
- Authors: Kasra Lekan, Nicki Choquette
- Abstract summary: The SEER tool was developed to predict test outcomes without needing assertion statements.
The tool has an overall accuracy of 93%, precision of 86%, recall of 94%, and an F1 score of 90%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing test oracles can be inefficient: developer-generated oracles are
time-intensive and thus costly, while automatic oracle generation in the form of
regression or exception oracles assumes that the underlying code is correct. To
mitigate the high cost of developing test oracles, the SEER tool was developed to
predict test outcomes without needing assertion statements. The creators of
SEER introduced the tool with an overall accuracy of 93%, precision of 86%,
recall of 94%, and an F1 score of 90%. If these results are replicable on new
data with perturbations, i.e. SEER is generalizable and robust, the model would
represent a significant advancement in the field of automated testing.
Consequently, we conducted a comprehensive reproduction of SEER and attempted
to verify the model's results on a new dataset.
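As a quick sanity check, the reported F1 score follows directly from the stated precision and recall, since F1 is their harmonic mean. The short Python sketch below illustrates this using only the figures quoted in the abstract; the helper function name is ours and is not part of SEER.

```python
# Minimal sketch: checking that SEER's reported F1 score is consistent
# with its reported precision and recall. The numbers come from the
# abstract above; nothing here is taken from the SEER implementation.

def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

precision = 0.86  # reported precision
recall = 0.94     # reported recall

# Prints ~0.90, matching the reported F1 score of 90%.
print(f"F1 = {f1_score(precision, recall):.2f}")
```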
Related papers
- Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering [51.7496756448709]
Language models (LMs) perform well on coding benchmarks but struggle with real-world software engineering tasks.
Existing approaches rely on supervised fine-tuning with high-quality data, which is expensive to curate at scale.
We propose Evolutionary Test-Time Scaling (EvoScale), a sample-efficient method that treats generation as an evolutionary process.
arXiv Detail & Related papers (2025-05-29T16:15:36Z)
- What You See Is What You Get: Attention-based Self-guided Automatic Unit Test Generation [3.8244417073114003]
We propose the Attention-based Self-guided Automatic Unit Test GenERation (AUGER) approach.
AUGER contains two stages: defect detection and error triggering.
It improves F1-score and Precision in defect detection by 4.7% to 35.3%.
It can trigger 23 to 84 more errors than state-of-the-art (SOTA) approaches in unit test generation.
arXiv Detail & Related papers (2024-12-01T14:28:48Z)
- Enhancing Grammatical Error Detection using BERT with Cleaned Lang-8 Dataset [0.0]
This paper presents an improved LLM-based model for Grammatical Error Detection (GED).
The traditional approach to GED involved hand-designed features, but recently, Neural Networks (NNs) have automated the discovery of these features.
The BERT-base-uncased model gave an impressive performance, with an F1 score of 0.91 and an accuracy of 98.49% on the training data.
arXiv Detail & Related papers (2024-11-23T10:57:41Z)
- Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in in-domain (ID) and out-of-domain (OOD) scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z)
- TOGLL: Correct and Strong Test Oracle Generation with LLMs [0.8057006406834466]
Test oracles play a crucial role in software testing, enabling effective bug detection.
Despite initial promise, neural-based methods for automated test oracle generation often result in a large number of false positives.
We present the first comprehensive study to investigate the capabilities of LLMs in generating correct, diverse, and strong test oracles.
arXiv Detail & Related papers (2024-05-06T18:37:35Z)
- Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models [0.0]
We present a comparative analysis of five machine learning models for the prediction of epileptic seizures using EEG data.
The results of our analysis demonstrate the performance of each model in terms of accuracy.
The ET model exhibited the best performance with an accuracy of 99.29%.
arXiv Detail & Related papers (2023-08-06T08:50:08Z)
- Neural-Based Test Oracle Generation: A Large-scale Evaluation and Lessons Learned [17.43060451305942]
TOGA is a recently developed neural-based method for automatic test oracle generation.
It misclassifies the type of oracle needed 24% of the time, and even when it classifies correctly, around 62% of the time it is not confident enough to generate any assertion oracle.
These findings expose limitations of the state-of-the-art neural-based oracle generation technique.
arXiv Detail & Related papers (2023-07-29T16:34:56Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- EEG-Fest: Few-shot based Attention Network for Driver's Vigilance Estimation with EEG Signals [160.57870373052577]
A lack of driver vigilance is the main cause of most vehicle crashes.
EEG has been a reliable and efficient tool for drivers' drowsiness estimation.
arXiv Detail & Related papers (2022-11-07T21:35:08Z)
- Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
arXiv Detail & Related papers (2022-04-06T06:39:40Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [80.38130122127882]
TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE).
In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement?
We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)