Insight Into SEER
- URL: http://arxiv.org/abs/2311.01164v2
- Date: Mon, 4 Dec 2023 23:22:43 GMT
- Title: Insight Into SEER
- Authors: Kasra Lekan, Nicki Choquette
- Abstract summary: The SEER tool was developed to predict test outcomes without needing assertion statements.
The tool has an overall accuracy of 93%, precision of 86%, recall of 94%, and an F1 score of 90%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developing test oracles can be inefficient: developer-generated oracles are
time-intensive and thus costly, while automatic oracle generation in the form of
regression or exception oracles assumes that the underlying code is correct. To
mitigate the high cost of testing oracles, the SEER tool was developed to
predict test outcomes without needing assertion statements. The creators of
SEER introduced the tool with an overall accuracy of 93%, precision of 86%,
recall of 94%, and an F1 score of 90%. If these results are replicable on new
data with perturbations, i.e. SEER is generalizable and robust, the model would
represent a significant advancement in the field of automated testing.
Consequently, we conducted a comprehensive reproduction of SEER and attempted
to verify the model's results on a new dataset.
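As a quick sanity check, the metrics reported in the abstract are internally consistent: the F1 score is the harmonic mean of precision and recall. A minimal check (the function name is illustrative; the values are those stated above):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision 86% and recall 94%, as reported for SEER.
print(round(f1_score(0.86, 0.94), 2))  # 0.9, matching the reported F1 of 90%
```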
Related papers
- TOGLL: Correct and Strong Test Oracle Generation with LLMs [0.8057006406834466]
Test oracles play a crucial role in software testing, enabling effective bug detection.
Despite initial promise, neural-based methods for automated test oracle generation often result in a large number of false positives.
We present the first comprehensive study to investigate the capabilities of LLMs in generating correct, diverse, and strong test oracles.
arXiv Detail & Related papers (2024-05-06T18:37:35Z)
- Retrosynthesis prediction enhanced by in-silico reaction data augmentation [66.5643280109899]
We present RetroWISE, a framework that employs a base model inferred from real paired data to perform in-silico reaction generation and augmentation.
On three benchmark datasets, RetroWISE achieves the best overall performance against state-of-the-art models.
arXiv Detail & Related papers (2024-01-31T07:40:37Z)
- Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models [0.0]
We present a comparative analysis of five machine learning models for the prediction of epileptic seizures using EEG data.
The results of our analysis demonstrate the performance of each model in terms of accuracy.
The ET model exhibited the best performance with an accuracy of 99.29%.
arXiv Detail & Related papers (2023-08-06T08:50:08Z)
- Neural-Based Test Oracle Generation: A Large-scale Evaluation and Lessons Learned [17.43060451305942]
TOGA is a recently developed neural-based method for automatic test oracle generation.
It misclassifies the type of oracle needed 24% of the time, and even when it classifies correctly, around 62% of the time it is not confident enough to generate any assertion oracle.
These findings expose limitations of the state-of-the-art neural-based oracle generation technique.
arXiv Detail & Related papers (2023-07-29T16:34:56Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
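The core idea of selective prediction described above can be sketched with a simple confidence threshold. Everything below (the threshold value, the abstain-as-None convention) is a hypothetical illustration of the paradigm, not the ASPEST method itself:

```python
def predict_or_abstain(probs, threshold=0.8):
    """Return the argmax class, or None (abstain) when the model is not
    confident enough -- the basic mechanism of selective prediction."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None  # abstain: defer to a human or an active-learning query
    return best

print(predict_or_abstain([0.05, 0.92, 0.03]))  # confident -> class 1
print(predict_or_abstain([0.40, 0.35, 0.25]))  # uncertain -> None (abstain)
```

In an active selective prediction setting, the abstained examples are natural candidates to query for labels.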
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- EEG-Fest: Few-shot based Attention Network for Driver's Vigilance Estimation with EEG Signals [160.57870373052577]
A lack of driver's vigilance is the main cause of most vehicle crashes.
EEG has been a reliable and efficient tool for drivers' drowsiness estimation.
arXiv Detail & Related papers (2022-11-07T21:35:08Z)
- Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
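A Fisher regularizer of the kind mentioned above is, in spirit, a quadratic penalty that discourages drift in parameters deemed important. A hedged sketch under that assumption (the function name, weighting, and values are illustrative, not the paper's exact formulation):

```python
def fisher_penalty(params, anchor_params, fisher_diag, strength=1.0):
    """Quadratic penalty that grows when important (high-Fisher) parameters
    move away from their anchor values, added to the adaptation loss."""
    return strength * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, fisher_diag)
    )

# A high-Fisher parameter is penalized far more for the same amount of drift.
print(fisher_penalty([1.0, 2.0], [0.0, 2.5], [10.0, 0.1]))  # 10.025
```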
arXiv Detail & Related papers (2022-04-06T06:39:40Z)
- Re-TACRED: Addressing Shortcomings of the TACRED Dataset [5.820381428297218]
TACRED is one of the largest and most widely used sentence-level relation extraction datasets.
Proposed models that are evaluated using this dataset consistently set new state-of-the-art performance.
However, they still exhibit large error rates despite leveraging external knowledge and unsupervised pretraining on large text corpora.
arXiv Detail & Related papers (2021-04-16T22:55:11Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [80.38130122127882]
TACRED is one of the largest, most widely used crowdsourced datasets in Relation Extraction (RE).
In this paper, we investigate the questions: Have we reached a performance ceiling or is there still room for improvement?
We find that label errors account for 8% absolute F1 test error, and that more than 50% of the examples need to be relabeled.
arXiv Detail & Related papers (2020-04-30T15:07:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.