Crucible: Graphical Test Cases for Alloy Models
- URL: http://arxiv.org/abs/2307.06922v1
- Date: Thu, 13 Jul 2023 17:43:12 GMT
- Title: Crucible: Graphical Test Cases for Alloy Models
- Authors: Adam G. Emerson, Allison Sullivan
- Abstract summary: This paper introduces Crucible, which allows users to graphically create AUnit test cases.
Crucible provides automated guidance to users to ensure they are creating well-structured, valuable test cases.
- Score: 0.76146285961466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Alloy is a declarative modeling language that is well suited for verifying
system designs. Alloy models are automatically analyzed using the Analyzer, a
toolset that helps the user understand their system by displaying the
consequences of their properties, helping identify any missing or incorrect
properties, and exploring the impact of modifications to those properties. To
achieve this, the Analyzer invokes off-the-shelf SAT solvers to search for
scenarios, which are assignments to the sets and relations of the model such
that all executed formulas hold. To help write more accurate software models,
Alloy has a unit testing framework, AUnit, which allows users to outline
specific scenarios and check if those scenarios are correctly generated or
prevented by their model. Unfortunately, AUnit currently only supports textual
specifications of scenarios. This paper introduces Crucible, which allows users
to graphically create AUnit test cases. In addition, Crucible provides
automated guidance to users to ensure they are creating well-structured,
valuable test cases. As a result, Crucible eases the burden of adopting AUnit
and brings AUnit test case creation more in line with how users commonly
interact with Alloy scenarios: graphically.
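To make the textual AUnit format concrete, here is a minimal sketch of the kind of test case Crucible lets users assemble graphically. The toy linked-list model and the predicate name twoNodeList are illustrative assumptions rather than examples from the paper; what follows AUnit's actual shape is the pairing of a predicate that fixes an exact valuation of the model's sets and relations with a run command that checks whether the model admits that scenario.

    // Illustrative toy model, assumed for this sketch (not from the paper)
    one sig List { header: lone Node }
    sig Node { link: lone Node }
    fact Acyclic { no n: Node | n in n.^link }

    // AUnit-style test: the predicate pins down one exact scenario
    // (exactly two nodes, header n0, n0 linked to n1), and the run
    // command asks whether the model generates it.
    pred twoNodeList {
      some disj n0, n1: Node {
        Node = n0 + n1
        link = n0 -> n1
        List.header = n0
      }
    }
    run twoNodeList for 2

Crucible's contribution is to let users draw such a scenario as a graph of atoms and edges instead of writing the quantified predicate by hand.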
Related papers
- Structure Editor for Building Software Models [0.5735035463793009]
A recent study of over 93,000 new user models reveals that users have trouble from the very start: nearly a third of the models novices write fail to compile.
We believe that the issue is that Alloy's grammar and type information are passively relayed to the user, even though this information outlines a narrow path for how to compose valid formulas.
In this paper, we outline a proof-of-concept for a structure editor for Alloy in which users build their models using block-based inputs rather than free typing, which by design prevents compilation errors (the sketch below illustrates the kind of error this rules out).
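As a hypothetical illustration (this model is assumed, not drawn from the study), the snippet below contains the kind of ill-typed formula the Analyzer only rejects after it has been typed in full; a block-based editor would never offer the ill-typed join as an option:

    sig Professor { }
    sig Course { taughtBy: one Professor }
    sig Student { enrolled: set Course }

    // Type error: 'enrolled' relates Student to Course, so joining a
    // Professor onto it is provably empty, and the Analyzer rejects
    // the formula before any solving happens.
    check { all p: Professor | some p.enrolled }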
arXiv Detail & Related papers (2024-06-13T18:21:02Z)
- Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars [66.823588073584]
Large language models (LLMs) have shown impressive capabilities in real-world applications.
The quality of the exemplars included in the prompt greatly impacts performance.
Existing methods fail to adequately account for the impact of exemplar ordering on performance.
arXiv Detail & Related papers (2024-05-25T08:23:05Z)
- Right or Wrong -- Understanding How Novice Users Write Software Models [0.6445605125467574]
This paper presents an empirical study of over 97,000 models written by novice users trying to learn Alloy.
We investigate how users write both correct and incorrect models in order to produce a comprehensive benchmark for future use; as the sketch below shows, an incorrect model need not be one that fails to compile.
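As a hypothetical sketch (assumed, not taken from the benchmark), a model can compile cleanly and still be incorrect, for example by being overconstrained:

    sig Node { link: lone Node }

    // Compiles fine, but forcing every node to have a successor means
    // every finite instance must contain a cycle, so the acyclic lists
    // the author presumably intended can never appear.
    fact { all n: Node | some n.link }
    run { some Node } for 3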
arXiv Detail & Related papers (2024-02-09T18:56:57Z)
- Prompting Code Interpreter to Write Better Unit Tests on Quixbugs Functions [0.05657375260432172]
Unit testing is a commonly-used approach in software engineering to test the correctness and robustness of written code.
In this study, we explore the effect of different prompts on the quality of unit tests generated by Code Interpreter.
We find that the quality of the generated unit tests is not sensitive to minor changes in the details of the prompts provided.
arXiv Detail & Related papers (2023-09-30T20:36:23Z)
- Zero-Shot Object Counting with Language-Vision Models [50.1159882903028]
Class-agnostic object counting aims to count object instances of an arbitrary class at test time.
Current methods require human-annotated exemplars as inputs, which are often unavailable for novel categories.
We propose zero-shot object counting (ZSC), a new setting where only the class name is available during test time.
arXiv Detail & Related papers (2023-09-22T14:48:42Z)
- Revisiting and Improving Retrieval-Augmented Deep Assertion Generation [13.373681113601982]
Unit testing has become an essential activity in the software development process.
Yu et al. proposed an integrated approach (integration for short) to generate assertions for a unit test.
Despite promising results, there is still a knowledge gap as to why or where integration works or does not.
arXiv Detail & Related papers (2023-09-19T02:39:02Z)
- Zero-shot Model Diagnosis [80.36063332820568]
A common approach to evaluating deep learning models is to build a labeled test set with attributes of interest and assess how well the model performs on it.
This paper argues that Zero-shot Model Diagnosis (ZOOM) is possible without the need for a test set or labeling.
arXiv Detail & Related papers (2023-03-27T17:59:33Z)
- APANet: Adaptive Prototypes Alignment Network for Few-Shot Semantic Segmentation [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a given query image with only a few labeled support images.
Most advanced solutions exploit a metric learning framework that performs segmentation through matching each query feature to a learned class-specific prototype.
We present an adaptive prototype representation by introducing class-specific and class-agnostic prototypes.
arXiv Detail & Related papers (2021-11-24T04:38:37Z)
- Auditing AI models for Verified Deployment under Semantic Specifications [65.12401653917838]
AuditAI bridges the gap between interpretable formal verification and scalability.
We show how AuditAI allows us to obtain controlled variations for verification and certified training while addressing the limitations of verifying using only pixel-space perturbations.
arXiv Detail & Related papers (2021-09-25T22:53:24Z)
- Generating Accurate Assert Statements for Unit Test Cases using Pretrained Transformers [10.846226514357866]
Unit testing represents the foundation of the software testing pyramid.
We present an approach to support developers in writing unit test cases by generating accurate and useful assert statements.
arXiv Detail & Related papers (2020-09-11T19:35:09Z)
- Beyond Accuracy: Behavioral Testing of NLP models with CheckList [66.42971817954806]
CheckList is a task-agnostic methodology for testing NLP models.
CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation.
In a user study, NLP practitioners with CheckList created twice as many tests and found almost three times as many bugs as users without it.
arXiv Detail & Related papers (2020-05-08T15:48:31Z)