Extended Paper: API-driven Program Synthesis for Testing Static Typing Implementations
- URL: http://arxiv.org/abs/2311.04527v1
- Date: Wed, 8 Nov 2023 08:32:40 GMT
- Title: Extended Paper: API-driven Program Synthesis for Testing Static Typing Implementations
- Authors: Thodoris Sotiropoulos, Stefanos Chaliasos, Zhendong Su
- Abstract summary: We introduce a novel approach for testing static typing implementations based on the concept of API-driven program synthesis.
The idea is to synthesize type-intensive but small and well-typed programs by leveraging and combining application programming interfaces (APIs) derived from existing software libraries.
- Score: 11.300829269111627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel approach for testing static typing implementations based
on the concept of API-driven program synthesis. The idea is to synthesize
type-intensive but small and well-typed programs by leveraging and combining
application programming interfaces (APIs) derived from existing software
libraries. Our primary insight is backed up by real-world evidence: a
significant number of compiler typing bugs are caused by small test cases that
employ APIs from the standard library of the language under test. This is
attributed to the inherent complexity of the majority of these APIs, which
often exercise a wide range of sophisticated type-related features. The main
contribution of our approach is the ability to produce small client programs
with increased feature coverage, without bearing the burden of generating the
corresponding well-formed API definitions from scratch. To validate diverse
aspects of static typing procedures (i.e., soundness, precision of type
inference), we also enrich our API-driven approach with fault-injection and
semantics-preserving modes, along with their corresponding test oracles.
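The pass/fail logic of these two oracles can be sketched in a few lines. The following is a minimal illustration, not Thalia's actual interface: it assumes a `kotlinc` binary on PATH, and the helper names (`compiles`, `injectTypeError`, `transform`) are hypothetical stand-ins.

```kotlin
import java.nio.file.Files

// Invoke the compiler under test on a single source file; "accepted" here
// simply means a zero exit code. (Sketch only; assumes `kotlinc` on PATH.)
fun compiles(source: String): Boolean {
    val file = Files.createTempFile("case", ".kt")
    Files.writeString(file, source)
    val out = Files.createTempDirectory("classes")
    return ProcessBuilder("kotlinc", file.toString(), "-d", out.toString())
        .redirectErrorStream(true).start().waitFor() == 0
}

// Fault-injection oracle: injecting a type error into a well-typed program
// must make the compiler reject it; acceptance indicates a soundness bug.
fun checkSoundness(wellTyped: String, injectTypeError: (String) -> String) {
    check(!compiles(injectTypeError(wellTyped))) { "soundness bug: ill-typed program accepted" }
}

// Semantics-preserving oracle: a type-preserving rewrite of a well-typed
// program must still compile; rejection indicates an over-restrictive
// checker or imprecise type inference.
fun checkPrecision(wellTyped: String, transform: (String) -> String) {
    check(compiles(transform(wellTyped))) { "precision bug: well-typed program rejected" }
}
```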
We evaluate our implemented tool, Thalia, on the static typing
implementations of the compilers for three popular languages, namely Scala,
Kotlin, and Groovy. Thalia has uncovered 84 typing bugs (77 confirmed and 22
fixed), most of which are triggered by test cases featuring APIs that rely on
parametric polymorphism, overloading, and higher-order functions. Our
comparison with the state of the art shows that Thalia yields test programs with
distinct characteristics, offering additional and complementary benefits.
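For intuition, the hand-written Kotlin snippet below (an illustration in the spirit of such test cases, not actual Thalia output) shows their flavor: it is small and well-typed, uses only standard-library APIs, and still pushes the type checker through parametric polymorphism, overload resolution, higher-order functions, and declaration-site variance.

```kotlin
fun main() {
    // `fold` is generic in its accumulator type R; the compiler must infer
    // R = MutableMap<Int, MutableList<String>> from the initial value and
    // check the lambda against (R, String) -> R.
    val grouped = listOf("a", "bb", "ccc", "dd")
        .fold(mutableMapOf<Int, MutableList<String>>()) { acc, s ->
            acc.getOrPut(s.length) { mutableListOf() }.add(s)
            acc
        }

    // Declaration-site variance: the assignment type-checks only because
    // Map's value type parameter is covariant, so
    // MutableMap<Int, MutableList<String>> <: Map<Int, List<String>>.
    val view: Map<Int, List<String>> = grouped

    // `compareBy` takes a generic vararg of selector functions; the compiler
    // must pick the right overload and instantiate Comparator<String> from
    // `sortedWith`'s expected type before checking each lambda.
    val ordered = view.values.flatten().sortedWith(compareBy({ it.length }, { it }))
    println(ordered)
}
```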
Related papers
- KAT: Dependency-aware Automated API Testing with Large Language Models [1.7264233311359707]
KAT (Katalon API Testing) is a novel AI-driven approach that autonomously generates test cases to validate APIs.
Our evaluation of KAT using 12 real-world services shows that it can improve validation coverage, detect more undocumented status codes, and reduce false positives in these services.
arXiv Detail & Related papers (2024-07-14T14:48:18Z)
- DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages [49.38663048447942]
We propose DIALECTBENCH, the first-ever large-scale benchmark for NLP on varieties.
This allows for a comprehensive evaluation of NLP system performance on different language varieties.
We provide substantial evidence of performance disparities between standard and non-standard language varieties.
arXiv Detail & Related papers (2024-03-16T20:18:36Z)
- Evolutionary Generative Fuzzing for Differential Testing of the Kotlin Compiler [14.259471945857431]
We investigate the effectiveness of differential testing in finding bugs within the Kotlin compilers developed at JetBrains.
We propose a black-box generative approach that creates input programs for the K1 and K2 compilers.
Our case study shows that the proposed approach effectively detects bugs in K1 and K2; these bugs have been confirmed by JetBrains developers, and some have been fixed. A rough sketch of such a harness follows below.
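The sketch assumes a recent `kotlinc` on PATH, where `-language-version 1.9` selects the K1 frontend and `-language-version 2.0` selects K2; the hard-coded `program` stands in for the paper's evolutionary generator.

```kotlin
import java.nio.file.Files

// Compile `source` under the given -language-version and return the exit code.
fun compileWith(languageVersion: String, source: String): Int {
    val file = Files.createTempFile("diff", ".kt")
    Files.writeString(file, source)
    val out = Files.createTempDirectory("classes")
    return ProcessBuilder(
        "kotlinc", "-language-version", languageVersion,
        "-d", out.toString(), file.toString()
    ).redirectErrorStream(true).start().waitFor()
}

fun main() {
    // Stand-in for a generated input program.
    val program = "fun main() { println(listOf(1, 2, 3).map { it * it }) }"
    val k1 = compileWith("1.9", program)
    val k2 = compileWith("2.0", program)
    // Differential oracle: the two frontends must agree on acceptance;
    // any divergence is a candidate bug in one of them.
    if (k1 != k2) println("divergence: K1 exit=$k1, K2 exit=$k2")
}
```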
arXiv Detail & Related papers (2024-01-12T16:01:12Z)
- Leveraging Large Language Models to Improve REST API Testing [51.284096009803406]
RESTGPT takes as input an API specification, extracts machine-interpretable rules, and generates example parameter values from natural-language descriptions in the specification.
Our evaluations indicate that RESTGPT outperforms existing techniques in both rule extraction and value generation.
arXiv Detail & Related papers (2023-12-01T19:53:23Z)
- Learning Type Inference for Enhanced Dataflow Analysis [6.999203506253375]
We propose CodeTIDAL5, a Transformer-based model trained to reliably predict type annotations.
Our model outperforms the current state-of-the-art by 7.85% on the ManyTypes4TypeScript benchmark.
We present JoernTI, an integration of our approach into Joern, an open source static analysis tool.
arXiv Detail & Related papers (2023-10-01T13:52:28Z)
- TypeT5: Seq2seq Type Inference using Static Analysis [51.153089609654174]
We present a new type inference method that treats type prediction as a code infilling task.
Our method uses static analysis to construct dynamic contexts for each code element whose type signature is to be predicted by the model.
We also propose an iterative decoding scheme that incorporates previous type predictions in the model's input context.
arXiv Detail & Related papers (2023-03-16T23:48:00Z)
- Intergenerational Test Generation for Natural Language Processing Applications [16.63835131985415]
We propose an automated test generation method for detecting erroneous behaviors of various NLP applications.
We implement this method into NLPLego, which is designed to fully exploit the potential of seed sentences.
NLPLego successfully detects 1,732, 5,301, and 261,879 incorrect behaviors with around 95.7% precision in three tasks.
arXiv Detail & Related papers (2023-02-21T07:57:59Z)
- Binding Language Models in Symbolic Languages [146.3027328556881]
Binder is a training-free neural-symbolic framework that maps the task input to a program.
In the parsing stage, Codex identifies the parts of the task input that cannot be answered in the original programming language.
In the execution stage, Codex can perform versatile functionalities given proper prompts in the API calls.
arXiv Detail & Related papers (2022-10-06T12:55:17Z)
- BenchCLAMP: A Benchmark for Evaluating Language Models on Syntactic and Semantic Parsing [55.058258437125524]
We introduce BenchCLAMP, a Benchmark to evaluate Constrained LAnguage Model Parsing.
We benchmark eight language models, including two GPT-3 variants available only through an API.
Our experiments show that encoder-decoder pretrained language models can achieve similar performance or surpass state-of-the-art methods for syntactic and semantic parsing when the model output is constrained to be valid.
arXiv Detail & Related papers (2022-06-21T18:34:11Z)
- Exploring Software Naturalness through Neural Language Models [56.1315223210742]
The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing.
We explore this hypothesis through the use of a pre-trained transformer-based language model to perform code analysis tasks.
arXiv Detail & Related papers (2020-06-22T21:56:14Z)
- Template Guided Text Generation for Task-Oriented Dialogue [9.690158790639131]
Virtual assistants such as Google Assistant, Amazon Alexa, and Apple Siri enable users to interact with a large number of services and APIs on the web using natural language.
In this work, we investigate two methods for Natural Language Generation using a single domain-independent model across a large number of APIs.
arXiv Detail & Related papers (2020-04-30T17:51:08Z)