Mimicking Production Behavior with Generated Mocks
- URL: http://arxiv.org/abs/2208.01321v4
- Date: Tue, 10 Sep 2024 11:00:02 GMT
- Title: Mimicking Production Behavior with Generated Mocks
- Authors: Deepika Tiwari, Martin Monperrus, Benoit Baudry
- Abstract summary: We propose to monitor an application in production to generate tests that mimic realistic execution scenarios through mocks.
The approach is automated and implemented in an open-source tool called RICK.
All the generated test cases are executable, and 52.4% of them successfully mimic the complete execution context of the target methods observed in production.
- Score: 11.367562045401554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mocking allows testing program units in isolation. A developer who writes tests with mocks faces two challenges: design realistic interactions between a unit and its environment; and understand the expected impact of these interactions on the behavior of the unit. In this paper, we propose to monitor an application in production to generate tests that mimic realistic execution scenarios through mocks. Our approach operates in three phases. First, we instrument a set of target methods for which we want to generate tests, as well as the methods that they invoke, which we refer to as mockable method calls. Second, in production, we collect data about the context in which target methods are invoked, as well as the parameters and the returned value for each mockable method call. Third, offline, we analyze the production data to generate test cases with realistic inputs and mock interactions. The approach is automated and implemented in an open-source tool called RICK. We evaluate our approach with three real-world, open-source Java applications. RICK monitors the invocation of 128 methods in production across the three applications and captures their behavior. Based on this captured data, RICK generates test cases that include realistic initial states and test inputs, as well as mocks and stubs. All the generated test cases are executable, and 52.4% of them successfully mimic the complete execution context of the target methods observed in production. The mock-based oracles are also effective at detecting regressions within the target methods, complementing each other in their fault-finding ability. We interview 5 developers from the industry who confirm the relevance of using production observations to design mocks and stubs. Our experimental findings clearly demonstrate the feasibility and added value of generating mocks from production interactions.
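To make the idea concrete, here is a hedged, Mockito-style sketch of what a test that mimics a production scenario through mocks could look like. It is not RICK's actual output; the classes InvoiceService and TaxClient, the method names, and all values are hypothetical stand-ins for a target method, a mockable method call, and production-captured data.

```java
import static org.mockito.Mockito.*;
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Hypothetical illustration in the spirit of the approach: the target method is
// InvoiceService.total(...), and TaxClient.rateFor(...) is a mockable method call
// whose parameters and return value were observed in production.
class InvoiceServiceGeneratedTest {

    @Test
    void total_mimicsProductionScenario() {
        // Arrange: stub the mockable call with the argument/return pair seen in production
        TaxClient taxClient = mock(TaxClient.class);
        when(taxClient.rateFor("SE")).thenReturn(0.25);

        InvoiceService service = new InvoiceService(taxClient);

        // Act: invoke the target method with the production-observed input
        double total = service.total(100.0, "SE");

        // Assert: output oracle from the captured return value of the target method
        assertEquals(125.0, total, 1e-9);

        // Mock-based oracle: verify the interaction occurred as it did in production
        verify(taxClient).rateFor("SE");
    }
}

// Minimal hypothetical collaborators so the sketch is self-contained.
interface TaxClient {
    double rateFor(String countryCode);
}

class InvoiceService {
    private final TaxClient taxClient;

    InvoiceService(TaxClient taxClient) {
        this.taxClient = taxClient;
    }

    double total(double net, String countryCode) {
        return net * (1.0 + taxClient.rateFor(countryCode));
    }
}
```

In this sketch, the stubbed argument/return pair and the final verify call stand in for the production-captured data and the mock-based oracles described in the abstract.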
Related papers
- Understanding and Characterizing Mock Assertions in Unit Tests [12.96550571237691]
Despite their significance, mock assertions are rarely considered by automated test generation techniques.
Our analysis of 4,652 test cases from 11 popular Java projects reveals that mock assertions are mostly applied to validating specific kinds of method calls.
We find that mock assertions complement traditional test assertions by ensuring the desired side effects have been produced.
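As a small hedged illustration of a mock assertion (with hypothetical names, not taken from the study above): the verify(...) call asserts that a specific method call, i.e. the desired side effect, actually took place, complementing a traditional assertion on the return value.

```java
import static org.mockito.Mockito.*;
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

// Hypothetical example: the mock assertion checks that the expected side effect
// (an audit record being written) was produced.
class PaymentProcessorTest {

    interface AuditLog {
        void record(String event);
    }

    static class PaymentProcessor {
        private final AuditLog auditLog;

        PaymentProcessor(AuditLog auditLog) {
            this.auditLog = auditLog;
        }

        boolean charge(String account, long cents) {
            auditLog.record("charge:" + account + ":" + cents);
            return cents > 0;
        }
    }

    @Test
    void charge_writesAuditRecord() {
        AuditLog auditLog = mock(AuditLog.class);
        PaymentProcessor processor = new PaymentProcessor(auditLog);

        boolean accepted = processor.charge("acct-42", 1999L);

        assertTrue(accepted);                           // traditional test assertion
        verify(auditLog).record("charge:acct-42:1999"); // mock assertion on the side effect
    }
}
```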
arXiv Detail & Related papers (2025-03-25T02:35:05Z)
- Mock Deep Testing: Toward Separate Development of Data and Models for Deep Learning [21.563130049562357]
This research introduces our methodology of mock deep testing for unit testing of deep learning applications.
To enable unit testing, we introduce a design paradigm that decomposes the workflow into distinct, manageable components.
We have developed KUnit, a framework for enabling mock deep testing for the Keras library.
arXiv Detail & Related papers (2025-02-11T17:11:11Z)
- ViUniT: Visual Unit Tests for More Robust Visual Programming [104.55763189099125]
When models answer correctly, they produce incorrect programs 33% of the time.
We propose Visual Unit Testing (ViUniT), a framework to improve the reliability of visual programs by automatically generating unit tests.
arXiv Detail & Related papers (2024-12-12T01:36:18Z)
- Commit0: Library Generation from Scratch [77.38414688148006]
Commit0 is a benchmark that challenges AI agents to write libraries from scratch.
Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests.
Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv Detail & Related papers (2024-12-02T18:11:30Z)
- ASSERTIFY: Utilizing Large Language Models to Generate Assertions for Production Code [0.7973214627863593]
Production assertions are statements embedded in the code to help developers validate their assumptions about the code.
Current assertion generation techniques, such as static analysis and deep learning, fall short when it comes to generating production assertions.
This preprint addresses the gap by introducing Assertify, an automated end-to-end tool that leverages Large Language Models (LLMs) and prompt engineering to generate production assertions.
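For illustration, a production assertion in Java might look like the following hedged sketch (hypothetical class and values, not produced by Assertify): a check embedded in application code rather than in a test, documenting and validating the developer's assumptions at runtime.

```java
// Hypothetical illustration of production assertions embedded in application code.
public final class OrderPricing {

    public static long applyDiscount(long priceCents, int discountPercent) {
        // Production assertions make the method's assumptions explicit
        // (checked at runtime when the JVM is started with -ea).
        assert priceCents >= 0 : "price must be non-negative, got " + priceCents;
        assert discountPercent >= 0 && discountPercent <= 100
                : "discount must be within [0, 100], got " + discountPercent;

        return priceCents - (priceCents * discountPercent) / 100;
    }
}
```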
arXiv Detail & Related papers (2024-11-25T20:52:28Z)
- LLM-based Unit Test Generation via Property Retrieval [26.906316611858518]
Property-Based Retrieval Augmentation extends LLM-based Retrieval-Augmented Generation beyond basic vector, text similarity, and graph-based methods.
Our approach considers task-specific context and introduces a tailored property retrieval mechanism.
We implement this approach in a tool called APT, which sequentially performs preprocessing, property retrieval, and unit test generation.
arXiv Detail & Related papers (2024-10-17T13:33:12Z)
- Generative Judge for Evaluating Alignment [84.09815387884753]
We propose a generative judge with 13B parameters, Auto-J, designed to address these challenges.
Our model is trained on user queries and LLM-generated responses under massive real-world scenarios.
Experimentally, Auto-J outperforms a series of strong competitors, including both open-source and closed-source models.
arXiv Detail & Related papers (2023-10-09T07:27:15Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
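The framework itself targets Python completions and Abstract Syntax Trees; as a loosely analogous, hedged sketch in Java (not the paper's framework, all names hypothetical), the JDK's in-process compiler API can surface static errors in a generated snippet without executing it.

```java
import java.net.URI;
import java.util.List;

import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

// Analogous sketch: report static errors in a model-generated snippet without running it.
public class StaticErrorCheck {

    // Wrap a source string as an in-memory compilation unit.
    static class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    public static void main(String[] args) {
        // A hypothetical model-generated completion with a static error (undefined variable).
        String generated = "class Completion { int f() { return undefinedVar + 1; } }";

        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();

        compiler.getTask(null, null, diagnostics, null, null,
                List.of(new StringSource("Completion", generated))).call();

        // Errors found purely by static analysis, i.e. without execution.
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            System.out.println(d.getKind() + " at line " + d.getLineNumber() + ": "
                    + d.getMessage(null));
        }
    }
}
```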
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- SAGA: Summarization-Guided Assert Statement Generation [34.51502565985728]
This paper presents a novel summarization-guided approach for automatically generating assert statements.
We leverage a pre-trained language model as the reference architecture and fine-tune it on the task of assert statement generation.
arXiv Detail & Related papers (2023-05-24T07:03:21Z)
- Re-Evaluating LiDAR Scene Flow for Autonomous Driving [80.37947791534985]
Popular benchmarks for self-supervised LiDAR scene flow have unrealistic rates of dynamic motion, unrealistic correspondences, and unrealistic sampling patterns.
We evaluate a suite of top methods on a suite of real-world datasets.
We show that despite the emphasis placed on learning, most performance gains are caused by pre- and post-processing steps.
arXiv Detail & Related papers (2023-04-04T22:45:50Z)
- Operationalizing Machine Learning: An Interview Study [13.300075655862573]
We conduct semi-structured interviews with 18 machine learning engineers (MLEs) working across many applications.
Our interviews expose three variables that govern success for a production ML deployment: Velocity, Validation, and Versioning.
We summarize common practices for successful ML experimentation, deployment, and sustaining production performance.
arXiv Detail & Related papers (2022-09-16T16:59:36Z)
- Benchopt: Reproducible, efficient and collaborative optimization benchmarks [67.29240500171532]
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning.
Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
arXiv Detail & Related papers (2022-06-27T16:19:24Z)
- Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning.
Our method exploits accessible unlabelled examples during training to estimate their correlation with the labelled examples.
Visual Transformer models non-local visual concept dependency between labelled and unlabelled examples.
arXiv Detail & Related papers (2021-06-07T17:13:59Z)
- S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning.
It is based on a biLSTM encoder and a fully-connected classifier to compute similarity.
Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
arXiv Detail & Related papers (2021-03-18T21:10:41Z)
- Generating Accurate Assert Statements for Unit Test Cases using Pretrained Transformers [10.846226514357866]
Unit testing forms the foundation of the software testing pyramid.
We present an approach to support developers in writing unit test cases by generating accurate and useful assert statements.
arXiv Detail & Related papers (2020-09-11T19:35:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.