Relevant information in TDD experiment reporting
- URL: http://arxiv.org/abs/2406.06405v1
- Date: Mon, 10 Jun 2024 15:57:56 GMT
- Title: Relevant information in TDD experiment reporting
- Authors: Fernando Uyaguari, Silvia T. Acuña, John W. Castro, Davide Fucci, Oscar Dieste, Sira Vegas
- Abstract summary: This article aims to identify the response variable operationalization components in TDD experiments that study external quality.
Test suites, intervention types, and measurers influence the measurements and the results of the statistical analysis (SA) of these experiments.
A systematic mapping study (SMS) confirms that TDD experiments do not usually report the test suites, the test case generation method, or the details of how external quality was measured.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Experiments are a commonly used research method in software engineering (SE). Researchers report their experiments following detailed guidelines. However, in the field of test-driven development (TDD) at least, researchers do not specify how they operationalized the response variables and the measurement process. This article has three aims: (i) identify the response variable operationalization components in TDD experiments that study external quality; (ii) study their influence on the experimental results; and (iii) determine whether the experiment reports describe the measurement process components that have an impact on the results. We used a sequential mixed method. The first part of the research adopts a quantitative approach, applying a statistical analysis (SA) of the impact of the operationalization components on the experimental results. The second part follows with a qualitative approach, applying a systematic mapping study (SMS). The test suites, intervention types, and measurers have an influence on the measurements and results of the SA of TDD experiments in SE. The test suites have a major impact on both the measurements and the results of the experiments. The intervention type has less impact on the results than on the measurements. While the measurers have an impact on the measurements, this is not transferred to the experimental results. On the other hand, the results of our SMS confirm that TDD experiments do not usually report the test suites, the test case generation method, or the details of how external quality was measured. A measurement protocol should be used to ensure that the measurements made by different measurers are similar. The test cases, the experimental task, and the intervention type must be reported in order to reproduce the measurements and SA, as well as to replicate experiments and build dependable families of experiments.
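To make the operationalization issue concrete, here is a minimal sketch assuming external quality is operationalized as the percentage of acceptance test cases a solution passes, one common choice in TDD experiments; the program and the two suites are hypothetical, not the authors' protocol. It illustrates the paper's point that the measured quality of the same artifact depends on the test suite the measurer applies.

```python
# Illustrative sketch: external quality as the percentage of passing
# acceptance tests. Program and suites are hypothetical.

def external_quality(program, test_suite):
    """Percentage of test cases in test_suite that program passes."""
    passed = sum(1 for case in test_suite if case(program))
    return 100.0 * passed / len(test_suite)

def abs_impl(x):
    # Toy implementation under test; bug: truncates non-integer inputs.
    return int(x) if x > 0 else -int(x)

suite_a = [  # measurer A's acceptance suite
    lambda p: p(3) == 3,
    lambda p: p(-3) == 3,
]
suite_b = [  # measurer B adds boundary and non-integer cases
    lambda p: p(3) == 3,
    lambda p: p(-3) == 3,
    lambda p: p(0) == 0,
    lambda p: p(-0.5) == 0.5,
]

print(external_quality(abs_impl, suite_a))  # 100.0
print(external_quality(abs_impl, suite_b))  # 75.0 -- same code, different score
```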
Related papers
- Variance reduction combining pre-experiment and in-experiment data (arXiv 2024-10-11)
Online controlled experiments (A/B testing) are essential in data-driven decision-making for many companies.
Existing methods like CUPED and CUPAC use pre-experiment data to reduce variance, but their effectiveness depends on the correlation between the pre-experiment data and the outcome.
We introduce a novel method that combines both pre-experiment and in-experiment data to achieve greater variance reduction than CUPED and CUPAC; a sketch of the baseline CUPED adjustment follows this entry.
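Since this entry builds on CUPED, a brief sketch of that baseline may help; this is a minimal illustration on simulated data, not the paper's combined method. CUPED subtracts from the in-experiment outcome its best linear prediction from a pre-experiment covariate, which keeps the difference-in-means estimate unbiased under randomization while shrinking its variance when the covariate and the outcome are correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Simulated A/B test: pre-experiment covariate x, true treatment effect 0.1.
x = rng.normal(0.0, 1.0, n)                        # pre-experiment data
treat = rng.integers(0, 2, n)                      # random assignment
y = 0.8 * x + 0.1 * treat + rng.normal(0, 1, n)    # in-experiment outcome

# CUPED: y_cuped = y - theta * (x - mean(x)), theta = cov(x, y) / var(x).
theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
y_cuped = y - theta * (x - x.mean())

def diff_in_means(outcome):
    return outcome[treat == 1].mean() - outcome[treat == 0].mean()

print("raw estimate:  ", diff_in_means(y))
print("cuped estimate:", diff_in_means(y_cuped))
print("variance ratio:", y_cuped.var() / y.var())  # < 1: variance reduced
```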
- Identification of Single-Treatment Effects in Factorial Experiments (arXiv 2024-05-16)
I show that when multiple interventions are randomized in experiments, the effect any single intervention would have outside the experimental setting is not identified absent heroic assumptions.
Observational studies and factorial experiments provide information about potential-outcome distributions under zero and multiple interventions.
I show that researchers who rely on this type of design must either justify linearity of functional forms or specify, with directed acyclic graphs, how variables are related in the real world.
- Assessing effect sizes, variability, and power in the on-line study of language production (arXiv 2024-03-19)
We compare response time data obtained in the same word production experiment conducted in the lab and on-line.
We determine whether the two settings differ in effect sizes and in the consistency of responses over the course of the experiment.
We assess the impact of these differences on the power of the design in a series of simulations; a toy power simulation in the same spirit follows this entry.
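Power simulations of this kind follow a standard Monte Carlo recipe that can be sketched directly; the effect size, noise levels, and sample size below are made-up stand-ins for the lab and on-line settings, not the authors' parameters. For each setting, two-condition response-time data are repeatedly simulated and the fraction of significant t-tests estimates the power.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def estimated_power(effect_ms, sd_ms, n_per_cond, n_sims=2000, alpha=0.05):
    """Monte Carlo power of a two-sample t-test for a response-time effect."""
    hits = 0
    for _ in range(n_sims):
        cond_a = rng.normal(600.0, sd_ms, n_per_cond)             # baseline RTs
        cond_b = rng.normal(600.0 + effect_ms, sd_ms, n_per_cond) # shifted RTs
        if stats.ttest_ind(cond_a, cond_b).pvalue < alpha:
            hits += 1
    return hits / n_sims

# Hypothetical numbers: same effect, noisier responses on-line,
# so the on-line setting should show lower estimated power.
print("lab:    ", estimated_power(effect_ms=30, sd_ms=100, n_per_cond=30))
print("on-line:", estimated_power(effect_ms=30, sd_ms=150, n_per_cond=30))
```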
- Adaptive Instrument Design for Indirect Experiments (arXiv 2023-12-05)
Unlike RCTs, indirect experiments estimate treatment effects by leveraging conditional instrumental variables.
In this paper we take the initial steps towards enhancing sample efficiency for indirect experiments by adaptively designing a data collection policy.
Our main contribution is a practical computational procedure that utilizes influence functions to search for an optimal data collection policy.
- Choosing a Proxy Metric from Past Experiments (arXiv 2023-09-14)
In many randomized experiments, the treatment effect of the long-term metric is often difficult or infeasible to measure.
A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric.
We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments; a naive proxy-ranking sketch follows this entry.
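The proxy-selection problem can be illustrated with a deliberately naive baseline; this is not the paper's framework, just a hedged sketch in which hypothetical proxies are ranked by how well their per-experiment treatment effects correlate with the long-term metric's effects across past experiments.

```python
import numpy as np

rng = np.random.default_rng(2)
n_experiments = 50

# Hypothetical per-experiment treatment effects from past A/B tests.
long_term = rng.normal(0.0, 1.0, n_experiments)
proxies = {
    # proxy_1 tracks the long-term metric closely, proxy_2 only loosely.
    "proxy_1": long_term + rng.normal(0.0, 0.3, n_experiments),
    "proxy_2": 0.4 * long_term + rng.normal(0.0, 1.0, n_experiments),
}

# Naive ranking: pick the proxy whose effects correlate best with the
# long-term metric's effects across past experiments.
scores = {
    name: np.corrcoef(effects, long_term)[0, 1]
    for name, effects in proxies.items()
}
best = max(scores, key=scores.get)
print(scores)
print("selected proxy:", best)  # proxy_1
```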
- Learning sources of variability from high-dimensional observational studies (arXiv 2023-07-26)
Causal inference studies whether the presence of a variable influences an observed outcome.
Our work generalizes causal estimands to outcomes with any number of dimensions or any measurable space.
We propose a simple technique for adjusting universally consistent conditional independence tests.
- SPOT: Sequential Predictive Modeling of Clinical Trial Outcome with Meta-Learning (arXiv 2023-04-07)
Clinical trials are essential to drug development but time-consuming, costly, and prone to failure.
We propose Sequential Predictive mOdeling of clinical Trial outcome (SPOT), which first clusters multi-sourced trial data into relevant trial topics.
Treating each trial sequence as a task, it uses a meta-learning strategy so that the model can rapidly adapt to new tasks with minimal updates.
- Experimentally determining the incompatibility of two qubit measurements (arXiv 2021-12-15)
We describe and realize an experimental procedure for assessing the incompatibility of two qubit measurements.
We demonstrate this in an optical setup, where the qubit states are encoded into the photons' polarization degrees of freedom.
- Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework (arXiv 2020-02-05)
A/B testing is a business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries.
This paper introduces a reinforcement learning framework for carrying out A/B testing in online experiments.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.