Formalizing Regression Testing for Agile and Continuous Integration Environments
- URL: http://arxiv.org/abs/2511.02810v1
- Date: Tue, 04 Nov 2025 18:31:06 GMT
- Title: Formalizing Regression Testing for Agile and Continuous Integration Environments
- Authors: Suddhasvatta Das, Kevin Gary
- Abstract summary: We formalize the phenomenon of continuous or near-continuous regression testing using successive builds as a time-ordered chain. We also formalize the regression test window between any two builds, which captures the limited time budget available for regression testing.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software developed using modern agile practices delivers a stream of software versions that require continuous regression testing, rather than testing once near the delivery or maintenance phase as assumed by classical regression-testing theory. In this work, we formalize the phenomenon of continuous or near-continuous regression testing using successive builds as a time-ordered chain, where each build contains the program, requirements, and the accompanying tests. We also formalize the regression test window between any two builds, which captures the limited time budget available for regression testing. When the time limit is set to infinity and the chain is restricted to two builds, the model degenerates to retest-all, thereby preserving semantics for the classical two-version case. The formalization is validated by directly representing two state-of-the-art agile regression testing algorithms in terms of build-tuple operations without requiring auxiliary assumptions, followed by proof of the soundness and completeness of our formalization.
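The abstract's model can be sketched in a few lines: builds as tuples of program, requirements, and tests forming a time-ordered chain, plus a budgeted test-selection step over the latest build. This is an illustrative reading only; the names (`Build`, `select_tests`) and the greedy selection policy are assumptions, not the paper's actual definitions. It does show the degenerate case the abstract describes: with a two-build chain and an infinite window, selection reduces to retest-all.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """One element of the time-ordered build chain:
    the program version, its requirements, and its tests."""
    program: str
    requirements: frozenset
    tests: tuple  # (test_name, estimated_runtime_seconds) pairs


def select_tests(chain, window_seconds):
    """Select tests for the newest build that fit the regression test window.

    With window_seconds = math.inf and a two-build chain, every test of the
    newest build is selected, recovering classical retest-all.
    The greedy in-order policy here is a placeholder, not the paper's method."""
    latest = chain[-1]
    selected, used = [], 0.0
    for name, runtime in latest.tests:
        if used + runtime <= window_seconds:
            selected.append(name)
            used += runtime
    return selected


# Two-build chain with an unbounded window degenerates to retest-all.
old = Build("v1", frozenset({"R1"}), (("t1", 3.0), ("t2", 5.0)))
new = Build("v2", frozenset({"R1", "R2"}), (("t1", 3.0), ("t2", 5.0), ("t3", 8.0)))
assert select_tests((old, new), math.inf) == ["t1", "t2", "t3"]
# A finite window forces a proper subset, the setting the paper formalizes.
assert select_tests((old, new), 9.0) == ["t1", "t2"]
```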
Related papers
- UniT: Unified Multimodal Chain-of-Thought Test-time Scaling [85.590774707406]
Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iteratively refining their outputs. We introduce UniT, a framework for multimodal test-time scaling that enables a single unified model to reason, verify, and refine across multiple rounds.
arXiv Detail & Related papers (2026-02-12T18:59:49Z) - TTCS: Test-Time Curriculum Synthesis for Self-Evolving [47.826209735956716]
Test-Time Training offers a promising way to improve the reasoning ability of large language models. We propose TTCS, a co-evolving test-time training framework. We show that TTCS consistently strengthens reasoning ability on challenging mathematical benchmarks.
arXiv Detail & Related papers (2026-01-30T06:38:02Z) - SAGE: Semantic-Aware Gray-Box Game Regression Testing with Large Language Models [12.705802209782506]
SAGE is a semantic-aware regression testing framework for gray-box game environments. It addresses the core challenges of test generation, maintenance, and selection. It achieves superior bug detection with significantly lower execution cost, while demonstrating strong adaptability to version updates.
arXiv Detail & Related papers (2025-11-29T17:09:18Z) - Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking [54.43083499412643]
Test-time algorithms that combine the generative power of language models with process verifiers offer a promising lever for eliciting new reasoning capabilities. We introduce a new process-guided test-time sampling algorithm, VGB, which uses theoretically grounded backtracking to achieve provably better robustness to verifier errors.
arXiv Detail & Related papers (2025-10-03T16:21:14Z) - Sample, Don't Search: Rethinking Test-Time Alignment for Language Models [55.2480439325792]
We introduce QAlign, a new test-time alignment approach. As we scale test-time compute, QAlign converges to sampling from the optimal aligned distribution for each individual prompt. By adopting recent advances in Markov chain Monte Carlo for text generation, our method enables better-aligned outputs without modifying the underlying model or even requiring logit access.
arXiv Detail & Related papers (2025-04-04T00:41:40Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time compute instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Practical Pipeline-Aware Regression Test Optimization for Continuous Integration [9.079940595000087]
Continuous Integration (CI) is commonly applied to ensure consistent code quality. Developers commonly split test executions across multiple pipelines, running small and fast tests in pre-submit stages while executing long-running and flaky tests in post-submit pipelines. We developed a lightweight and pipeline-aware regression test optimization approach that employs Reinforcement Learning models trained on language-agnostic features.
arXiv Detail & Related papers (2025-01-20T15:39:16Z) - Deep anytime-valid hypothesis testing [29.273915933729057]
We propose a general framework for constructing powerful, sequential hypothesis tests for nonparametric testing problems.
We develop a principled approach of leveraging the representation capability of machine learning models within the testing-by-betting framework.
Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines.
arXiv Detail & Related papers (2023-10-30T09:46:19Z) - Sequential Kernelized Independence Testing [77.237958592189]
We design sequential kernelized independence tests inspired by kernelized dependence measures. We demonstrate the power of our approaches on both simulated and real data.
arXiv Detail & Related papers (2022-12-14T18:08:42Z) - Semantic Self-adaptation: Enhancing Generalization with a Single Sample [45.111358665370524]
We propose a self-adaptive approach for semantic segmentation.
It fine-tunes the parameters of convolutional layers to the input image using consistency regularization.
Our empirical study suggests that self-adaptation may complement the established practice of model regularization at training time.
arXiv Detail & Related papers (2022-08-10T12:29:01Z) - Robust Continual Test-time Adaptation: Instance-aware BN and Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams.
Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN) that corrects normalization for out-of-distribution samples, and (b) Prediction-balanced Reservoir Sampling (PBRS) that simulates i.i.d. data stream from non-i.i.d. stream in a class-balanced manner.
arXiv Detail & Related papers (2022-08-10T03:05:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.