Related papers: Highly Interactive Testing for Uninterrupted Development Flow


URL: http://arxiv.org/abs/2508.02176v1
Date: Mon, 04 Aug 2025 08:17:40 GMT
Title: Highly Interactive Testing for Uninterrupted Development Flow
Authors: Andrew Tropin
Abstract summary: We present a library that provides runtime representation for tests, allowing tight integration with HIDE tooling. We describe development workflows enhanced with testing and demonstrate how they achieve subsecond test reexecution times crucial for maintaining developer focus.
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Highly interactive development environments (HIDEs) enable uninterrupted development flow through continuous program evolution and rapid hypothesis checking. However, traditional testing approaches -- typically executed separately via CLI -- isolate tests from HIDE tooling (interactive debuggers, value and stack inspectors, etc.) and introduce disruptive delays due to coarse execution granularity and lack of runtime context. This disconnect breaks development flow by exceeding critical attention thresholds. In this paper we present a library that provides runtime representation for tests, allowing tight integration with HIDEs, and enabling immediate access to HIDE tooling in the context of test failure. We then describe development workflows enhanced with testing and demonstrate how they achieve subsecond test reexecution times crucial for maintaining developer focus.
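To make "runtime representation for tests" concrete, here is a minimal Python sketch of the general idea. It assumes nothing about the paper's actual library, language, or API. Tests are held as values in a registry inside the running process, so a single test can be redefined and rerun without restarting anything, and a failure can optionally open `pdb` in the failing frame. The names `RuntimeTest`, `TestResult`, `TestRegistry`, `register`, `run`, `run_all`, and the `debug` flag are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TestResult:
    name: str
    passed: bool
    duration_s: float
    error: Optional[BaseException] = None


@dataclass
class RuntimeTest:
    """Hypothetical: a test kept as a live value, not a file found by a CLI runner."""
    name: str
    body: Callable[[], None]


@dataclass
class TestRegistry:
    tests: Dict[str, RuntimeTest] = field(default_factory=dict)

    def register(self, fn: Callable[[], None]) -> Callable[[], None]:
        # Decorator: redefining a test in the live session replaces the old entry.
        self.tests[fn.__name__] = RuntimeTest(fn.__name__, fn)
        return fn

    def run(self, name: str, debug: bool = False) -> TestResult:
        # Runs one test in the current process, so there is no start-up or re-import cost.
        case = self.tests[name]
        start = time.perf_counter()
        try:
            case.body()
        except Exception as exc:
            duration = time.perf_counter() - start
            if debug:
                import pdb
                # Inspect the failure with the live runtime context still available.
                pdb.post_mortem(exc.__traceback__)
            return TestResult(name, False, duration, exc)
        return TestResult(name, True, time.perf_counter() - start)

    def run_all(self) -> List[TestResult]:
        return [self.run(name) for name in self.tests]


registry = TestRegistry()


@registry.register
def test_addition():
    assert 1 + 1 == 2


@registry.register
def test_broken():
    assert sorted([3, 1, 2]) == [3, 2, 1]  # wrong expectation


first = registry.run("test_broken")
print(first.name, first.passed, type(first.error).__name__)  # test_broken False AssertionError


@registry.register
def test_broken():  # redefined in the same session after fixing the expectation
    assert sorted([3, 1, 2]) == [1, 2, 3]


second = registry.run("test_broken")
```

The design choice this illustrates is granularity. The unit of reexecution is a single in-memory test, not a whole CLI run. The state that a separate process would discard stays available for inspection.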

Related papers

From Reproduction to Replication: Evaluating Research Agents with Progressive Code Masking
AutoExperiment is a benchmark that evaluates AI agents' ability to implement and run machine learning experiments. We evaluate state-of-the-art agents and find that performance degrades rapidly as $n$, the number of masked-out functions, increases. Our findings highlight critical challenges in long-horizon code generation, context retrieval, and autonomous experiment execution.
arXiv (2025-06-24T15:39:20Z)
Training Language Models to Generate Quality Code with Program Analysis Feedback
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv (2025-05-28T17:57:47Z)
Revisit Self-Debugging with Self-Generated Tests for Code Generation
Self-debugging with self-generated tests is a promising solution but lacks a full exploration of its limitations and practical potential. We propose two paradigms for the process: post-execution and in-execution self-debugging. We find that post-execution self-debugging struggles on basic problems but shows potential for improvement on competitive ones, due to the bias introduced by self-generated tests.
arXiv (2025-01-22T10:54:19Z)
Practical Pipeline-Aware Regression Test Optimization for Continuous Integration
Continuous Integration (CI) is commonly applied to ensure consistent code quality. Developers commonly split test executions across multiple pipelines, running small and fast tests in pre-submit stages while executing long-running and flaky tests in post-submit pipelines. We developed a lightweight and pipeline-aware regression test optimization approach that employs Reinforcement Learning models trained on language-agnostic features.
arXiv (2025-01-20T15:39:16Z)
Commit0: Library Generation from Scratch
Commit0 is a benchmark that challenges AI agents to write libraries from scratch. Agents are provided with a specification document outlining the library's API as well as a suite of interactive unit tests. Commit0 also offers an interactive environment where models receive static analysis and execution feedback on the code they generate.
arXiv (2024-12-02T18:11:30Z)
LLM-based Unit Test Generation via Property Retrieval
Property-Based Retrieval Augmentation extends LLM-based Retrieval-Augmented Generation beyond basic vector, text similarity, and graph-based methods. Our approach considers task-specific context and introduces a tailored property retrieval mechanism. We implement this approach in a tool called APT, which sequentially performs preprocessing, property retrieval, and unit test generation.
arXiv (2024-10-17T13:33:12Z)
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv (2024-10-02T09:11:10Z)
ASTER: Natural and Multi-language Unit Test Generation with LLMs
We describe a generic pipeline that incorporates static analysis to guide LLMs in generating compilable and high-coverage test cases. We conduct an empirical study to assess the quality of the generated tests in terms of code coverage and test naturalness.
arXiv (2024-09-04T21:46:18Z)
Benchopt: Reproducible, efficient and collaborative optimization benchmarks
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning. Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
arXiv (2022-06-27T16:19:24Z)
LSTC: Boosting Atomic Action Detection with Long-Short-Term Context
We decompose the action recognition pipeline into short-term and long-term reliance. Within our design, a local aggregation branch is utilized to gather dense and informative short-term cues. Both branches independently predict the context-specific actions and the results are merged in the end.
arXiv (2021-10-19T10:09:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.