What Happened in This Pipeline? Diffing Build Logs with CiDiff
- URL: http://arxiv.org/abs/2504.18182v1
- Date: Fri, 25 Apr 2025 08:56:21 GMT
- Title: What Happened in This Pipeline? Diffing Build Logs with CiDiff
- Authors: Nicolas Hubner, Jean-Rémy Falleri, Raluca Uricaru, Thomas Degueule, Thomas Durieux
- Abstract summary: We introduce a new diff algorithm specifically tailored to build logs called CiDiff. We evaluate CiDiff against several baselines on a novel dataset of 17,906 CI regressions.
- Score: 3.093293209977702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous integration (CI) is widely used by developers to ensure the quality and reliability of their software projects. However, diagnosing a CI regression is a tedious process that involves the manual analysis of lengthy build logs. In this paper, we explore how textual differencing can support the debugging of CI regressions. As off-the-shelf diff algorithms produce suboptimal results, we introduce a new diff algorithm specifically tailored to build logs called CiDiff. We evaluate CiDiff against several baselines on a novel dataset of 17,906 CI regressions, performing an accuracy study, a quantitative study, and a user study. Notably, our algorithm reduces the number of lines to inspect by about 60% in the median case, with reasonable overhead compared to the state-of-practice LCS-diff. Finally, our algorithm is preferred by the majority of participants in 70% of the regression cases, whereas LCS-diff is preferred in only 5% of the cases.
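As a point of reference for the comparison above, the state-of-practice LCS-diff baseline can be reproduced with Python's difflib, which implements an LCS-style line matcher. The sketch below diffs a passing and a failing build log and counts the changed lines a developer would have to inspect; the file names are illustrative, and this is the baseline CiDiff improves upon, not CiDiff itself.

```python
import difflib

def lcs_diff_lines(passing_log: str, failing_log: str) -> list[str]:
    """Return the added/removed lines between two build logs."""
    old = passing_log.splitlines()
    new = failing_log.splitlines()
    diff = difflib.unified_diff(old, new, lineterm="", n=0)
    # Keep only real changes, skipping the ---/+++ file headers and @@ hunks.
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

if __name__ == "__main__":
    # Illustrative file names; any pair of CI build logs works.
    with open("passing_build.log") as f:
        ok = f.read()
    with open("failing_build.log") as f:
        ko = f.read()
    print(f"{len(lcs_diff_lines(ok, ko))} lines to inspect with LCS-diff")
```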
Related papers
- Combatting Dimensional Collapse in LLM Pre-Training Data via Diversified File Selection [65.96556073745197]
The DiverSified File selection algorithm (DiSF) is proposed to select the most decorrelated text files in the feature space.
DiSF saves 98.5% of the 590M training files in SlimPajama, outperforming full-data pre-training within a 50B training budget.
arXiv Detail & Related papers (2025-04-29T11:13:18Z)
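The summary above does not spell out DiSF's selection criterion, so the following is only a hedged sketch of the general idea: greedily pick the file whose feature vector is least correlated (in cosine similarity) with the files already selected. The greedy rule and the random stand-in features are assumptions for illustration, not the paper's exact method.

```python
import numpy as np

def select_decorrelated(features: np.ndarray, k: int) -> list[int]:
    """Greedily pick k rows with low cosine similarity to those already picked.

    features: (n_files, d) matrix, one feature vector per training file.
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    selected = [0]  # seed with an arbitrary file
    while len(selected) < k:
        # Worst-case similarity of every file to the selected set.
        sim = np.abs(f @ f[selected].T).max(axis=1)
        sim[selected] = np.inf  # never re-pick a selected file
        selected.append(int(sim.argmin()))
    return selected

rng = np.random.default_rng(0)
files = rng.normal(size=(1000, 64))  # stand-in feature vectors
print(select_decorrelated(files, k=10))
```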
- Toward Interactive Optimization of Source Code Differences: An Empirical Study of Its Performance [1.313675711285772]
We propose an interactive approach to optimizing source code differences (diffs).
Users can provide feedback on parts of a diff that are matched but should not be, or parts that should be matched but are not.
Results on 23 GitHub projects confirm that 92% of non-optimal diffs can be addressed with fewer than four feedback actions in the ideal case.
arXiv Detail & Related papers (2024-09-20T15:43:55Z)
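A minimal sketch of the interactive idea, assuming feedback arrives as pairs of lines that should not be matched: masking each flagged pair with unique sentinel strings forces an LCS matcher to align them differently on the next pass. The masking trick and the function name rediff_with_feedback are illustrative, not the paper's algorithm.

```python
import difflib

def rediff_with_feedback(old: list[str], new: list[str],
                         must_not_match: list[tuple[int, int]]):
    """Recompute a diff after user feedback.

    must_not_match: (i, j) pairs meaning old[i] must not align with new[j].
    """
    a, b = list(old), list(new)
    for k, (i, j) in enumerate(must_not_match):
        a[i] = f"\x00L{k}\x00"  # unique sentinels can never be equal,
        b[j] = f"\x00R{k}\x00"  # so the matcher cannot pair these lines
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return sm.get_opcodes()  # indices remain valid for the original lines

old = ["compile A", "", "run tests"]
new = ["compile B", "", "deploy"]
# The empty lines match spuriously; flag them so they no longer align.
print(rediff_with_feedback(old, new, must_not_match=[(1, 1)]))
```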
- A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent [57.64826450787237]
We show how to analyze the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions.
We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm.
Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
arXiv Detail & Related papers (2024-07-19T08:29:12Z)
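A hedged sketch of corruption-tolerant distributed gradient descent on a toy least-squares problem: a few workers send arbitrary gradients, and the server aggregates with a coordinate-wise median before taking a descent step (mirror descent with the Euclidean mirror map reduces to this update). The median aggregator and the problem setup are assumptions, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, n_bad, lr = 5, 20, 4, 0.1
X = rng.normal(size=(n_workers, 50, d))        # each worker's local data
w_true = rng.normal(size=d)
y = np.einsum("wnd,d->wn", X, w_true)

w = np.zeros(d)
for step in range(200):
    # Honest workers send least-squares gradients on their local data.
    grads = np.stack([2 * Xi.T @ (Xi @ w - yi) / len(yi)
                      for Xi, yi in zip(X, y)])
    grads[:n_bad] = rng.normal(scale=100.0, size=(n_bad, d))  # corrupted workers
    g = np.median(grads, axis=0)  # robust coordinate-wise aggregation
    w -= lr * g                   # descent step (Euclidean mirror map)

print("estimation error:", np.linalg.norm(w - w_true))
```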
- Learning causal graphs using variable grouping according to ancestral relationship [7.126300090990439]
When the sample size is small relative to the number of variables, the accuracy of estimating causal graphs using existing methods decreases.
Some methods are not feasible when the sample size is smaller than the number of variables.
To circumvent these problems, some researchers proposed causal structure learning algorithms using divide-and-conquer approaches.
arXiv Detail & Related papers (2024-03-21T04:42:04Z)
- Performance Evaluation and Comparison of a New Regression Algorithm [4.125187280299247]
We compare the performance of a newly proposed regression algorithm against four conventional machine learning algorithms.
The reader is free to replicate our results since we have provided the source code in a GitHub repository.
arXiv Detail & Related papers (2023-06-15T13:01:16Z)
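The comparison setup described above can be sketched with scikit-learn's cross-validation utilities. The models and synthetic dataset below are placeholders standing in for the four conventional algorithms; the paper's new regressor would be added as one more estimator.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data; the paper uses its own datasets.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "knn": KNeighborsRegressor(n_neighbors=5),
    "forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name:>7}: R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```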
- Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching.
We propose retro branching, a simple yet effective approach to RL for branching.
We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z)
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
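A hedged sketch of online gradient descent for pairwise learning: each incoming example is paired with one randomly buffered earlier example, and a gradient step is taken on a pairwise hinge loss. The AUC-style objective and single-pair buffering are illustrative choices, not the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(1)
d, T, lr = 10, 5000, 0.01
w_star = rng.normal(size=d)   # ground-truth scoring direction
w = np.zeros(d)
buffer = []                   # previously seen (example, label) pairs

for t in range(T):
    x = rng.normal(size=d)
    label = 1.0 if x @ w_star > 0 else -1.0
    if buffer:
        xp, lp = buffer[rng.integers(len(buffer))]
        if label != lp:  # the pairwise loss is defined on opposite-label pairs
            pos, neg = (x, xp) if label > 0 else (xp, x)
            if w @ (pos - neg) < 1.0:   # pairwise hinge: max(0, 1 - w.(x+ - x-))
                w += lr * (pos - neg)   # gradient step on the violated pair
    buffer.append((x, label))

cos = w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star))
print("cosine similarity to target:", cos)
```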
- Streaming Linear System Identification with Reverse Experience Replay [45.17023170054112]
We consider the problem of estimating a linear time-invariant (LTI) dynamical system from a single trajectory via streaming algorithms.
In many problems of interest, such as those encountered in reinforcement learning (RL), it is important to estimate the parameters on the go using a gradient oracle.
We propose a novel algorithm, SGD with Reverse Experience Replay (SGD-RER), inspired by the experience replay (ER) technique popular in the RL literature.
arXiv Detail & Related papers (2021-03-10T06:51:55Z)
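A hedged sketch of the reverse-replay idea: stream a single trajectory of a linear system, fill a small buffer, then take SGD steps over the buffer in reverse time order, which mitigates the bias introduced by temporally correlated samples. The buffer size and step size are illustrative, and the paper's algorithm includes further details (such as gaps between buffers) omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, B, lr = 3, 20000, 50, 0.05
A_true = 0.5 * np.eye(d) + 0.1 * rng.normal(size=(d, d))  # stable toy system

A_hat = np.zeros((d, d))
x = np.zeros(d)
buffer = []
for t in range(T):
    x_next = A_true @ x + rng.normal(scale=0.1, size=d)
    buffer.append((x, x_next))
    x = x_next
    if len(buffer) == B:
        for xi, xi_next in reversed(buffer):   # the "reverse replay" pass
            # SGD step on 0.5 * ||x_next - A x||^2 with respect to A.
            A_hat += lr * np.outer(xi_next - A_hat @ xi, xi)
        buffer.clear()

print("||A_hat - A||_F =", np.linalg.norm(A_hat - A_true))
```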
- Inception Convolution with Efficient Dilation Search [121.41030859447487]
Dilated convolution is a critical variant of standard convolutional neural networks, used to control effective receptive fields and handle the large scale variance of objects.
We propose a new variant of dilated convolution, namely inception (dilated) convolution, where the convolutions have independent dilations among different axes, channels, and layers.
To fit the complex inception convolution to the data, we develop a simple yet effective dilation search algorithm (EDO) based on statistical optimization.
arXiv Detail & Related papers (2020-12-25T14:58:35Z)
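Axis-independent dilation, one ingredient of inception convolution, can be illustrated with PyTorch's per-axis dilation tuples. This sketch varies dilation across axes and layers only; varying it across channels, as the paper does, would require grouped or custom convolutions.

```python
import torch
import torch.nn as nn

# Each layer dilates a different axis; padding is chosen to preserve size.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, dilation=(1, 2), padding=(1, 2)),   # wide RF
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, dilation=(2, 1), padding=(2, 1)),  # tall RF
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, dilation=(2, 2), padding=(2, 2)),  # both axes
)
x = torch.randn(1, 3, 64, 64)
print(block(x).shape)  # torch.Size([1, 16, 64, 64])
```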
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
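A hedged sketch of the experience-replay idea for Markovian data: store the stream in a buffer and take each SGD step on a uniformly re-sampled past point, which breaks the temporal correlation of the chain. The AR(1) covariate process and step size are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, lr, rho = 5, 20000, 0.05, 0.9
w_true = rng.normal(size=d)

w = np.zeros(d)
buffer = []
x = np.zeros(d)
for t in range(T):
    # Markovian covariates: a stationary AR(1) chain with unit variance.
    x = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=d)
    y = x @ w_true + 0.1 * rng.normal()
    buffer.append((x.copy(), y))
    xi, yi = buffer[rng.integers(len(buffer))]  # replay a random past sample
    w += lr * (yi - w @ xi) * xi                # SGD step on the replayed pair

print("estimation error:", np.linalg.norm(w - w_true))
```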