What's DAT? Three Case Studies of Measuring Software Development Productivity at Meta With Diff Authoring Time
- URL: http://arxiv.org/abs/2503.10977v1
- Date: Fri, 14 Mar 2025 00:50:12 GMT
- Title: What's DAT? Three Case Studies of Measuring Software Development Productivity at Meta With Diff Authoring Time
- Authors: Moritz Beller, Amanda Park, Karim Nakad, Akshay Patel, Sarita Mohanty, Ford Garberson, Ian G. Malone, Vaishali Garg, Henri Verroken, Andrew Kennedy, Pavel Avgustinov,
- Abstract summary: Diff Authoring Time (DAT) is a powerful, yet conceptually simple approach to measuring software development productivity. We validate DAT through observational studies, surveys, visualizations, and descriptive statistics. DAT offers a precise, yet high-coverage measure for development productivity, aiding business decisions.
- Score: 1.1023377024290713
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces Diff Authoring Time (DAT), a powerful, yet conceptually simple approach to measuring software development productivity that enables rigorous experimentation. DAT is a time-based metric that assesses how long engineers take to develop changes, using a privacy-aware telemetry system integrated with version control, the IDE, and the OS. We validate DAT through observational studies, surveys, visualizations, and descriptive statistics. At Meta, DAT has powered experiments and case studies on more than 20 projects. Here, we highlight (1) an experiment on introducing mock types (a 14% DAT improvement), (2) the development of automatic memoization in the React compiler (33% improvement), and (3) an estimate of thousands of DAT hours saved annually through code sharing (> 50% improvement). DAT offers a precise, yet high-coverage measure for development productivity, aiding business decisions. It enhances development efficiency by aligning the internal development workflow with the experiment-driven culture of external product development. On the research front, DAT has enabled us to perform rigorous experimentation on long-standing software engineering questions such as "do types make development more efficient?"
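The abstract describes DAT only at a high level: telemetry from version control, the IDE, and the OS, attributed to a diff. The exact attribution and sessionization logic is not published. As a minimal sketch of how such a time-based metric could be computed, the following aggregates per-diff active time from timestamped activity events, treating gaps longer than an idle timeout as non-authoring time. The event schema, the `IDLE_TIMEOUT` value, and the attribution of events to diff IDs are all assumptions, not Meta's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable

# Hypothetical idle cutoff: gaps longer than this count as time away, not authoring.
IDLE_TIMEOUT = timedelta(minutes=5)

@dataclass
class TelemetryEvent:
    diff_id: str          # version-control identifier the activity is attributed to
    timestamp: datetime   # when the IDE/OS activity occurred

def diff_authoring_time(events: Iterable[TelemetryEvent]) -> dict[str, timedelta]:
    """Aggregate active authoring time per diff from timestamped activity events.

    Consecutive events closer together than IDLE_TIMEOUT are treated as one
    active session; longer gaps contribute nothing to the total.
    """
    by_diff: dict[str, list[datetime]] = defaultdict(list)
    for ev in events:
        by_diff[ev.diff_id].append(ev.timestamp)

    totals: dict[str, timedelta] = {}
    for diff_id, stamps in by_diff.items():
        stamps.sort()
        total = timedelta()
        for prev, cur in zip(stamps, stamps[1:]):
            gap = cur - prev
            if gap <= IDLE_TIMEOUT:
                total += gap
        totals[diff_id] = total
    return totals
```

With per-diff totals like these, an experiment can compare DAT distributions between a treatment group (e.g., engineers given mock types) and a control group, which is the kind of rigorous comparison the paper reports.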
Related papers
- Data Scaling Laws for End-to-End Autonomous Driving [83.85463296830743]
We evaluate the performance of a simple end-to-end driving architecture on internal driving datasets ranging in size from 16 to 8192 hours.
Specifically, we investigate how much additional training data is needed to achieve a target performance gain.
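The question of how much additional data buys a target gain is commonly answered by fitting a power law to performance versus dataset size and inverting it. A small illustrative sketch with made-up numbers, not the paper's results; the power-law form itself is an assumption:

```python
import numpy as np

# Hypothetical measurements: dataset size in hours vs. validation error.
hours = np.array([16, 64, 256, 1024, 4096], dtype=float)
error = np.array([0.42, 0.31, 0.24, 0.19, 0.15])

# Fit a power law error = a * hours**(-b) via linear regression in log space.
slope, log_a = np.polyfit(np.log(hours), np.log(error), 1)
a, b = np.exp(log_a), -slope

def hours_needed(target_error: float) -> float:
    """Invert the fitted power law to estimate data needed for a target error."""
    return (a / target_error) ** (1.0 / b)

print(f"fit: error ~ {a:.2f} * hours^(-{b:.3f})")
print(f"estimated hours for error 0.12: {hours_needed(0.12):,.0f}")
```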
arXiv Detail & Related papers (2025-04-06T03:23:48Z)
- Analyzing DevOps Practices Through Merge Request Data: A Case Study in Networking Software Company [2.5999037208435705]
GitLab's Merge Request (MR) mechanism streamlines code submission and review.
MR data reflects broader aspects, including collaboration patterns, productivity, and process optimization.
This study examines 26.7k MRs from four teams across 116 projects of a networking software company.
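The kinds of productivity signals such a study derives from MR records can be illustrated with a short sketch. The record fields below (`created_at`, `merged_at`, `review_comments`) are hypothetical stand-ins, not the study's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class MergeRequest:
    # Hypothetical fields; real MR exports carry many more attributes.
    author: str
    created_at: datetime
    merged_at: datetime
    review_comments: int

def median_time_to_merge_hours(mrs: list[MergeRequest]) -> float:
    """Median hours from MR creation to merge, a common throughput signal."""
    return median(
        (mr.merged_at - mr.created_at).total_seconds() / 3600 for mr in mrs
    )

def comments_per_mr(mrs: list[MergeRequest]) -> float:
    """Average review comments per MR, a rough collaboration signal."""
    return sum(mr.review_comments for mr in mrs) / len(mrs)
```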
arXiv Detail & Related papers (2025-03-18T19:33:34Z)
- Towards Decoding Developer Cognition in the Age of AI Assistants [9.887133861477233]
We propose a controlled observational study combining physiological measurements (EEG and eye tracking) with interaction data to examine developers' use of AI-assisted programming tools.
We will recruit professional developers to complete programming tasks both with and without AI assistance while measuring their cognitive load and task completion time.
arXiv Detail & Related papers (2025-01-05T23:25:21Z)
- Which Combination of Test Metrics Can Predict Success of a Software Project? A Case Study in a Year-Long Project Course [1.553083901660282]
Testing plays an important role in securing the success of a software development project.
We investigate whether we can quantify the effects various types of testing have on functional suitability.
arXiv Detail & Related papers (2024-08-22T04:23:51Z)
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development.
We introduce Experiential Co-Learning, a novel LLM-agent learning framework.
Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z)
- Integration of Domain Expert-Centric Ontology Design into the CRISP-DM for Cyber-Physical Production Systems [45.05372822216111]
Methods from Machine Learning (ML) and Data Mining (DM) have proven to be promising in extracting complex and hidden patterns from the data collected.
However, such data-driven projects, usually performed with the Cross-Industry Standard Process for Data Mining (CRISP-DM), often fail due to the disproportionate amount of time needed for understanding and preparing the data.
This contribution intends to present an integrated approach so that data scientists are able to more quickly and reliably gain insights into CPPS challenges.
arXiv Detail & Related papers (2023-07-21T15:04:00Z)
- Productivity Assessment of Neural Code Completion [4.821593904732654]
We ask users of GitHub Copilot about its impact on their productivity, and seek to find a reflection of their perception in directly measurable user data.
We find that the rate with which shown suggestions are accepted, rather than more specific metrics regarding the persistence of completions in the code over time, drives developers' perception of productivity.
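The headline metric there, the rate at which shown suggestions are accepted, is simple to compute from completion telemetry. A hedged sketch with a hypothetical event-log format, not GitHub Copilot's actual schema:

```python
# Minimal sketch of a suggestion acceptance-rate metric.
# The event structure is hypothetical, not Copilot's telemetry schema.

def acceptance_rate(events: list[dict]) -> float:
    """Fraction of shown completion suggestions that were accepted."""
    shown = sum(1 for e in events if e["type"] == "shown")
    accepted = sum(1 for e in events if e["type"] == "accepted")
    return accepted / shown if shown else 0.0

events = [
    {"type": "shown"}, {"type": "accepted"},
    {"type": "shown"},                      # shown but rejected
    {"type": "shown"}, {"type": "accepted"},
]
print(f"acceptance rate: {acceptance_rate(events):.0%}")  # 67%
```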
arXiv Detail & Related papers (2022-05-13T09:53:25Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive weak supervision (WS) system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
arXiv Detail & Related papers (2022-03-02T19:57:32Z)
- Enabling Reproducibility and Meta-learning Through a Lifelong Database of Experiments (LDE) [0.43012765978447565]
We present the Lifelong Database of Experiments (LDE) that automatically extracts and stores linked metadata from experiment artifacts.
We store context from multiple stages of the AI development lifecycle including datasets, pipelines, how each is configured, and training runs with information about their runtime environment.
We perform two experiments on this metadata: 1) examining the variability of the performance metrics and 2) implementing a number of meta-learning algorithms on top of the data.
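The first of those experiments, quantifying run-to-run variability of a stored metric, amounts to grouping runs by configuration and computing dispersion. A small sketch under assumed field names; `config_id` and `accuracy` are illustrative, not the LDE schema:

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical run records; the real LDE stores much richer linked metadata.
runs = [
    {"config_id": "cfg-a", "accuracy": 0.91},
    {"config_id": "cfg-a", "accuracy": 0.89},
    {"config_id": "cfg-a", "accuracy": 0.93},
    {"config_id": "cfg-b", "accuracy": 0.84},
    {"config_id": "cfg-b", "accuracy": 0.85},
]

by_config = defaultdict(list)
for run in runs:
    by_config[run["config_id"]].append(run["accuracy"])

# Report mean and standard deviation per configuration to expose variability.
for cfg, scores in by_config.items():
    spread = stdev(scores) if len(scores) > 1 else 0.0
    print(f"{cfg}: mean={mean(scores):.3f} sd={spread:.3f} n={len(scores)}")
```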
arXiv Detail & Related papers (2022-02-22T15:35:16Z)
- Lifelong Learning Metrics [63.8376359764052]
The DARPA Lifelong Learning Machines (L2M) program seeks to yield advances in artificial intelligence (AI) systems.
This document outlines a formalism for constructing and characterizing the performance of agents performing lifelong learning scenarios.
arXiv Detail & Related papers (2022-01-20T16:29:14Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
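The concept-drift point in that last entry can be made concrete with a simple detector: compare recent predictive accuracy against an earlier reference window and flag a drop. This is a from-scratch illustrative sketch (a fixed-window comparison), not any specific AutoML tool's drift detector; window size and threshold are arbitrary choices.

```python
from collections import deque

class SimpleDriftDetector:
    """Flags concept drift when accuracy in a recent window drops well below
    the accuracy of an earlier reference window. Window size and threshold
    are illustrative, not values from the surveyed tools."""

    def __init__(self, window: int = 100, drop_threshold: float = 0.10):
        self.reference: deque[int] = deque(maxlen=window)
        self.recent: deque[int] = deque(maxlen=window)
        self.drop_threshold = drop_threshold

    def update(self, correct: bool) -> bool:
        """Feed one prediction outcome; returns True if drift is suspected."""
        if len(self.reference) < self.reference.maxlen:
            self.reference.append(int(correct))  # still filling the baseline
            return False
        self.recent.append(int(correct))
        if len(self.recent) < self.recent.maxlen:
            return False
        ref_acc = sum(self.reference) / len(self.reference)
        rec_acc = sum(self.recent) / len(self.recent)
        return ref_acc - rec_acc > self.drop_threshold
```

On a detected drift, a streaming pipeline would typically retrain or re-run model selection on recent data, which is the adaptation the survey finds necessary to maintain predictive accuracy over time.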