WhatsCode: Large-Scale GenAI Deployment for Developer Efficiency at WhatsApp
- URL: http://arxiv.org/abs/2512.05314v1
- Date: Thu, 04 Dec 2025 23:25:06 GMT
- Title: WhatsCode: Large-Scale GenAI Deployment for Developer Efficiency at WhatsApp
- Authors: Ke Mao, Timotej Kapus, Cons T Åhs, Matteo Marescotti, Daniel Ip, Ákos Hajdu, Sopot Cela, Aparup Banerjee,
- Abstract summary: Report on the industrial deployment of WhatsCode, a domain-specific AI development system that supports WhatsApp. WhatsCode evolved from targeted privacy automation to autonomous agentic workflows integrated with end-to-end feature development and DevOps processes. The system committed 692 automated refactor/fix changes, 711 framework adoptions, and 141 feature development assists, and maintained 86% precision in bug triage.
- Score: 0.8197659035200293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The deployment of AI-assisted development tools in compliance-relevant, large-scale industrial environments represents significant gaps in academic literature, despite growing industry adoption. We report on the industrial deployment of WhatsCode, a domain-specific AI development system that supports WhatsApp (serving over 2 billion users) and processes millions of lines of code across multiple platforms. Over 25 months (2023-2025), WhatsCode evolved from targeted privacy automation to autonomous agentic workflows integrated with end-to-end feature development and DevOps processes. WhatsCode achieved substantial quantifiable impact, improving automated privacy verification coverage 3.5x from 15% to 53%, identifying privacy requirements, and generating over 3,000 accepted code changes with acceptance rates ranging from 9% to 100% across different automation domains. The system committed 692 automated refactor/fix changes, 711 framework adoptions, 141 feature development assists and maintained 86% precision in bug triage. Our study identifies two stable human-AI collaboration patterns that emerged from production deployment: one-click rollout for high-confidence changes (60% of cases) and commandeer-revise for complex decisions (40%). We demonstrate that organizational factors, such as ownership models, adoption dynamics, and risk management, are as decisive as technical capabilities for enterprise-scale AI success. The findings provide evidence-based guidance for large-scale AI tool deployment in compliance-relevant environments, showing that effective human-AI collaboration, not full automation, drives sustainable business impact.
Related papers
- SWE-Universe: Scale Real-World Verifiable Environments to Millions [84.63665266236963]
SWE-Universe is a framework for automatically constructing real-world software engineering (SWE) verifiable environments from GitHub pull requests (PRs). We propose a building agent powered by an efficient custom-trained model to overcome the prevalent challenges of automatic building. We demonstrate the profound value of our environments through large-scale agentic mid-training and reinforcement learning.
arXiv Detail & Related papers (2026-02-02T17:20:30Z)
- EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots [68.29056647487519]
Embodied AI is fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight. We introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z)
- Towards a Science of Scaling Agent Systems [79.64446272302287]
We formalize a definition for agent evaluation and characterize scaling laws as the interplay between agent quantity, coordination structure, modelic, and task properties. We derive a predictive model using coordination metrics, that cross-validated R2=0, enabling prediction on unseen task domains. We identify three effects: (1) a tool-coordination trade-off: under fixed computational budgets, tool-heavy tasks suffer disproportionately from multi-agent overhead, and (2) a capability saturation: coordination yields diminishing or negative returns once single-agent baselines exceed 45%.
arXiv Detail & Related papers (2025-12-09T06:52:21Z)
- Beyond Prototyping: Autonomous, Enterprise-Grade Frontend Development from Pixel to Production via a Specialized Multi-Agent Framework [0.01059638456503418]
We present AI4UI, a framework of autonomous front-end development agents purpose-built to meet the rigorous requirements of enterprise-grade application delivery. Unlike general-purpose code assistants designed for rapid prototyping, AI4UI focuses on production readiness, delivering secure, scalable, compliant, and maintainable UI code integrated seamlessly into enterprise.
arXiv Detail & Related papers (2025-12-05T09:56:15Z)
- Intuition to Evidence: Measuring AI's True Impact on Developer Productivity [30.02516976149379]
We present a comprehensive real-world evaluation of AI-assisted software development tools deployed at enterprise scale. Over one year, 300 engineers across multiple teams integrated an in-house AI platform (DeputyDev) that combines code generation and automated review capabilities.
arXiv Detail & Related papers (2025-09-24T02:34:11Z)
- LIMI: Less is More for Agency [49.63355240818081]
LIMI (Less Is More for Intelligent Agency) demonstrates that agency follows radically different development principles. We show that sophisticated agentic intelligence can emerge from minimal but strategically curated demonstrations of autonomous behavior. Our findings establish the Agency Efficiency Principle: machine autonomy emerges not from data abundance but from strategic curation of high-quality agentic demonstrations.
arXiv Detail & Related papers (2025-09-22T10:59:32Z)
- OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks [52.87238755666243]
We present OmniEAR, a framework for evaluating how language models reason about physical interactions, tool usage, and multi-agent coordination in embodied tasks. We model continuous physical properties and complex spatial relationships across 1,500 scenarios spanning household and industrial domains. Our systematic evaluation reveals severe performance degradation when models must reason from constraints.
arXiv Detail & Related papers (2025-08-07T17:54:15Z)
- Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development [65.94639060883475]
We propose a resource-aware multi-agent system -- Co-Saving. Our key innovation is the introduction of "shortcuts". Compared to the state-of-the-art MAS ChatDev, our method achieves an average reduction of 50.85% in token usage.
arXiv Detail & Related papers (2025-05-28T02:23:53Z)
- Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning [55.641299901038316]
AI-generated content can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and customized content for resource-constrained users. Such a paradigm faces two significant challenges: 1) raw prompts often lead to poor generation quality due to users' lack of experience with specific AIGC models, and 2) static service provisioning fails to efficiently utilize computational and communication resources. We develop an interactive prompt engineering mechanism that leverages a Large Language Model (LLM) to generate customized prompt corpora and employs Inverse Reinforcement Learning (IRL) for policy imitation.
arXiv Detail & Related papers (2025-02-17T03:05:20Z)
- EmbedGenius: Towards Automated Software Development for Generic Embedded IoT Systems [11.524778651869044]
This paper introduces EmbedGenius, the first fully automated software development platform for general-purpose embedded IoT systems. The key idea is to leverage the reasoning ability of Large Language Models (LLMs) and embedded system expertise to automate the hardware-in-the-loop development process. We evaluate EmbedGenius's performance across 71 modules and four mainstream embedded development platforms with over 350 IoT tasks.
arXiv Detail & Related papers (2024-12-12T08:34:12Z)
- Improving Performance of Commercially Available AI Products in a Multi-Agent Configuration [11.626057561212694]
Crowdbotics PRD AI is a tool for generating software requirements using AI.
GitHub Copilot is an AI pair-programming tool.
By sharing business requirements from PRD AI, we improve the code suggestion capabilities of GitHub Copilot by 13.8% and developer task success rate by 24.5%.
arXiv Detail & Related papers (2024-10-29T15:28:19Z)
- AutoGLM: Autonomous Foundation Agents for GUIs [51.276965515952]
We present AutoGLM, a new series in the ChatGLM family, designed to serve as foundation agents for autonomous control of digital devices through Graphical User Interfaces (GUIs).
We have developed AutoGLM as a practical foundation agent system for real-world GUI interactions.
Our evaluations demonstrate AutoGLM's effectiveness across multiple domains.
arXiv Detail & Related papers (2024-10-28T17:05:10Z)
- From Today's Code to Tomorrow's Symphony: The AI Transformation of Developer's Routine by 2030 [3.437372707846067]
We provide a comparative analysis between the current state of AI-assisted programming in 2024 and our projections for 2030.
We envision HyperAssistant, an augmented AI tool that offers comprehensive support to 2030 developers.
arXiv Detail & Related papers (2024-05-21T12:37:36Z)