Beyond Greenfield: The D3 Framework for AI-Driven Productivity in Brownfield Engineering
- URL: http://arxiv.org/abs/2512.01155v2
- Date: Tue, 02 Dec 2025 10:47:38 GMT
- Title: Beyond Greenfield: The D3 Framework for AI-Driven Productivity in Brownfield Engineering
- Authors: Krishna Kumaar Sharma
- Abstract summary: Brownfield engineering work involving legacy systems, incomplete documentation, and fragmented architectural knowledge poses unique challenges for the effective use of large language models (LLMs). This paper introduces the Discover-Define-Deliver (D3) Framework, a disciplined LLM-assisted workflow that combines role-separated prompting strategies with applied best practices for navigating ambiguity in brownfield systems. Respondents reported perceived improvements in task clarity, documentation quality, and cognitive load, along with self-estimated productivity gains.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Brownfield engineering work involving legacy systems, incomplete documentation, and fragmented architectural knowledge poses unique challenges for the effective use of large language models (LLMs). Prior research has largely focused on greenfield or synthetic tasks, leaving a gap in structured workflows for complex, context-heavy environments. This paper introduces the Discover-Define-Deliver (D3) Framework, a disciplined LLM-assisted workflow that combines role-separated prompting strategies with applied best practices for navigating ambiguity in brownfield systems. The framework incorporates a dual-agent prompting architecture in which a Builder model generates candidate outputs and a Reviewer model provides structured critique to improve reliability. I conducted an exploratory survey study with 52 software practitioners who applied the D3 workflow to real-world engineering tasks such as legacy system exploration, documentation reconstruction, and architectural refactoring. Respondents reported perceived improvements in task clarity, documentation quality, and cognitive load, along with self-estimated productivity gains. In this exploratory study, participants reported a weighted average productivity improvement of 26.9%, reduced cognitive load for approximately 77% of participants, and 83% of participants spent less time fixing or rewriting code due to better initial planning with AI. As these findings are self-reported and not derived from controlled experiments, they should be interpreted as preliminary evidence of practitioner sentiment rather than causal effects. The results highlight both the potential and limitations of structured LLM workflows for legacy engineering systems and motivate future controlled evaluations.
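The dual-agent architecture described in the abstract can be sketched as a short prompt loop in which a Builder model drafts a candidate and a Reviewer model critiques it. The prompt wording, the `APPROVED` token, the round limit, and the function names below are illustrative assumptions, not details from the paper; `call_model` stands in for whatever LLM client is used.

```python
from typing import Callable

# Role-separated prompts: the Builder generates, the Reviewer critiques.
BUILDER_PROMPT = (
    "You are the Builder. Produce a candidate solution for the task.\n"
    "Task: {task}\nReviewer feedback so far: {feedback}"
)
REVIEWER_PROMPT = (
    "You are the Reviewer. Critique the candidate strictly and reply "
    "'APPROVED' only if it meets the task requirements.\n"
    "Task: {task}\nCandidate: {candidate}"
)

def d3_deliver(task: str, call_model: Callable[[str], str],
               max_rounds: int = 3) -> str:
    """Alternate Builder and Reviewer turns until approval or the round limit."""
    feedback, candidate = "none yet", ""
    for _ in range(max_rounds):
        # Builder drafts a candidate, conditioned on the Reviewer's last critique.
        candidate = call_model(BUILDER_PROMPT.format(task=task, feedback=feedback))
        # Reviewer critiques the draft; approval ends the loop early.
        critique = call_model(REVIEWER_PROMPT.format(task=task, candidate=candidate))
        if "APPROVED" in critique:
            break
        feedback = critique
    return candidate

# Demo with a canned stand-in model: round 1 is rejected, round 2 is approved.
responses = iter(["draft v1", "needs unit tests", "draft v2", "APPROVED"])
result = d3_deliver("refactor a legacy module", lambda prompt: next(responses))
print(result)  # draft v2
```

The loop returns the last candidate even when no approval arrives within `max_rounds`, so a caller always gets the Builder's best attempt; a stricter variant could raise instead.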
Related papers
- KARL: Knowledge Agents via Reinforcement Learning [63.627906947205624]
We present a system for training enterprise search agents via reinforcement learning. KARLBench is a multi-capability evaluation suite spanning six distinct search regimes. We show that models trained across heterogeneous search behaviors generalize substantially better than those optimized for any single benchmark.
arXiv Detail & Related papers (2026-03-05T14:30:25Z) - Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition [53.50448142467294]
RAIM is a multi-design and architecture-aware framework for repository-level feature addition. It shifts away from linear patching by generating multiple diverse implementation designs. Experiments on the NoCode-bench Verified dataset demonstrate that RAIM establishes a new state-of-the-art performance.
arXiv Detail & Related papers (2026-03-02T12:50:40Z) - EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots [68.29056647487519]
Embodied AI is fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight. We introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z) - Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics [3.0540716731676625]
Existing studies focus mainly on whether generated code passes the tests rather than whether it passes with quality. This study conducted three complementary investigations: a systematic review of 108 papers, two industry workshops with practitioners from multiple organizations, and an empirical analysis of patching real-world software issues. We found that security and performance efficiency dominate academic attention, while maintainability and other qualities are understudied.
arXiv Detail & Related papers (2025-11-13T12:56:07Z) - Benchmarking and Studying the LLM-based Agent System in End-to-End Software Development [33.01897134024342]
The development of LLM-based autonomous agents for end-to-end software development represents a significant paradigm shift in software engineering. This work provides the community with a more realistic benchmark, a comprehensive evaluation framework, and crucial insights into the current capabilities and core challenges of software development agents.
arXiv Detail & Related papers (2025-11-06T05:10:04Z) - VeriOpt: PPA-Aware High-Quality Verilog Generation via Multi-Role LLMs [41.94295877935867]
VeriOpt is a novel framework that leverages role-based prompting and PPA-aware optimization to produce high-quality, synthesizable Verilog. Our work advances the state of the art in AI-driven hardware design by addressing the critical gap between correctness and quality.
arXiv Detail & Related papers (2025-07-20T00:28:55Z) - The Impact of LLM-Assistants on Software Developer Productivity: A Systematic Literature Review [4.503986781849658]
Large language model assistants (LLM-assistants) present new opportunities to transform software development. Despite growing interest, there is no synthesis of how LLM-assistants affect software developer productivity. Our analysis reveals that LLM-assistants offer both considerable benefits and critical risks.
arXiv Detail & Related papers (2025-07-03T20:25:49Z) - ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research [56.961539386979354]
We introduce ORMind, a cognitive-inspired framework that enhances optimization through counterfactual reasoning. Our approach emulates human cognition, implementing an end-to-end workflow that transforms requirements into mathematical models and executable code. It is currently being tested internally in Lenovo's AI Assistant, with plans to enhance optimization capabilities for both business and consumer customers.
arXiv Detail & Related papers (2025-06-02T05:11:21Z) - Evaluating Large Language Models for Real-World Engineering Tasks [75.97299249823972]
This paper introduces a curated database comprising over 100 questions derived from authentic, production-oriented engineering scenarios. Using this dataset, we evaluate four state-of-the-art Large Language Models (LLMs). Our results show that LLMs demonstrate strengths in basic temporal and structural reasoning but struggle significantly with abstract reasoning, formal modeling, and context-sensitive engineering logic.
arXiv Detail & Related papers (2025-05-12T14:05:23Z) - Assessing LLMs for Front-end Software Architecture Knowledge [0.0]
Large Language Models (LLMs) have demonstrated significant promise in automating software development tasks. This study investigates the capabilities of an LLM in understanding, reproducing, and generating structures within the VIPER architecture. Experimental results, using ChatGPT 4 Turbo 2024-04-09, reveal that the LLM excelled in higher-order tasks like evaluating and creating, but faced challenges with lower-order tasks requiring precise retrieval of architectural details.
arXiv Detail & Related papers (2025-02-26T19:33:35Z) - Unveiling and Consulting Core Experts in Retrieval-Augmented MoE-based LLMs [64.9693406713216]
Internal mechanisms that contribute to the effectiveness of RAG systems remain underexplored.
Our experiments reveal that several core groups of experts are primarily responsible for RAG-related behaviors.
We propose several strategies to enhance RAG's efficiency and effectiveness through expert activation.
arXiv Detail & Related papers (2024-10-20T16:08:54Z) - MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning [17.437573206368494]
Visual deep reinforcement learning (RL) enables robots to acquire skills from visual input for unstructured tasks. We present MENTOR, a method that improves both the architecture and optimization of RL agents. MENTOR outperforms state-of-the-art methods across three simulation benchmarks and achieves an average 83% success rate on three challenging real-world robotic manipulation tasks.
arXiv Detail & Related papers (2024-10-19T04:31:54Z) - Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives [54.14429346914995]
Chain-of-Thought (CoT) has become a pivotal method for solving complex problems with large language models (LLMs). This paper introduces the Re-TASK framework, a novel theoretical model that revisits LLM tasks from capability, skill, and knowledge perspectives. Experiments across diverse domains demonstrate the effectiveness of Re-TASK.
arXiv Detail & Related papers (2024-08-13T13:58:23Z) - Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.