A Hybrid Approach for EMF Code Generation:Code Templates Meet Large Language Models
- URL: http://arxiv.org/abs/2512.05498v1
- Date: Fri, 05 Dec 2025 07:46:51 GMT
- Title: A Hybrid Approach for EMF Code Generation:Code Templates Meet Large Language Models
- Authors: Xiao He, Ru Chen, Zeqing Zhang, Yanling Wang, Qiuyan Dong,
- Abstract summary: iEcoreGen is a hybrid approach that integrates Eclipse Modeling Framework (EMF) and LLMs.<n>iEcoreGen decomposes requirements to derive operation specifications, uses EMF's template-based generator to produce initial Java code, and serializes specifications into docstrings.
- Score: 7.176489635263981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Template-based and LLM-based code generation are both key enablers of automated software development. The former provides correctness guarantees but are rigid for complex requirements, whereas LLMs offer high flexibility at the risk of producing faulty code.This paper proposes iEcoreGen, a hybrid approach that integrates Eclipse Modeling Framework (EMF) and LLMs. In EMF, an Ecore model defines a system structure and acts as a blueprint for code-generation.iEcoreGen decomposes requirements to derive operation specifications, uses EMF's template-based generator to produce initial Java code, and serializes specifications into docstrings. LLMs are then invoked to complete and fix unimplemented methods. We assessed iEcoreGen on twenty code-generation tasks across five LLMs. It surpasses LLM-only baselines on pass@k and performs on par with them on compilation@k. An ablation study clarified the contribution of each component of iEcoreGen. Overall, the findings indicate that LLM-enhanced model-driven development is a promising path toward more efficient software automation.
Related papers
- Automated Customization of LLMs for Enterprise Code Repositories Using Semantic Scopes [3.2942861117920916]
We present our approach for automated LLM customization based on semantic scopes in the code.<n>Our mechanism for ingesting the repository's data and formulating the training data pairs with semantic scopes helps models to learn the underlying patterns specific to the repository.<n>The code completions of moderately sized customized models can be significantly better than those of uncustomized models of much larger capacity.
arXiv Detail & Related papers (2026-02-05T15:38:54Z) - ThinkGen: Generalized Thinking for Visual Generation [97.19923474851987]
ThinkGen is a think-driven visual generation framework that explicitly leverages Chain-of-Thought (CoT) reasoning in various generation scenarios.<n>We propose a separable GRPO-based training paradigm, alternating reinforcement learning between the MLLM and DiT modules.<n>Experiments demonstrate that ThinkGen achieves robust, state-of-the-art performance across multiple generation benchmarks.
arXiv Detail & Related papers (2025-12-29T16:08:50Z) - Aligning Requirement for Large Language Model's Code Generation [9.205909320363247]
Specine is a novel specification alignment technique for large language models (LLMs) code generation.<n>Its key idea is to identify misaligned input specifications, lift LLM-perceived specifications, and align them to enhance the code generation performance of LLMs.<n>For example, Specine outperforms the most effective baseline, achieving an average improvement of 29.60% across all subjects in terms of Pass@1.
arXiv Detail & Related papers (2025-09-01T09:56:13Z) - HiVeGen -- Hierarchical LLM-based Verilog Generation for Scalable Chip Design [24.46771930751068]
HiVeGen is a hierarchical Verilog generation framework that decomposes generation tasks into hierarchical submodules.<n> automatic Design Space Exploration (DSE) into hierarchy-aware prompt generation, introducing weight-based retrieval to enhance code reuse.<n>Real-time human-computer interaction to lower error-correction cost, significantly improving the quality of generated designs.
arXiv Detail & Related papers (2024-12-06T19:37:53Z) - OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [76.59316249991657]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems.<n>While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited.<n>We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z) - Automatic Generation of Benchmarks and Reliable LLM Judgment for Code Tasks [0.8274693573069442]
This work introduces a methodology to generate and evaluate LaaJ implementations, utilizing an automatically generated benchmark.
The benchmark is used both to develop and validate the LaaJs and to validate and test the LLM code related solution using the LaaJs.
Our approach enables the creation of high quality code task solutions.
arXiv Detail & Related papers (2024-10-28T14:34:36Z) - LLM as a code generator in Agile Model Driven Development [0.032771631221674334]
This research champions Model Driven Development (MDD) as a viable strategy to overcome these challenges.<n>We propose an Agile Model Driven Development (AMDD) approach that employs GPT4 as a code generator.<n>Applying GPT4 auto generation capabilities yields Java and Python code that is compatible with the JADE and PADE frameworks.
arXiv Detail & Related papers (2024-10-24T07:24:11Z) - Open-domain Implicit Format Control for Large Language Model Generation [52.83173553689678]
We introduce a novel framework for controlled generation in large language models (LLMs)
This study investigates LLMs' capabilities to follow open-domain, one-shot constraints and replicate the format of the example answers.
We also develop a dataset collection methodology for supervised fine-tuning that enhances the open-domain format control of LLMs without degrading output quality.
arXiv Detail & Related papers (2024-08-08T11:51:45Z) - UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback [21.858896845159208]
Large language models (LLMs) struggle to consistently generate UI code that compiles and produces visually relevant designs.
Existing approaches to improve generation rely on expensive human feedback or distilling a proprietary model.
Our method starts with an existing LLM and iteratively produces improved models by self-generating a large synthetic dataset.
arXiv Detail & Related papers (2024-06-11T21:53:46Z) - StepCoder: Improve Code Generation with Reinforcement Learning from
Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long sequences code generation task into a Curriculum of Code Completion Subtasks.
FGO only optimize the model by masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z) - Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs)
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
arXiv Detail & Related papers (2024-01-19T05:02:46Z) - CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
Our library supports a collection of pretrained Code LLM models and popular code benchmarks.
We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.