Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models
- URL: http://arxiv.org/abs/2503.12293v1
- Date: Sat, 15 Mar 2025 23:20:26 GMT
- Title: Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models
- Authors: Averi Bates, Ryan Vavricka, Shane Carleton, Ruosi Shao, Chongle Pan
- Abstract summary: This paper proposes a new approach to automatically generate UML code using a large multimodal language model. Domain-adapted MM-LLMs can automate UML code generation; the best model achieved BLEU and SSIM scores of 0.779 and 0.942 on sequence diagrams.
- Score: 0.41942958779358674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Unified Modeling Language is a standardized visual language widely used for modeling and documenting the design of software systems. Although many tools generate UML diagrams from UML code, generating executable UML code from image-based UML diagrams remains challenging. This paper proposes a new approach to automatically generate UML code using a large multimodal language model. Synthetic UML activity and sequence diagram datasets were created to train and test the model. We compared standard fine-tuning with LoRA techniques to optimize base models. The experiments measured code generation accuracy across different model sizes and training strategies. These results demonstrate that domain-adapted MM-LLMs can automate UML code generation: the best model achieved BLEU and SSIM scores of 0.779 and 0.942, respectively, on sequence diagrams. This will enable the modernization of legacy systems and decrease the manual effort in software development workflows.
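As a rough illustration of this evaluation setup (the paper does not publish its scripts, so the file names, whitespace tokenization, and the comparison of pre-rendered diagram images below are all assumptions), BLEU on the generated UML text and SSIM on rendered diagrams could be computed as follows:

```python
import numpy as np
from PIL import Image
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from skimage.metrics import structural_similarity

# BLEU between reference and generated UML code (whitespace tokenization
# is an assumption; the paper's tokenizer may differ).
reference = open("reference.puml").read().split()
generated = open("generated.puml").read().split()
bleu = sentence_bleu([reference], generated,
                     smoothing_function=SmoothingFunction().method1)

# SSIM between diagrams rendered from the two code files (rendering with
# PlantUML or similar is assumed to have happened beforehand); both
# images must share the same dimensions.
img_ref = np.asarray(Image.open("reference.png").convert("L"))
img_gen = np.asarray(Image.open("generated.png").convert("L"))
ssim = structural_similarity(img_ref, img_gen)

print(f"BLEU={bleu:.3f}  SSIM={ssim:.3f}")
```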
Related papers
- LLM-enabled Instance Model Generation [4.52634430160579]
This work explores the generation of instance models using large language models (LLMs).
We propose a two-step approach: first, using LLMs to produce a simplified structured output containing all necessary instance model information, and then compiling this intermediate representation into a valid XMI file.
Results show that the proposed method significantly improves the usability of LLMs for instance model generation tasks.
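A minimal sketch of that two-step pipeline is below. The intermediate structure and element names are hypothetical; the paper's actual schema and XMI profile are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Step 1 (stand-in): a simplified structured output, as an LLM might
# return it. The field names here are hypothetical.
instance_model = {
    "objects": [
        {"id": "o1", "class": "Library", "name": "cityLib"},
        {"id": "o2", "class": "Book", "name": "sicp"},
    ]
}

# Step 2: compile the intermediate representation into a minimal XMI file.
XMI = "http://www.omg.org/XMI"
ET.register_namespace("xmi", XMI)
root = ET.Element("{%s}XMI" % XMI, {"{%s}version" % XMI: "2.0"})
for obj in instance_model["objects"]:
    el = ET.SubElement(root, obj["class"])
    el.set("{%s}id" % XMI, obj["id"])
    el.set("name", obj["name"])
ET.ElementTree(root).write("instances.xmi", xml_declaration=True, encoding="UTF-8")
```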
arXiv Detail & Related papers (2025-03-28T16:34:29Z) - Assessing UML Models by ChatGPT: Implications for Education [9.11195766839205]
In software engineering (SE) research and practice, UML is well known as an essential modeling methodology. Recent advancements in generative AI techniques, such as ChatGPT, have paved new ways to automate many SE tasks. This paper investigates the feasibility and effectiveness of ChatGPT in assessing the quality of UML models.
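As a rough illustration of such an assessment setup (the paper's actual prompts and rubric are not reproduced; the model name, file name, and grading criteria below are assumptions), one could ask a chat model to grade a textual UML model:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

uml_model = open("student_model.puml").read()  # hypothetical input file
rubric = ("Grade this UML class diagram from 1 to 10 on correctness, "
          "completeness, and naming. Justify each score.")

resp = client.chat.completions.create(
    model="gpt-4o",  # stand-in; the paper evaluated ChatGPT
    messages=[{"role": "user", "content": rubric + "\n\n" + uml_model}],
)
print(resp.choices[0].message.content)
```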
arXiv Detail & Related papers (2024-12-23T00:28:33Z) - SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding [66.74446220401296]
We propose SynerGen-VL, a simple yet powerful encoder-free MLLM capable of both image understanding and generation.
We introduce the token folding mechanism and the vision-expert-based progressive alignment pretraining strategy, which effectively support high-resolution image understanding.
Our code and models will be released.
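The token folding idea can be pictured with a small sketch. SynerGen-VL's exact folding operator is not specified here; the version below simply concatenates groups of neighboring visual tokens and projects them back down, which is one common reading of the idea.

```python
import torch
import torch.nn as nn

class TokenFolding(nn.Module):
    """Shorten a visual token sequence by folding neighbors together."""

    def __init__(self, dim: int, fold: int = 4):
        super().__init__()
        self.fold = fold
        self.proj = nn.Linear(dim * fold, dim)  # project each folded group back to dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim); num_tokens must be divisible by fold.
        b, n, d = x.shape
        x = x.reshape(b, n // self.fold, d * self.fold)
        return self.proj(x)

tokens = torch.randn(2, 1024, 256)              # e.g. high-resolution image tokens
print(TokenFolding(256).forward(tokens).shape)  # torch.Size([2, 256, 256])
```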
arXiv Detail & Related papers (2024-12-12T18:59:26Z) - Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation [56.75665429851673]
This paper introduces a novel instruction curation algorithm, derived from two unique perspectives: human and LLM preference alignment. Experiments demonstrate that we can maintain or even improve model performance by compressing synthetic multimodal instructions by up to 90%.
arXiv Detail & Related papers (2024-09-27T08:20:59Z) - Assessing GPT-4-Vision's Capabilities in UML-Based Code Generation [0.5789654849162464]
GPT-4-Vision is a state-of-the-art deep learning model that can transform Unified Modeling Language (UML) class diagrams into fully operating Java class files.
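A minimal sketch of sending a class diagram image to a vision-capable model and asking for Java follows. The file name, prompt, and use of the current OpenAI Python client are assumptions; the paper evaluated GPT-4-Vision through its own protocol.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Encode the diagram image as a base64 data URI for the vision input.
with open("class_diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",  # stand-in; the paper used GPT-4-Vision
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Generate compilable Java classes for this UML class diagram."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```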
arXiv Detail & Related papers (2024-04-22T17:21:24Z) - From Image to UML: First Results of Image Based UML Diagram Generation Using LLMs [1.961305559606562]
In software engineering processes, systems are first specified using a modeling language.
Large Language Models (LLMs) are used to generate the formal representation of UML models from a given drawing.
More specifically, we have evaluated the capabilities of different LLMs to convert images of class diagrams into the actual models represented in the images.
arXiv Detail & Related papers (2024-04-17T13:33:11Z) - Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants [65.47222691674074]
The Muffin framework employs pre-trained vision-language models to act as providers of visual signals.
The UniMM-Chat dataset explores the complementarities of datasets to generate 1.1M high-quality and diverse multimodal instructions.
arXiv Detail & Related papers (2023-10-01T12:35:18Z) - Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models [50.07056960586183]
We propose Position-enhanced Visual Instruction Tuning (PVIT) to extend the functionality of Multimodal Large Language Models (MLLMs) by integrating region-level visual information.
This integration promotes a more detailed comprehension of images for the MLLM.
We present both quantitative experiments and qualitative analysis that demonstrate the superiority of the proposed model.
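To illustrate the position-enhancement idea (not PVIT's actual architecture, which integrates a region-level vision encoder; the projection below is a simplified stand-in with illustrative dimensions), region coordinates can be mapped into the LLM's embedding space:

```python
import torch
import torch.nn as nn

class RegionPrompt(nn.Module):
    """Sketch: turn normalized bounding boxes into LLM-space embeddings."""

    def __init__(self, hidden_dim: int = 4096):  # hidden size is illustrative
        super().__init__()
        self.proj = nn.Linear(4, hidden_dim)

    def forward(self, boxes: torch.Tensor) -> torch.Tensor:
        # boxes: (batch, num_regions, 4) as (x1, y1, x2, y2) in [0, 1];
        # the result can be appended to the prompt's token embeddings.
        return self.proj(boxes)

boxes = torch.tensor([[[0.1, 0.2, 0.5, 0.8]]])
print(RegionPrompt().forward(boxes).shape)  # torch.Size([1, 1, 4096])
```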
arXiv Detail & Related papers (2023-08-25T15:33:47Z) - Generating Images with Multimodal Language Models [78.6660334861137]
We propose a method to fuse frozen text-only large language models with pre-trained image encoder and decoder models.
Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue.
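The fusion recipe can be sketched in a few lines: the language model and the image encoder/decoder stay frozen, and only small linear maps between their embedding spaces are trained. The dimensions and names below are illustrative, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Only these two small maps are trained; the LLM and the image
# encoder/decoder themselves stay frozen (not instantiated here).
image_to_text = nn.Linear(1024, 4096)  # image-encoder features -> LLM inputs
text_to_image = nn.Linear(4096, 768)   # LLM hidden states -> image-decoder conditioning

feats = torch.randn(1, 32, 1024)      # e.g. 32 visual tokens from a frozen encoder
prompt_embeds = image_to_text(feats)  # ready to prepend to the text embeddings
```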
arXiv Detail & Related papers (2023-05-26T19:22:03Z) - mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality [95.76661165594884]
mPLUG-Owl is a training paradigm that equips large language models (LLMs) with multi-modal abilities.
The training paradigm involves a two-stage method for aligning image and text, which learns visual knowledge with the assistance of the LLM.
Experimental results show that our model outperforms existing multi-modal models.
arXiv Detail & Related papers (2023-04-27T13:27:01Z)