Related papers: From Image to UML: First Results of Image Based UML Diagram Generation Using LLMs

URL: http://arxiv.org/abs/2404.11376v2
Date: Tue, 18 Jun 2024 08:34:43 GMT
Title: From Image to UML: First Results of Image Based UML Diagram Generation Using LLMs
Authors: Aaron Conrardy, Jordi Cabot
Abstract summary: In software engineering processes, systems are first specified using a modeling language. Large Language Models (LLM) are used to generate the formal representation of (UML) models from a given drawing. More specifically, we have evaluated the capabilities of different LLMs to convert images of class diagrams into the actual models represented in the images.
Score: 1.961305559606562
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: In software engineering processes, systems are first specified using a modeling language such as UML. These initial designs are often collaboratively created, many times in meetings where different domain experts use whiteboards, paper or other types of quick supports to create drawings and blueprints that then will need to be formalized. These proper, machine-readable, models are key to ensure models can be part of automated processes (e.g. input of a low-code generation pipeline, a model-based testing system, ...). But going from hand-drawn diagrams to actual models is a time-consuming process that sometimes ends up with such drawings just added as informal images to the software documentation, reducing their value a lot. To avoid this tedious task, we explore the usage of Large Language Models (LLM) to generate the formal representation of (UML) models from a given drawing. More specifically, we have evaluated the capabilities of different LLMs to convert images of UML class diagrams into the actual models represented in the images. While the results are good enough to use such an approach as part of a model-driven engineering pipeline we also highlight some of their current limitations and the need to keep the human in the loop to overcome those limitations.

Related papers

LLM-enabled Instance Model Generation [4.52634430160579]
This work explores the generation of instance models using large language models (LLMs). We propose a two-step approach: first, using LLMs to produce a simplified structured output containing all necessary instance model information, and then compiling this intermediate representation into a valid XMI file. Results show that the proposed method significantly improves the usability of LLMs for instance model generation tasks.
arXiv Detail & Related papers (2025-03-28T16:34:29Z)
Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models [0.41942958779358674]
This paper proposes a new approach to automatically generate code using a large multimodal language model. It examines how domain-adapted MM-LLMs perform for code generation automation; the best model achieved BLEU and SSIM scores of 0.779 and 0.942 on sequence diagrams.
arXiv Detail & Related papers (2025-03-15T23:20:26Z)
A Model Is Not Built By A Single Prompt: LLM-Based Domain Modeling With Question Decomposition [4.123601037699469]
In real-world domain modeling, engineers usually decompose complex tasks into easily solvable sub-tasks. We propose an LLM-based domain modeling approach via question decomposition, similar to a developer's modeling process. Preliminary results show that our approach outperforms the single-prompt-based approach.
arXiv Detail & Related papers (2024-10-13T14:28:04Z)
Assessing GPT-4-Vision's Capabilities in UML-Based Code Generation [0.5789654849162464]
GPT-4-Vision is a state-of-the-art deep learning model. It can transform Unified Modeling Language (UML) class diagrams into fully operating Java class files.
arXiv Detail & Related papers (2024-04-22T17:21:24Z)
CLAMP: Contrastive LAnguage Model Prompt-tuning [89.96914454453791]
We show that large language models can achieve good image classification performance when adapted this way. Our approach beats state-of-the-art mLLMs by 13% and slightly outperforms contrastive learning with a custom text model.
arXiv Detail & Related papers (2023-12-04T05:13:59Z)
Sequential Modeling Enables Scalable Learning for Large Vision Models [120.91839619284431]
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data. We define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources.
arXiv Detail & Related papers (2023-12-01T18:59:57Z)
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models [50.07056960586183]
We propose Position-enhanced Visual Instruction Tuning (PVIT) to extend the functionality of Multimodal Large Language Models (MLLMs). This integration promotes a more detailed comprehension of images for the MLLM. We present both quantitative experiments and qualitative analysis that demonstrate the superiority of the proposed model.
arXiv Detail & Related papers (2023-08-25T15:33:47Z)
Generating Images with Multimodal Language Models [78.6660334861137]
We propose a method to fuse frozen text-only large language models with pre-trained image encoder and decoder models. Our model demonstrates a wide suite of multimodal capabilities: image retrieval, novel image generation, and multimodal dialogue.
arXiv Detail & Related papers (2023-05-26T19:22:03Z)
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models [62.75006608940132]
This work proposes to enhance prompt understanding capabilities in text-to-image diffusion models. Our method leverages a pretrained large language model for grounded generation in a novel two-stage process. Our method significantly outperforms the base diffusion model and several strong baselines in accurately generating images.
arXiv Detail & Related papers (2023-05-23T03:59:06Z)
Implementing and Experimenting with Diffusion Models for Text-to-Image Generation [0.0]
Two models, DALL-E 2 and Imagen, have demonstrated that highly photorealistic images could be generated from a simple textual description of an image. Text-to-image models require exceptionally large amounts of computational resources to train, as well as handling huge datasets collected from the internet. This thesis contributes by reviewing the different approaches and techniques used by these models, and then by proposing our own implementation of a text-to-image model.
arXiv Detail & Related papers (2022-09-22T12:03:33Z)
Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework, where a generator is trained to produce novel images based on a single image. We propose a meta-learning approach that enables training over a collection of images, in order to model the internal statistics of the sample image more effectively. Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.