Assessing GPT-4-Vision's Capabilities in UML-Based Code Generation
- URL: http://arxiv.org/abs/2404.14370v1
- Date: Mon, 22 Apr 2024 17:21:24 GMT
- Title: Assessing GPT-4-Vision's Capabilities in UML-Based Code Generation
- Authors: Gábor Antal, Richárd Vozár, Rudolf Ferenc
- Abstract summary: GPT-4-Vision is a state-of-the-art deep learning model.
It can transform Unified Modeling Language (UML) class diagrams into fully operating Java class files.
- Score: 0.5789654849162464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The emergence of advanced neural networks has opened up new ways in automated code generation from conceptual models, promising to enhance software development processes. This paper presents a preliminary evaluation of GPT-4-Vision, a state-of-the-art deep learning model, and its capabilities in transforming Unified Modeling Language (UML) class diagrams into fully operating Java class files. In our study, we used exported images of 18 class diagrams comprising 10 single-class and 8 multi-class diagrams. We used 3 different prompts for each input, and we manually evaluated the results. We created a scoring system in which we scored the occurrence of elements found in the diagram within the source code. On average, the model was able to generate source code for 88% of the elements shown in the diagrams. Our results indicate that GPT-4-Vision exhibits proficiency in handling single-class UML diagrams, successfully transforming them into syntactically correct class files. However, for multi-class UML diagrams, the model's performance is weaker compared to single-class diagrams. In summary, further investigations are necessary to exploit the model's potential completely.
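The scoring system is described only at a high level. As a rough illustration of element-coverage scoring, here is a minimal sketch in Python; the function name, the identifier-matching heuristic, and the example diagram elements are our own assumptions, not the paper's protocol:

```python
import re

# Hypothetical reconstruction of element-coverage scoring: count how many
# diagram elements (class, field, and method names) appear as identifiers
# in the generated Java source.

def element_coverage(diagram_elements: set[str], java_source: str) -> float:
    """Fraction of diagram elements found in the generated code."""
    identifiers = set(re.findall(r"[A-Za-z_]\w*", java_source))
    found = diagram_elements & identifiers
    return len(found) / len(diagram_elements) if diagram_elements else 0.0

# Example: a single-class diagram with one class, two fields, one method.
elements = {"Customer", "name", "email", "getName"}
generated = """
public class Customer {
    private String name;
    private String email;
    public String getName() { return name; }
}
"""
print(element_coverage(elements, generated))  # 1.0 -> all elements present
```

Averaging such a per-diagram score over all 18 diagrams would yield the kind of aggregate figure (88%) reported above.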
Related papers
- TULIP: Towards Unified Language-Image Pretraining [60.99500935831526]
We introduce TULIP, an open-source, drop-in replacement for existing CLIP-like models.
Our method leverages generative data augmentation, enhanced image-image and text-text contrastive learning, and image/text reconstruction regularization to learn fine-grained visual features.
Our approach, scaling to over 1B parameters, outperforms existing state-of-the-art (SOTA) models across benchmarks.
arXiv Detail & Related papers (2025-03-19T17:58:57Z)
- Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models [0.41942958779358674]
This paper proposes a new approach to automatically generate code using a large multimodal language model.
Domain-adapted MM-LLMs are evaluated for code-generation automation; the best model achieved BLEU and SSIM scores of 0.779 and 0.942 on sequence diagrams.
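For context, BLEU measures n-gram overlap between generated and reference text, while SSIM measures structural similarity between two images. A minimal sketch of computing both, assuming nltk and scikit-image are available; the inputs are placeholders standing in for the paper's actual artifacts:

```python
# BLEU compares generated text against a reference; SSIM compares two
# rendered diagram images pixel-structurally. Inputs below are placeholders.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from skimage.metrics import structural_similarity
import numpy as np

reference = "participant User User -> System : login()".split()
candidate = "participant User User -> System : login()".split()
bleu = sentence_bleu([reference], candidate,
                     smoothing_function=SmoothingFunction().method1)

# Two grayscale renderings of the reference and generated diagrams.
img_a = np.random.rand(128, 128)   # placeholder for the reference rendering
img_b = img_a.copy()               # placeholder for the generated rendering
ssim = structural_similarity(img_a, img_b, data_range=1.0)

print(f"BLEU={bleu:.3f}, SSIM={ssim:.3f}")
```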
arXiv Detail & Related papers (2025-03-15T23:20:26Z)
- AutoPresent: Designing Structured Visuals from Scratch [99.766901203884]
We benchmark end-to-end image generation and program generation methods with a variety of models.
We create AutoPresent, an 8B Llama-based model trained on 7k instruction-code pairs for slide generation.
arXiv Detail & Related papers (2025-01-01T18:09:32Z)
- Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction [52.09472099976885]
IAR is an Improved AutoRegressive Visual Generation Method that enhances the training efficiency and generation quality of LLM-based visual generation models.
Our method consistently enhances training efficiency and performance across model sizes from 100M to 1.4B parameters, reducing the training time by half while achieving the same FID.
arXiv Detail & Related papers (2025-01-01T15:58:51Z)
- Assessing UML Models by ChatGPT: Implications for Education [9.11195766839205]
In software engineering (SE) research and practice, UML is well known as an essential modeling methodology.
Recent advancements in generative AI techniques, such as ChatGPT, have paved new ways to automate many SE tasks.
This paper aims to investigate the feasibility and effectiveness of ChatGPT in assessing the quality of UML models.
arXiv Detail & Related papers (2024-12-23T00:28:33Z)
- Language Models are Graph Learners [70.14063765424012]
Language Models (LMs) are challenging the dominance of domain-specific models, including Graph Neural Networks (GNNs) and Graph Transformers (GTs).
We propose a novel approach that empowers off-the-shelf LMs to achieve performance comparable to state-of-the-art GNNs on node classification tasks.
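The abstract does not spell out the mechanism; a common recipe for this setting, offered here only as an illustrative guess rather than the paper's method, is to verbalize a node and its neighborhood into text so an off-the-shelf LM can treat node classification as text classification:

```python
# Illustrative guess (not necessarily this paper's approach): verbalize a
# graph node and its neighbors into a prompt for a plain language model.

def verbalize_node(node_id: str, features: dict,
                   neighbors: list, labels: list) -> str:
    """Render a node's local neighborhood as a classification prompt."""
    neigh = "; ".join(f"{n['id']} (topic: {n['topic']})" for n in neighbors)
    return (
        f"Paper {node_id}: title '{features['title']}'. "
        f"It cites: {neigh}. "
        f"Classify its topic as one of {labels}. Answer:"
    )

prompt = verbalize_node(
    "P1",
    {"title": "Graph Attention Networks"},
    [{"id": "P2", "topic": "graph learning"}, {"id": "P3", "topic": "attention"}],
    ["graph learning", "NLP", "vision"],
)
# `prompt` would then be sent to any off-the-shelf LM for classification.
print(prompt)
```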
arXiv Detail & Related papers (2024-10-03T08:27:54Z)
- SynChart: Synthesizing Charts from Language Models [50.73888371511983]
This work explores the potential of using LLMs alone for data generation and develops competitive multi-modality models focused on chart understanding.
We construct a large-scale chart dataset, SynChart, which contains approximately 4 million diverse chart images with over 75 million dense annotations.
We trained a 4.2B chart-expert model using this dataset, achieving near-GPT-4o performance on the ChartQA task and surpassing GPT-4V.
arXiv Detail & Related papers (2024-09-25T00:18:12Z)
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- From Image to UML: First Results of Image Based UML Diagram Generation Using LLMs [1.961305559606562]
In software engineering processes, systems are first specified using a modeling language.
Large Language Models (LLMs) are used to generate the formal representation of UML models from a given drawing.
More specifically, we have evaluated the capabilities of different LLMs to convert images of class diagrams into the actual models represented in the images.
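In practice, such a conversion amounts to sending the diagram image to a multimodal LLM and asking for a textual model. A minimal sketch using the OpenAI Python client; the model name, prompt wording, and file path are our own choices, not the paper's experimental setup:

```python
# Minimal sketch of image-to-UML conversion with a multimodal LLM.
# Model name, prompt, and file path are assumptions for illustration.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

with open("class_diagram.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Convert this UML class diagram into PlantUML source."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)  # the PlantUML text
```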
arXiv Detail & Related papers (2024-04-17T13:33:11Z)
- Model Generation with LLMs: From Requirements to UML Sequence Diagrams [9.114284818139069]
This paper investigates the capability of ChatGPT to generate a specific type of model, i.e., sequence diagrams, from NL requirements.
We examine the sequence diagrams generated by ChatGPT for 28 requirements documents of various types and from different domains.
Our results indicate that, although the models generally conform to the standard and exhibit a reasonable level of understandability, their completeness and correctness with respect to the specified requirements often present challenges.
arXiv Detail & Related papers (2024-04-09T15:07:25Z)
- CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning [61.21923643289266]
Chain of Manipulations is a mechanism that enables Vision-Language Models to solve problems step-by-step with evidence.
After training, models can solve various visual problems by eliciting intrinsic manipulations (e.g., grounding, zoom in) actively without involving external tools.
Our trained model, CogCoM, achieves state-of-the-art performance across 9 benchmarks from 4 categories.
arXiv Detail & Related papers (2024-02-06T18:43:48Z)
- Vi(E)va LLM! A Conceptual Stack for Evaluating and Interpreting Generative AI-based Visualizations [1.709620026135923]
Large language models (LLMs) have become an interesting option for supporting generative tasks related to visualization.
This paper addresses the problem of modeling the evaluation of a generated visualization through an LLM.
We propose a theoretical evaluation stack, EvaLLM, that decomposes the evaluation effort in its atomic components.
arXiv Detail & Related papers (2024-02-03T14:28:55Z)
- Class-level Structural Relation Modelling and Smoothing for Visual Representation Learning [12.247343963572732]
This paper presents a framework termed Class-level Structural Relation Modelling and Smoothing for Visual Representation Learning (CSRMS).
It includes the Class-level Relation Modelling, Class-aware Graph-Guided Sampling, and Graph-Guided Representation Learning modules.
Experiments demonstrate the effectiveness of structured knowledge modelling for enhanced representation learning and show that CSRMS can be incorporated with any state-of-the-art visual representation learning models for performance gains.
arXiv Detail & Related papers (2023-08-08T09:03:46Z)
- Multi-View Class Incremental Learning [57.14644913531313]
Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance.
This paper investigates a novel paradigm called multi-view class incremental learning (MVCIL), where a single model incrementally classifies new classes from a continual stream of views.
arXiv Detail & Related papers (2023-06-16T08:13:41Z)
- Visual Instruction Tuning [79.70923292053097]
We present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data.
By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant.
When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%.
arXiv Detail & Related papers (2023-04-17T17:59:25Z)
- Attribute-Modulated Generative Meta Learning for Zero-Shot Classification [52.64680991682722]
We present the Attribute-Modulated generAtive meta-model for Zero-shot learning (AMAZ).
Our model consists of an attribute-aware modulation network and an attribute-augmented generative network.
Our empirical evaluations show that AMAZ improves state-of-the-art methods by 3.8% and 5.1% in ZSL and generalized ZSL settings, respectively.
arXiv Detail & Related papers (2021-04-22T04:16:43Z)
- Classification of Reverse-Engineered Class Diagram and Forward-Engineered Class Diagram using Machine Learning [0.0]
In the software industry, it is important to know whether a class diagram was reverse-engineered or forward-engineered.
Which type of diagram was used in a particular project is an important factor to know.
We propose to solve this problem using a supervised Machine Learning technique.
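The abstract names only "a supervised Machine Learning technique"; a hedged sketch of how such a classifier could be set up in scikit-learn, with invented diagram features and toy data:

```python
# Hedged sketch (features and data are invented for illustration):
# train a supervised classifier to tell reverse-engineered class
# diagrams (label 1) from forward-engineered ones (label 0).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical per-diagram features: [num_classes, num_associations,
# avg_methods_per_class, fraction_of_getters_setters]
X = [
    [12, 18, 9.5, 0.62],   # reverse-engineered: many accessor methods
    [5, 7, 3.0, 0.10],     # forward-engineered: sparse, design-level
    [20, 31, 11.2, 0.70],
    [4, 5, 2.5, 0.05],
    [15, 22, 8.0, 0.55],
    [6, 9, 3.5, 0.15],
]
y = [1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```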
arXiv Detail & Related papers (2020-11-14T14:56:26Z)