Predicting Defective Visual Code Changes in a Multi-Language AAA Video
Game Project
- URL: http://arxiv.org/abs/2309.03414v1
- Date: Thu, 7 Sep 2023 00:18:43 GMT
- Title: Predicting Defective Visual Code Changes in a Multi-Language AAA Video
Game Project
- Authors: Kalvin Eng, Abram Hindle, Alexander Senchenko
- Abstract summary: We focus on constructing visual code defect prediction models that encompass visual code metrics.
We test our models using features extracted from the historical codebase of a AAA video game project.
We find that defect prediction models perform better overall in terms of the area under the ROC curve and Matthews Correlation Coefficient when incorporating visual code features.
- Score: 54.20154707138088
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video game development increasingly relies on using visual programming
languages as the primary way to build video game features. The aim of using
visual programming is to move game logic into the hands of game designers, who
may not be as well versed in textual coding. In this paper, we empirically
observe that there are more defect-inducing commits containing visual code than
textual code in a AAA video game project codebase. This indicates that the
existing textual code Just-in-Time (JIT) defect prediction models under
evaluation by Electronic Arts (EA) may be ineffective as they do not account
for changes in visual code. Thus, we focus our research on constructing visual
code defect prediction models that encompass visual code metrics and evaluate
the models against defect prediction models that use language agnostic
features, and textual code metrics. We test our models using features extracted
from the historical codebase of a AAA video game project, as well as the
historical codebases of 70 open source projects that use textual and visual
code. We find that defect prediction models have better performance overall in
terms of the area under the ROC curve (AUC) and Matthews Correlation
Coefficient (MCC) when incorporating visual code features for projects that
contain more commits with visual code than textual code.
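The two evaluation metrics named in the abstract, AUC and the Matthews Correlation Coefficient (MCC), can be computed from first principles. The sketch below uses made-up labels and scores purely for illustration; they are not results from the paper.

```python
# Toy computation of the two metrics used to compare defect prediction
# models in the abstract: AUC and MCC. All data below is illustrative.
import math

def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient from a binary confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation: the probability
    that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]                # 1 = defect-inducing commit
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]    # model's defect probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(f"AUC = {auc(y_true, scores):.3f}, MCC = {mcc(y_true, y_pred):.3f}")
# AUC = 0.889, MCC = 0.333
```

MCC is often preferred over accuracy here because defect-inducing commits are a minority class, and MCC penalizes a model that simply predicts the majority class.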
Related papers
- Can OpenSource beat ChatGPT? -- A Comparative Study of Large Language Models for Text-to-Code Generation [0.24578723416255752]
We evaluate five different large language models (LLMs) concerning their capabilities for text-to-code generation.
ChatGPT can handle these typical programming challenges by far the most effectively, surpassing even code-specialized models like Code Llama.
arXiv Detail & Related papers (2024-09-06T10:03:49Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds the graphical view of code blocks based on the control flow and data flow of them to fill the gap between programming languages and natural language.
Various experiments and ablations are done on four datasets covering both the C++ and Python languages to validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems [9.56366641717606]
MMCode is the first multi-modal coding dataset for evaluating algorithmic problem-solving skills in visually rich contexts.
MMCode contains 3,548 questions and 6,620 images collected from real-world programming challenges.
arXiv Detail & Related papers (2024-04-15T06:15:46Z)
- Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning [90.13978453378768]
We introduce a comprehensive typology of factual errors in generated chart captions.
A large-scale human annotation effort provides insight into the error patterns and frequencies in captions crafted by various chart captioning models.
Our analysis reveals that even state-of-the-art models, including GPT-4V, frequently produce captions laced with factual inaccuracies.
arXiv Detail & Related papers (2023-12-15T19:16:21Z)
- Identifying Defect-Inducing Changes in Visual Code [54.20154707138088]
"SZZ Visual Code" (SZZ-VC) is an algorithm that finds changes in visual code based on the differences of graphical elements rather than differences of lines to detect defect-inducing changes.
We validated the algorithm for an industry-made AAA video game and 20 music visual programming defects across 12 open source projects.
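The element-level diffing idea summarized above can be sketched as a set comparison between two revisions of a visual script. The node ids, tuple representation, and `diff_elements` helper below are hypothetical simplifications for illustration, not SZZ-VC's actual data model.

```python
# A minimal sketch of diffing visual code as graphical elements rather
# than lines of text. Element ids and properties here are made up.

def diff_elements(old, new):
    """Return element ids added, removed, and modified between two
    revisions. Each revision maps a stable element id to its properties
    (e.g. node type and what it is wired to)."""
    added    = {eid for eid in new if eid not in old}
    removed  = {eid for eid in old if eid not in new}
    modified = {eid for eid in old.keys() & new.keys()
                if old[eid] != new[eid]}
    return added, removed, modified

# Toy revisions of a visual script: node id -> (node type, wired-to).
rev_a = {"n1": ("Branch", "n2"), "n2": ("PrintString", None)}
rev_b = {"n1": ("Branch", "n3"), "n3": ("SpawnActor", None)}

print(diff_elements(rev_a, rev_b))  # ({'n3'}, {'n2'}, {'n1'})
```

A line-based diff of the serialized files would report many incidental changes (element reordering, layout coordinates), whereas comparing at the element level isolates the semantic change.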
arXiv Detail & Related papers (2023-09-07T00:12:28Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
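A minimal sketch of AST-based static checking in this spirit, using Python's standard `ast` module: parse a model-generated completion and report problems without executing it. The specific checks shown (syntax errors and a naive undefined-name pass) are illustrative assumptions, not the paper's actual framework or error taxonomy.

```python
# Statically check a code completion by parsing it into an AST.
import ast
import builtins

def static_errors(code):
    """Return a list of static issues found without executing the code."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"SyntaxError: line {exc.lineno}: {exc.msg}"]
    # Collect every name the snippet defines anywhere. This is a
    # simplification: real analysis is flow- and scope-sensitive.
    defined = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                               ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            defined.update(a.asname or a.name.split(".")[0]
                           for a in node.names)
    return [f"undefined name: {n.id}"
            for n in ast.walk(tree)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)
            and n.id not in defined]

print(static_errors("def add(a, b):\n    return a + b"))  # []
print(static_errors("total = sum(vals)"))  # ['undefined name: vals']
print(static_errors("def f(:"))            # one SyntaxError description
```

Such checks are cheap enough to run over every sampled completion, unlike execution-based benchmarks that need test harnesses per problem.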
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- Code Comment Inconsistency Detection with BERT and Longformer [9.378041196272878]
Comments, or natural language descriptions of source code, are standard practice among software developers.
When the code is modified without an accompanying correction to the comment, an inconsistency between the comment and code can arise.
We propose two models to detect such inconsistencies in a natural language inference (NLI) context.
arXiv Detail & Related papers (2022-07-29T02:43:51Z)
- ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
- Learning to Extend Program Graphs to Work-in-Progress Code [31.235862838381966]
We extend the notion of program graphs to work-in-progress code by learning to predict edge relations between tokens.
We consider the tasks of code completion and localizing and repairing variable misuse in a work-in-process scenario.
We demonstrate that training relation-aware models with fine-tuned edges consistently leads to improved performance on both tasks.
arXiv Detail & Related papers (2021-05-28T18:12:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.