Software Metadata Classification based on Generative Artificial
Intelligence
- URL: http://arxiv.org/abs/2310.13006v1
- Date: Sat, 14 Oct 2023 07:38:16 GMT
- Title: Software Metadata Classification based on Generative Artificial
Intelligence
- Authors: Seetharam Killivalavan, Durairaj Thenmozhi
- Abstract summary: This paper presents a novel approach to enhance the performance of binary code comment quality classification models through the application of Generative Artificial Intelligence (AI).
By leveraging the OpenAI API, a dataset comprising 1239 newly generated code-comment pairs has been labelled as "Useful" or "Not Useful".
The results affirm the effectiveness of this methodology, indicating its applicability in broader contexts within software development and quality assurance domains.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents a novel approach to enhance the performance of binary
code comment quality classification models through the application of
Generative Artificial Intelligence (AI). By leveraging the OpenAI API, a
dataset comprising 1239 newly generated code-comment pairs, extracted from
various GitHub repositories and open-source projects, has been labelled as
"Useful" or "Not Useful", and integrated into the existing corpus of 9048 pairs
in the C programming language. The dataset, generated using a cutting-edge Large
Language Model architecture, demonstrates notable improvements in model
accuracy. Specifically, when incorporated into the Support Vector Machine (SVM)
model, a 6 percentage-point increase in precision is observed, rising from 0.79
to 0.85. Additionally, the Artificial Neural Network (ANN) model exhibits a 1.5
percentage-point increase in recall, climbing from 0.731 to 0.746. This paper
sheds light on the
potential of Generative AI in augmenting code comment quality classification
models. The results affirm the effectiveness of this methodology, indicating
its applicability in broader contexts within software development and quality
assurance domains. The findings underscore the significance of integrating
generative techniques to advance the accuracy and efficacy of machine learning
models in practical software engineering scenarios.
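The following is a minimal sketch of the pipeline the abstract describes: merging the LLM-generated pairs into the existing C-language corpus and measuring SVM precision and ANN recall on the augmented data. It is illustrative only, not the authors' implementation; the file names, column names, TF-IDF features, and the scikit-learn LinearSVC / MLPClassifier stand-ins for the SVM and ANN models are assumptions, since the abstract does not specify the exact feature pipeline or model configuration.

```python
# Minimal sketch (assumptions noted above): augment the C code-comment corpus
# with LLM-generated pairs and evaluate SVM precision and ANN recall.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import precision_score, recall_score

# Hypothetical files: the existing 9048-pair corpus and the 1239 pairs
# generated and labelled via the OpenAI API (columns: code, comment, label).
base = pd.read_csv("c_code_comments_9048.csv")
generated = pd.read_csv("openai_generated_1239.csv")
corpus = pd.concat([base, generated], ignore_index=True)

# Represent each pair as TF-IDF features over the concatenated code + comment.
text = corpus["code"] + "\n" + corpus["comment"]
labels = (corpus["label"] == "Useful").astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    text, labels, test_size=0.2, stratify=labels, random_state=42
)
vectorizer = TfidfVectorizer(max_features=20000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# SVM stand-in: the abstract reports precision rising from 0.79 to 0.85.
svm = LinearSVC().fit(X_train_vec, y_train)
print("SVM precision:", precision_score(y_test, svm.predict(X_test_vec)))

# ANN stand-in: the abstract reports recall rising from 0.731 to 0.746.
ann = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=42)
ann.fit(X_train_vec, y_train)
print("ANN recall:", recall_score(y_test, ann.predict(X_test_vec)))
```

Running the same script with and without the generated pairs (9048 vs. 10287 pairs in total) mirrors the before/after comparison behind the reported gains.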
Related papers
- Enhancing Code Annotation Reliability: Generative AI's Role in Comment Quality Assessment Models [0.0]
We integrate 1,437 newly generated code-comment pairs, labeled as "Useful" or "Not Useful," into an existing C-language dataset of 9,048 pairs.
Our approach yields a 5.78 percentage-point precision increase in the Support Vector Machine (SVM) model, improving from 0.79 to 0.8478, and a 2.17 percentage-point recall boost in the Artificial Neural Network (ANN) model, rising from 0.731 to 0.7527.
arXiv Detail & Related papers (2024-10-29T17:57:27Z)
- Hybrid-Segmentor: A Hybrid Approach to Automated Fine-Grained Crack Segmentation in Civil Infrastructure [52.2025114590481]
We introduce Hybrid-Segmentor, an encoder-decoder based approach that is capable of extracting both fine-grained local and global crack features.
This allows the model to improve its generalization capability in distinguishing various types of shapes, surfaces, and sizes of cracks.
The proposed model outperforms existing benchmark models across 5 quantitative metrics (accuracy 0.971, precision 0.804, recall 0.744, F1-score 0.770, and IoU score 0.630), achieving state-of-the-art status.
arXiv Detail & Related papers (2024-09-04T16:47:16Z)
- Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance.
Existing direct preference learning algorithms are originally designed for the single-turn chat task.
We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- Enhanced Infield Agriculture with Interpretable Machine Learning Approaches for Crop Classification [0.49110747024865004]
This research evaluates four approaches to crop classification: traditional ML with handcrafted feature extraction methods such as SIFT, ORB, and color histograms; a custom-designed CNN; established DL architectures such as AlexNet; and transfer learning on five models pre-trained on ImageNet.
Xception outperformed all of them in terms of generalization, achieving 98% accuracy on the test data, with a model size of 80.03 MB and a prediction time of 0.0633 seconds.
arXiv Detail & Related papers (2024-08-22T14:20:34Z)
- Detecting AI Generated Text Based on NLP and Machine Learning Approaches [0.0]
Recent advances in natural language processing may enable AI models to generate text that is indistinguishable from human writing in the future.
This might have profound ethical, legal, and social repercussions.
Our approach includes machine learning methods that can differentiate between electronically produced text and human-written text.
arXiv Detail & Related papers (2024-04-15T16:37:44Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- Generative AI for Software Metadata: Overview of the Information Retrieval in Software Engineering Track at FIRE 2023 [18.616716369775883]
The Information Retrieval in Software Engineering (IRSE) track aims to develop solutions for automated evaluation of code comments.
The dataset consists of 9048 code comments and surrounding code snippet pairs extracted from open source C based projects.
The labels generated from large language models increase the bias in the prediction model but lead to less over-fitted results.
arXiv Detail & Related papers (2023-10-27T14:13:23Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness.
We combine the SE concept of code complexity with the AI technique of curriculum learning.
We achieve up to 4.8x improvement in model signal awareness.
arXiv Detail & Related papers (2021-11-10T17:58:18Z)
- A survey on Variational Autoencoders from a GreenAI perspective [0.0]
Variational AutoEncoders (VAEs) are powerful generative models that merge elements from statistics and information theory with the flexibility offered by deep neural networks.
This article provides a comparative evaluation of some of the most successful, recent variations of VAEs.
arXiv Detail & Related papers (2021-03-01T15:26:39Z)