LLM4PLC: Harnessing Large Language Models for Verifiable Programming of
PLCs in Industrial Control Systems
- URL: http://arxiv.org/abs/2401.05443v1
- Date: Mon, 8 Jan 2024 23:52:42 GMT
- Title: LLM4PLC: Harnessing Large Language Models for Verifiable Programming of
PLCs in Industrial Control Systems
- Authors: Mohamad Fakih, Rahul Dharmaji, Yasamin Moghaddas, Gustavo Quiros
Araya, Oluwatosin Ogundare, and Mohammad Abdullah Al Faruque
- Abstract summary: Large Language Models (LLMs) fail to produce valid programs for Industrial Control Systems (ICS) operated by Programmable Logic Controllers (PLCs).
We propose a user-guided iterative pipeline leveraging user feedback and external verification tools, including grammar checkers, compilers, and SMV verifiers.
We run a complete test suite on GPT-3.5, GPT-4, Code Llama-7B, a fine-tuned Code Llama-7B model, Code Llama-34B, and a fine-tuned Code Llama-34B model.
- Score: 9.946058168276744
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although Large Language Models (LLMs) have established predominance in
automated code generation, they are not devoid of shortcomings. The pertinent
issues primarily relate to the absence of execution guarantees for generated
code, a lack of explainability, and suboptimal support for essential but niche
programming languages. State-of-the-art LLMs such as GPT-4 and LLaMa2 fail to
produce valid programs for Industrial Control Systems (ICS) operated by
Programmable Logic Controllers (PLCs). We propose LLM4PLC, a user-guided
iterative pipeline leveraging user feedback and external verification tools
including grammar checkers, compilers, and SMV verifiers to guide the LLM's
generation. We further enhance the generation potential of the LLM by employing
Prompt Engineering and model fine-tuning through the creation and use of
LoRAs. We validate this system using a FischerTechnik Manufacturing TestBed
(MFTB), illustrating how LLMs can evolve from generating structurally flawed
code to producing verifiably correct programs for industrial applications. We
run a complete test suite on GPT-3.5, GPT-4, Code Llama-7B, a fine-tuned Code
Llama-7B model, Code Llama-34B, and a fine-tuned Code Llama-34B model. The
proposed pipeline improved the generation success rate from 47% to 72%, and the
Survey-of-Experts code quality from 2.25/10 to 7.75/10. To promote open
research, we share the complete experimental setup, the LLM Fine-Tuning
Weights, and the video demonstrations of the different programs on our
dedicated webpage.
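The abstract describes a generate-check-repair loop: the LLM drafts IEC 61131-3 Structured Text, external tools (a grammar checker, a compiler, and an SMV model checker) vet the draft, and their failure logs are fed back into the next prompt. The Python sketch below is a minimal illustration of such a loop, not the authors' released implementation; the `generate_st` stub and the `st-syntax-check` and `iec-compiler` command names are hypothetical placeholders, and nuXmv is assumed as the SMV verifier.

```python
import subprocess

MAX_ITERATIONS = 5  # retry budget before handing control back to the user


def run_tool(cmd):
    """Run an external checker and return (passed, combined output)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def generate_st(prompt, feedback=""):
    """Stub for the LLM call (e.g., GPT-4 or a LoRA-tuned Code Llama).
    Feedback from the previous failed check is folded into the prompt."""
    raise NotImplementedError("plug in an LLM client here")


def verification_guided_generation(spec_prompt, smv_model="design.smv"):
    feedback = ""
    for _ in range(MAX_ITERATIONS):
        st_code = generate_st(spec_prompt, feedback)
        with open("program.st", "w") as f:
            f.write(st_code)

        # 1. Grammar/syntax check of the Structured Text
        #    ("st-syntax-check" is a stand-in for whatever checker is available).
        ok, log = run_tool(["st-syntax-check", "program.st"])
        if not ok:
            feedback = "Fix these syntax errors:\n" + log
            continue

        # 2. Compile the program with an IEC 61131-3 compiler (name is a placeholder).
        ok, log = run_tool(["iec-compiler", "program.st"])
        if not ok:
            feedback = "Fix these compilation errors:\n" + log
            continue

        # 3. Model-check temporal properties on an SMV model of the design;
        #    NuSMV/nuXmv report failing specifications as "... is false".
        ok, log = run_tool(["nuXmv", smv_model])
        if not ok or "is false" in log:
            feedback = "Address this verification counterexample:\n" + log
            continue

        return st_code  # all external checks passed
    return None  # budget exhausted: surface the last feedback to the user
```

In a user-guided pipeline of this kind, the operator would review the returned program, or the accumulated tool feedback when the retry budget runs out, before deployment.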
Related papers
- OriGen: Enhancing RTL Code Generation with Code-to-Code Augmentation and Self-Reflection [54.775409528658486]
OriGen is a fully open-source framework featuring self-reflection capabilities and a dataset augmentation methodology.
We show that OriGen remarkably outperforms other open-source alternatives in RTL code generation.
arXiv Detail & Related papers (2024-07-23T07:22:25Z)
- Towards Large Language Model Aided Program Refinement [10.089955747110444]
Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs.
Large language models (LLMs) enable automatic code generations from informal natural language specifications.
We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods.
arXiv Detail & Related papers (2024-06-26T04:29:27Z)
- Validating LLM-Generated Programs with Metamorphic Prompt Testing [8.785973653167112]
Large Language Models (LLMs) are increasingly integrated into the software development lifecycle.
This paper proposes a novel solution called metamorphic prompt testing to address the challenge of validating LLM-generated programs.
Our evaluation on HumanEval shows that metamorphic prompt testing is able to detect 75 percent of the erroneous programs generated by GPT-4, with a false positive rate of 8.6 percent.
arXiv Detail & Related papers (2024-06-11T00:40:17Z)
- When LLM-based Code Generation Meets the Software Development Process [50.82665351100067]
This paper introduces LCG, a code generation framework inspired by established software engineering practices.
LLM agents emulate various software process models, namely LCGWaterfall, LCGTDD, and LCGScrum.
We evaluate LCG across four code generation benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET.
arXiv Detail & Related papers (2024-03-23T14:04:48Z)
- InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
InfiBench is the first large-scale freeform question-answering (QA) benchmark for code to our knowledge.
It comprises 234 carefully selected, high-quality Stack Overflow questions spanning 15 programming languages.
We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z)
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback [58.20547418182074]
We introduce StepCoder, a novel framework for code generation, consisting of two main components.
CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks.
FGO optimizes the model only on executed code, masking the unexecuted code segments to provide Fine-Grained Optimization.
Our method improves the ability to explore the output space and outperforms state-of-the-art approaches in corresponding benchmarks.
arXiv Detail & Related papers (2024-02-02T13:14:31Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- Large Language Models for Test-Free Fault Localization [11.080712737595174]
We propose a language model based fault localization approach that locates buggy lines of code without any test coverage information.
We fine-tune language models with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs.
Our empirical evaluation shows that LLMAO improves the Top-1 results over the state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4%, and Top-5 results by 14.4%-35.6%.
arXiv Detail & Related papers (2023-10-03T01:26:39Z)
- CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models [43.655927559990616]
We propose CodeApex, a benchmark dataset focusing on the programming comprehension, code generation, and code correction abilities of LLMs.
We evaluate 12 widely used LLMs, including both general-purpose and specialized models.
GPT-4 exhibits the best programming capabilities, achieving approximate accuracies of 69%, 54%, and 66% on the three tasks, respectively.
arXiv Detail & Related papers (2023-09-05T04:12:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.