Generation of Optimized Solidity Code for Machine Learning Models using LLMs
- URL: http://arxiv.org/abs/2503.06203v1
- Date: Sat, 08 Mar 2025 13:12:52 GMT
- Title: Generation of Optimized Solidity Code for Machine Learning Models using LLMs
- Authors: Nikumbh Sarthak Sham, Sandip Chakraborty, Shamik Sural
- Abstract summary: We propose a novel approach that enables conversion of the inferencing path of an ML model, as well as its weights trained off-chain, into Solidity code using Large Language Models (LLMs). We have also developed a proof-of-concept decentralized application using the generated code to verify the accuracy claims of the underlying ML model.
- Score: 5.07666452437053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While a plethora of machine learning (ML) models are currently available, along with their implementations on disparate platforms, there is hardly any verifiable ML code that can be executed on public blockchains. We propose a novel approach named LMST that enables conversion of the inferencing path of an ML model, as well as its weights trained off-chain, into Solidity code using Large Language Models (LLMs). Extensive prompt engineering is done to achieve gas cost optimization beyond mere correctness of the produced code, while taking into consideration the capabilities and limitations of the Ethereum Virtual Machine. We have also developed a proof-of-concept decentralized application that uses the generated code to verify the accuracy claims of the underlying ML model. An extensive set of experiments demonstrates the feasibility of deploying ML models on blockchains through automated code translation using LLMs.
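To make the idea concrete, here is a minimal sketch, assuming a small PyTorch MLP, of how off-chain-trained weights could be encoded as fixed-point integers and packed into an LLM prompt that requests gas-optimized Solidity. The prompt wording, scale factor, and helper names are illustrative and are not the paper's actual LMST pipeline.

```python
# A minimal sketch (not the authors' LMST pipeline) of embedding an MLP's
# off-chain-trained weights into an LLM prompt that asks for gas-optimized Solidity.
# The fixed-point scale, prompt wording, and function names are illustrative assumptions.
import json
import torch
import torch.nn as nn

SCALE = 10**6  # the EVM has no floating point, so weights are encoded as scaled integers

def to_fixed_point(t: torch.Tensor) -> list:
    """Convert a float tensor to nested lists of scaled integers."""
    return (t * SCALE).round().long().tolist()

def build_solidity_prompt(model: nn.Sequential) -> str:
    """Serialize the inferencing path (layer sizes) and weights, then ask an LLM
    for a gas-optimized Solidity contract implementing the forward pass."""
    layers = []
    for layer in model:  # non-linear layers (e.g. ReLU) omitted from the JSON for brevity
        if isinstance(layer, nn.Linear):
            layers.append({
                "in": layer.in_features,
                "out": layer.out_features,
                "weight": to_fixed_point(layer.weight.detach()),
                "bias": to_fixed_point(layer.bias.detach()),
            })
    return (
        "Generate a Solidity 0.8.x contract that implements the forward pass of the "
        f"following MLP using int256 fixed-point arithmetic with scale {SCALE}. "
        "Minimize gas: mark weights as constant/immutable where possible, avoid "
        "unnecessary storage reads, and use unchecked arithmetic only where it is safe.\n"
        f"Model description (JSON): {json.dumps(layers)}"
    )

mlp = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
prompt = build_solidity_prompt(mlp)  # send to an LLM of choice
```

Any contract produced this way would still need auditing and gas profiling before deployment.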
Related papers
- Quantizing Large Language Models for Code Generation: A Differentiated Replication [51.85505914274633]
Large Language Models (LLMs) have shown an impressive capability in code generation and, specifically, in automatically implementing requirements described in natural language.
However, LLMs pose significant challenges related to their memory (and, consequently, carbon) footprint.
The new frontier for LLM quantization is 4-bit precision, resulting in an average memory footprint reduction of 70%.
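As a concrete illustration of 4-bit loading (not the paper's exact setup), the sketch below uses the Hugging Face transformers and bitsandbytes integration; the model checkpoint is an arbitrary example.

```python
# A sketch of loading a code LLM at 4-bit precision via transformers + bitsandbytes;
# the checkpoint and hyperparameters are illustrative, not those evaluated in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "codellama/CodeLlama-7b-hf"  # example checkpoint
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
# 4-bit weights typically cut the memory footprint by roughly 70% versus fp16.
```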
arXiv Detail & Related papers (2025-03-10T09:26:08Z)
- Large Language Diffusion Models [77.02553707673418]
Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs).
We introduce LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning paradigm.
Across extensive benchmarks, LLaDA demonstrates strong scalability, outperforming our self-constructed ARM baselines.
arXiv Detail & Related papers (2025-02-14T08:23:51Z)
- SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding [66.74446220401296]
We propose SynerGen-VL, a simple yet powerful encoder-free MLLM capable of both image understanding and generation. We introduce a token folding mechanism and a vision-expert-based progressive alignment pretraining strategy, which effectively support high-resolution image understanding. Our code and models will be released.
arXiv Detail & Related papers (2024-12-12T18:59:26Z)
- zsLLMCode: An Effective Approach for Code Embedding via LLM with Zero-Shot Learning [6.976968804436321]
This paper proposes a novel zero-shot approach, zsLLMCode, which generates code embeddings by combining large language models (LLMs) with sentence embedding models. The results demonstrate the effectiveness and superiority of our method over state-of-the-art unsupervised approaches.
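A rough sketch of that recipe, assuming a placeholder LLM summarizer and the sentence-transformers library, might look as follows; zsLLMCode's actual prompts and models may differ.

```python
# A rough sketch of the LLM + sentence-embedding recipe described above; the
# summarize_with_llm helper is a placeholder for any instruction-tuned LLM call.
from sentence_transformers import SentenceTransformer

def summarize_with_llm(code: str) -> str:
    """Placeholder: ask an LLM for a short functional summary of `code`."""
    raise NotImplementedError("plug in your LLM client here")

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def zero_shot_code_embedding(code: str):
    summary = summarize_with_llm(code)  # natural-language description of the code
    return embedder.encode(summary)     # fixed-size vector usable for search or clustering
```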
arXiv Detail & Related papers (2024-09-23T01:03:15Z)
- CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance [7.425372356516303]
Scaling up deep learning models has proven effective for improving the intelligence of machine learning (ML) models.
In this paper, we propose CubicML which uses ML to automatically optimize training performance of large distributed ML systems.
We show that CubicML can effectively optimize the training speed of in-house recommendation models with 73 billion parameters and of large language models with up to 405 billion parameters at Meta Ads.
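The following toy sketch illustrates the underlying idea of using ML to predict system performance and rank candidate configurations; the features, numbers, and regressor are invented for illustration and are not CubicML's design.

```python
# A toy illustration of "ML predicts ML-system performance": fit a regressor on
# previously measured (config -> throughput) pairs, then rank unseen configs by prediction.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Each row: [num_gpus, batch_size_per_gpu, tensor_parallel_degree]  (made-up data)
measured_configs = np.array([[8, 4, 1], [16, 4, 2], [32, 2, 4], [64, 2, 8]])
measured_throughput = np.array([1.0, 1.8, 3.1, 5.2])  # samples/sec

predictor = GradientBoostingRegressor().fit(measured_configs, measured_throughput)

candidates = np.array([[64, 4, 4], [128, 2, 8], [128, 4, 8]])
best = candidates[np.argmax(predictor.predict(candidates))]
print("config to try next:", best)
```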
arXiv Detail & Related papers (2024-09-06T19:55:21Z)
- Verbalized Machine Learning: Revisiting Machine Learning with Language Models [63.10391314749408]
We introduce the framework of verbalized machine learning (VML). VML constrains the parameter space to be human-interpretable natural language. We empirically verify the effectiveness of VML and hope that it can serve as a stepping stone toward stronger interpretability.
arXiv Detail & Related papers (2024-06-06T17:59:56Z)
- LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating large language models.
We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization.
Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z)
- ML2SC: Deploying Machine Learning Models as Smart Contracts on the Blockchain [1.433758865948252]
We introduce Machine Learning to Contract (ML2SC), a PyTorch-to-Solidity translator that converts multi-layer perceptron (MLP) models written in PyTorch into Solidity smart contracts.
After deploying the generated smart contract, we can train our models off-chain using PyTorch and then transfer the acquired weights and biases to the smart contract through a function call.
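A hedged sketch of that weight-transfer step using web3.py is shown below; the contract address, ABI, and setWeights function are placeholders rather than ML2SC's actual interface.

```python
# A sketch of the "train off-chain, then push weights on-chain" step using web3.py.
# The address, ABI, and setWeights function are placeholders, not ML2SC's real interface.
import torch.nn as nn
from web3 import Web3

SCALE = 10**6  # fixed-point scale, since Solidity has no floating point

layer = nn.Linear(4, 3)  # stand-in for a trained MLP layer
weights = (layer.weight.detach() * SCALE).round().long().tolist()
biases = (layer.bias.detach() * SCALE).round().long().tolist()

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))  # local dev node
CONTRACT_ADDRESS = "0x0000000000000000000000000000000000000000"  # placeholder
CONTRACT_ABI: list = []  # paste the ABI generated alongside the contract here

contract = w3.eth.contract(address=CONTRACT_ADDRESS, abi=CONTRACT_ABI)
tx = contract.functions.setWeights(weights, biases).transact({"from": w3.eth.accounts[0]})
w3.eth.wait_for_transaction_receipt(tx)
```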
arXiv Detail & Related papers (2024-03-28T23:55:10Z)
- opML: Optimistic Machine Learning on Blockchain [0.0]
We introduce opML (Optimistic Machine Learning on chain), an innovative approach that empowers blockchain systems to conduct AI model inference.
At the core of opML lies an interactive fraud-proof protocol, reminiscent of optimistic rollup systems.
opML offers cost-efficient and highly efficient ML services, with minimal participation requirements.
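The optimistic pattern can be illustrated with a toy bisection over an inference trace, which narrows a dispute down to a single step that must be re-executed and checked; this is a simplified illustration, not opML's actual fraud-proof protocol.

```python
# A toy simulation of the optimistic pattern: a submitter posts per-step checkpoints of
# an inference trace; a challenger bisects to find the first step where they disagree,
# so only that single step needs to be re-executed by the verifier.
# Simplified illustration only, not opML's actual protocol.

def bisect_first_disagreement(claimed: list, recomputed: list) -> int:
    """Return the index of the first checkpoint where the two traces diverge."""
    lo, hi = 0, len(claimed) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if claimed[mid] == recomputed[mid]:
            lo = mid + 1   # agreement so far; the fault lies in the second half
        else:
            hi = mid       # divergence already present; search the first half
    return lo

honest_trace = [0, 1, 3, 6, 10]     # e.g. running partial sums of an inference
cheating_trace = [0, 1, 3, 7, 11]   # submitter tampered with step 3 onward
step_to_verify = bisect_first_disagreement(cheating_trace, honest_trace)
print("re-execute only step", step_to_verify)  # -> 3
```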
arXiv Detail & Related papers (2024-01-31T02:43:38Z)
- Knowledge Fusion of Large Language Models [73.28202188100646]
This paper introduces the notion of knowledge fusion for large language models (LLMs).
We externalize their collective knowledge and unique strengths, thereby elevating the capabilities of the target model beyond those of any individual source LLM.
Our findings confirm that the fusion of LLMs can improve the performance of the target model across a range of capabilities such as reasoning, commonsense, and code generation.
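A toy sketch of the core idea, distilling an averaged teacher distribution into a target model, is given below; the actual method additionally handles tokenizer alignment and source weighting, so this is only the simplest possible variant.

```python
# A toy sketch of fusing several source LLMs into one target model by distilling
# an averaged teacher distribution. Real knowledge fusion also aligns differing
# tokenizers and weights the sources; this shows only fusion-by-averaging.
import torch
import torch.nn.functional as F

def fusion_loss(student_logits, teacher_logits_list):
    """KL divergence between the student and the average of the teachers' distributions."""
    teacher_probs = torch.stack(
        [F.softmax(t, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

vocab, batch = 32000, 4
student = torch.randn(batch, vocab, requires_grad=True)   # stand-in for target-model logits
teachers = [torch.randn(batch, vocab) for _ in range(2)]   # stand-ins for source-model logits
loss = fusion_loss(student, teachers)
loss.backward()  # gradients flow into the target (student) model's parameters
```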
arXiv Detail & Related papers (2024-01-19T05:02:46Z)
- If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code).
Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
- ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
- RedCoast: A Lightweight Tool to Automate Distributed Training of LLMs on Any GPU/TPUs [32.01139974519813]
We present RedCoast, a tool crafted to automate distributed training and inference for large language models (LLMs).
We also propose a mechanism that allows for the customization of diverse ML pipelines through the definition of merely three functions.
As a result, Redco implementations exhibit significantly fewer lines of code compared to their official counterparts.
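The three-function idea can be sketched generically as follows; the function names and the trainer class are stand-ins and not necessarily Redco's actual API.

```python
# A minimal illustration of customizing a pipeline with just three user-supplied functions.
# The names and ThreeFunctionTrainer are generic stand-ins, not Redco's actual API.
from typing import Callable, Iterable

class ThreeFunctionTrainer:
    def __init__(self, collate_fn: Callable, loss_fn: Callable, predict_fn: Callable):
        self.collate_fn = collate_fn   # raw examples -> model-ready batch
        self.loss_fn = loss_fn         # batch -> scalar loss (used for training)
        self.predict_fn = predict_fn   # batch -> predictions (used for inference)

    def fit(self, examples: Iterable, epochs: int = 1):
        for _ in range(epochs):
            batch = self.collate_fn(list(examples))
            loss = self.loss_fn(batch)  # distribution/sharding would be handled here
            print(f"loss = {loss:.4f}")

    def predict(self, examples: Iterable):
        return self.predict_fn(self.collate_fn(list(examples)))

# Usage: mean-squared error on toy (prediction, target) pairs
trainer = ThreeFunctionTrainer(
    collate_fn=lambda xs: xs,
    loss_fn=lambda batch: sum((x - y) ** 2 for x, y in batch) / len(batch),
    predict_fn=lambda batch: [x for x, _ in batch],
)
trainer.fit([(1.0, 0.8), (2.0, 2.2)])
```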
arXiv Detail & Related papers (2023-10-25T04:32:35Z)
- Enabling Un-/Semi-Supervised Machine Learning for MDSE of the Real-World CPS/IoT Applications [0.5156484100374059]
We propose a novel approach to support domain-specific Model-Driven Software Engineering (MDSE) for real-world use-case scenarios of smart Cyber-Physical Systems (CPS) and the Internet of Things (IoT).
We argue that the majority of data available in nature for Artificial Intelligence (AI) is unlabeled. Hence, unsupervised and/or semi-supervised ML approaches are the practical choices.
Our proposed approach is fully implemented and integrated with an existing state-of-the-art MDSE tool to serve the CPS/IoT domain.
arXiv Detail & Related papers (2021-07-06T15:51:39Z)
- MLGO: a Machine Learning Guided Compiler Optimizations Framework [0.0]
This work is the first full integration of machine learning in a complex compiler pass in a real-world setting.
We use two different ML algorithms to train the inlining-for-size model, and achieve up to 7% size reduction.
The same model generalizes well to a diversity of real-world targets, as well as to the same set of targets after months of active development.
arXiv Detail & Related papers (2021-01-13T00:02:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.