ML2SC: Deploying Machine Learning Models as Smart Contracts on the Blockchain
- URL: http://arxiv.org/abs/2404.16967v1
- Date: Thu, 28 Mar 2024 23:55:10 GMT
- Title: ML2SC: Deploying Machine Learning Models as Smart Contracts on the Blockchain
- Authors: Zhikai Li, Steve Vott, Bhaskar Krishnamachari
- Abstract summary: We introduce Machine Learning to Smart Contract (ML2SC), a PyTorch-to-Solidity translator that can translate multi-layer perceptron (MLP) models written in PyTorch into Solidity smart contract versions.
After deploying the generated smart contract, we can train our models off-chain using PyTorch and then transfer the acquired weights and biases to the smart contract with a function call.
- Score: 1.433758865948252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the growing concern over AI safety, there is a need to trust the computations done by machine learning (ML) models. Blockchain technology, known for recording data and running computations transparently and in a tamper-proof manner, can offer this trust. One significant challenge in deploying ML classifiers on-chain is that while ML models are typically written in Python using an ML library such as PyTorch, smart contracts deployed on EVM-compatible blockchains are written in Solidity. We introduce Machine Learning to Smart Contract (ML2SC), a PyTorch-to-Solidity translator that can automatically translate multi-layer perceptron (MLP) models written in PyTorch into Solidity smart contract versions. ML2SC uses a fixed-point math library to approximate floating-point computation. After deploying the generated smart contract, we can train our models off-chain using PyTorch and then transfer the acquired weights and biases to the smart contract with a function call. Finally, model inference is likewise done with a function call that provides the input. We mathematically model the gas costs associated with deploying these models, updating their parameters, and running inference on-chain, showing that the gas costs increase linearly in the various parameters associated with an MLP. We present empirical results matching our model. We also evaluate classification accuracy, showing that the outputs of our transparent on-chain implementation are identical to those of the original off-chain PyTorch implementation.
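To make this workflow concrete, below is a minimal off-chain sketch (not the authors' released code): it builds a small MLP in PyTorch, converts the learned weights and biases to fixed-point integers, and pushes them to an already-deployed ML2SC-generated contract via web3.py. The contract address and ABI are placeholders, and the function names (setLayerParams, predict) and the 2^64 scale factor are illustrative assumptions, since the abstract only states that a fixed-point math library is used.

import torch
import torch.nn as nn
from web3 import Web3

SCALE = 2 ** 64  # assumed fixed-point scale; ML2SC's actual library may differ

# Toy MLP; real off-chain training with a loss and optimizer would go here.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))

def to_fixed(t: torch.Tensor) -> list[int]:
    # Flatten a float tensor into scaled integers the contract can store.
    return [int(round(v * SCALE)) for v in t.detach().flatten().tolist()]

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))   # local EVM node
address = "0x0000000000000000000000000000000000000000"  # placeholder address
abi = [...]  # placeholder: ABI emitted when compiling the generated contract
contract = w3.eth.contract(address=address, abi=abi)

# Transfer each layer's weights and biases with a function call (one tx each).
linear_layers = [m for m in model if isinstance(m, nn.Linear)]
for i, layer in enumerate(linear_layers):
    tx = contract.functions.setLayerParams(
        i, to_fixed(layer.weight), to_fixed(layer.bias)
    ).transact({"from": w3.eth.accounts[0]})
    w3.eth.wait_for_transaction_receipt(tx)

# Inference is also a function call; a read-only eth_call consumes no gas.
x = to_fixed(torch.tensor([5.1, 3.5, 1.4, 0.2]))
print(contract.functions.predict(x).call())

Note that the parameter-update loop performs one storage write per weight and bias, so for layer widths d_0, ..., d_L it touches sum_l (d_{l-1} * d_l + d_l) slots; under that reading, update gas grows as g_0 + g_s * sum_l (d_{l-1} * d_l + d_l), consistent with the abstract's claim that gas costs scale linearly in the MLP's parameters (g_0 and g_s are hypothetical constants here).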
Related papers
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Patch-Level Training for Large Language Models [69.67438563485887]
This paper introduces patch-level training for Large Language Models (LLMs).
During patch-level training, we feed the language model shorter sequences of patches and train it to predict the next patch.
Following this, the model continues token-level training on the remaining training data to align with the inference mode.
arXiv Detail & Related papers (2024-07-17T15:48:39Z)
- In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems that we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z)
- A Model-Based Machine Learning Approach for Assessing the Performance of Blockchain Applications [0.0]
We use machine learning (ML) model-based methods to predict blockchain performance.
We employ the salp swarm optimization (SO) ML model, which enables the investigation of optimal blockchain configurations.
The $k$NN model outperforms SVM by 5%, and the improved SO (ISO) demonstrates a 4% reduction in accuracy deviation compared to regular SO.
arXiv Detail & Related papers (2023-09-20T10:39:21Z)
- CLAWSAT: Towards Both Robust and Accurate Code Models [74.57590254102311]
We integrate contrastive learning (CL) with adversarial learning to co-optimize the robustness and accuracy of code models.
To the best of our knowledge, this is the first systematic study to explore and exploit the robustness and accuracy benefits of (multi-view) code obfuscations in code models.
arXiv Detail & Related papers (2022-11-21T18:32:50Z) - PyHHMM: A Python Library for Heterogeneous Hidden Markov Models [63.01207205641885]
PyHHMM is an object-oriented Python implementation of Heterogeneous Hidden Markov Models (HHMMs).
PyHHMM emphasizes features not supported in similar available frameworks: a heterogeneous observation model, missing-data inference, different model order selection criteria, and semi-supervised training.
PyHHMM relies on the numpy, scipy, scikit-learn, and seaborn Python packages, and is distributed under the Apache-2.0 License.
arXiv Detail & Related papers (2022-01-12T07:32:36Z) - Tribuo: Machine Learning with Provenance in Java [0.0]
We introduce Tribuo, a Java ML library that integrates training, type-safety, runtime checking, and automatic provenance recording into a single framework.
All Tribuo's models and evaluations record the full processing pipeline for input data, along with the training algorithms.
arXiv Detail & Related papers (2021-10-06T19:10:50Z) - COMBO: State-of-the-Art Morphosyntactic Analysis [0.0]
COMBO is a fully neural NLP system for accurate part-of-speech tagging, morphological analysis, lemmatisation, and (enhanced) dependency parsing.
It predicts categorical morphosyntactic features whilst also exposing their vector representations, extracted from hidden layers.
It is an easy-to-install Python package with automatically downloadable pre-trained models for over 40 languages.
arXiv Detail & Related papers (2021-09-11T20:00:20Z)
- OmniLytics: A Blockchain-based Secure Data Market for Decentralized Machine Learning [3.9256804549871553]
We propose OmniLytics, a secure data trading marketplace for machine learning applications.
Data owners can contribute their private data to collectively train an ML model requested by some model owners, and get compensated for their data contribution.
OmniLytics enables such model training while simultaneously providing 1) model security against curious data owners; 2) data security against curious model and data owners; 3) resilience to malicious data owners who provide faulty results to poison model training; and 4) resilience to a malicious model owner who intends to evade payment.
arXiv Detail & Related papers (2021-07-12T08:28:15Z)
- A Bytecode-based Approach for Smart Contract Classification [10.483992071557195]
The number of smart contracts deployed on blockchain platforms is growing exponentially, which makes it difficult for users to find desired services by manual screening.
Current research on smart contract classification focuses on Natural Language Processing (NLP) solutions which are based on contract source code.
This paper proposes a classification model based on features from contract bytecode instead of source code to solve these problems.
arXiv Detail & Related papers (2021-05-31T03:00:29Z)
- LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention [54.18121922040521]
Non-autoregressive translation (NAT) models generate multiple tokens in one forward pass.
These NAT models often suffer from the multimodality problem, generating duplicated tokens or missing tokens.
We propose two novel methods to address this issue, the Look-Around (LA) strategy and the Vocabulary Attention (VA) mechanism.
arXiv Detail & Related papers (2020-02-08T04:11:03Z)