Machine Learning Models Have a Supply Chain Problem
- URL: http://arxiv.org/abs/2505.22778v1
- Date: Wed, 28 May 2025 18:47:14 GMT
- Title: Machine Learning Models Have a Supply Chain Problem
- Authors: Sarah Meiklejohn, Hayden Blauzvern, Mihai Maruseac, Spencer Schrock, Laurent Simon, Ilia Shumailov,
- Abstract summary: We argue that the current ecosystem for open ML models contains significant supply-chain risks. These include an attacker replacing a model with something malicious. We then explore how Sigstore can be used to bring transparency to open ML models.
- Score: 12.386549415284259
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Powerful machine learning (ML) models are now readily available online, which creates exciting possibilities for users who lack the deep technical expertise or substantial computing resources needed to develop them. On the other hand, this type of open ecosystem comes with many risks. In this paper, we argue that the current ecosystem for open ML models contains significant supply-chain risks, some of which have been exploited already in real attacks. These include an attacker replacing a model with something malicious (e.g., malware), or a model being trained using a vulnerable version of a framework or on restricted or poisoned data. We then explore how Sigstore, a solution designed to bring transparency to open-source software supply chains, can be used to bring transparency to open ML models, in terms of enabling model publishers to sign their models and prove properties about the datasets they use.
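To make the model-replacement risk concrete, the sketch below shows the bare integrity check that signing schemes build on: recompute a cryptographic digest of the downloaded model file and compare it against a digest obtained out of band from the publisher. This is a minimal illustrative sketch, not the tooling described in the paper; the file name and expected digest are placeholder assumptions. Sigstore strengthens this basic check by binding the signed digest to the publisher's verified identity (short-lived certificates issued after OIDC authentication) and recording the signature in a public transparency log.

```python
import hashlib
from pathlib import Path

# Hypothetical values for illustration only: in a real deployment the expected
# digest would come from a publisher-signed statement (e.g., a Sigstore bundle),
# never from the same server that hosts the model file itself.
MODEL_PATH = Path("model.safetensors")
EXPECTED_SHA256 = "0" * 64  # placeholder; replace with the publisher's digest

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so multi-gigabyte models fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model(path: Path, expected_hex: str) -> None:
    """Refuse to proceed if the downloaded artifact does not match the expected digest."""
    actual = sha256_of(path)
    if actual != expected_hex:
        raise RuntimeError(
            f"Digest mismatch for {path}: expected {expected_hex}, got {actual}. "
            "The model may have been replaced or corrupted; do not load it."
        )

if __name__ == "__main__":
    verify_model(MODEL_PATH, EXPECTED_SHA256)
    print(f"{MODEL_PATH} matches the expected digest; safe to load.")
```

A digest comparison alone only detects tampering and says nothing about who produced the model or what data it was trained on; those are the gaps the paper proposes to close with Sigstore signatures and dataset attestations.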
Related papers
- Mitigating Downstream Model Risks via Model Provenance [28.390839690707256]
We propose a machine-readable model specification format to simplify the creation of model records. Our solution explicitly traces relationships between upstream and downstream models, enhancing transparency and traceability. This proof of concept aims to set a new standard for managing foundation models, bridging the gap between innovation and responsible model management.
arXiv Detail & Related papers (2024-10-03T05:52:15Z)
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- Privacy and Security Implications of Cloud-Based AI Services: A Survey [4.1964397179107085]
This paper details the privacy and security landscape in today's cloud ecosystem.
It identifies that there is a gap in addressing the risks introduced by machine learning models.
arXiv Detail & Related papers (2024-01-31T13:30:20Z)
- Towards Scalable and Robust Model Versioning [30.249607205048125]
Malicious incursions aimed at gaining access to deep learning models are on the rise.
We show how to generate multiple versions of a model that possess different attack properties.
We show theoretically that this can be accomplished by incorporating parameterized hidden distributions into the model training data.
arXiv Detail & Related papers (2024-01-17T19:55:49Z)
- Learn to Unlearn: A Survey on Machine Unlearning [29.077334665555316]
This article presents a review of recent machine unlearning techniques, verification mechanisms, and potential attacks.
We highlight emerging challenges and prospective research directions.
We aim for this paper to provide valuable resources for integrating privacy, equity, and resilience into ML systems.
arXiv Detail & Related papers (2023-05-12T14:28:02Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- Self-Destructing Models: Increasing the Costs of Harmful Dual Uses of Foundation Models [103.71308117592963]
We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning.
In a small-scale experiment, we show MLAC can largely prevent a BERT-style model from being re-purposed to perform gender identification.
arXiv Detail & Related papers (2022-11-27T21:43:45Z)
- Learnware: Small Models Do Big [69.88234743773113]
The prevailing big-model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed these issues, while also becoming a serious source of carbon emissions.
This article offers an overview of the learnware paradigm, which aims to spare users from building machine learning models from scratch, in the hope of reusing small models for purposes even beyond those they were originally built for.
arXiv Detail & Related papers (2022-10-07T15:55:52Z)
- A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often 'remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
- MOVE: Effective and Harmless Ownership Verification via Embedded External Features [104.97541464349581]
We propose an effective and harmless model ownership verification (MOVE) method to defend against different types of model stealing simultaneously. We conduct the ownership verification by checking whether a suspicious model contains the knowledge of defender-specified external features. We then train a meta-classifier to determine whether a model is stolen from the victim.
arXiv Detail & Related papers (2022-08-04T02:22:29Z)
- Survey: Leakage and Privacy at Inference Time [59.957056214792665]
Leakage of data from publicly available Machine Learning (ML) models is an area of growing significance.
We focus on inference-time leakage, as the most likely scenario for publicly available models.
We propose a taxonomy covering involuntary and malevolent leakage, survey the available defences, and then review the currently available assessment metrics and applications.
arXiv Detail & Related papers (2021-07-04T12:59:16Z)
- Green Lighting ML: Confidentiality, Integrity, and Availability of Machine Learning Systems in Deployment [4.2317391919680425]
In production machine learning, there is generally a hand-off from those who build a model to those who deploy a model.
In this hand-off, the engineers responsible for model deployment are often not privy to the details of the model.
In order to help alleviate this issue, automated systems for validating privacy and security of models need to be developed.
arXiv Detail & Related papers (2020-07-09T10:38:59Z)