Vamsa: Automated Provenance Tracking in Data Science Scripts
- URL: http://arxiv.org/abs/2001.01861v2
- Date: Thu, 30 Jul 2020 16:58:22 GMT
- Title: Vamsa: Automated Provenance Tracking in Data Science Scripts
- Authors: Mohammad Hossein Namaki, Avrilia Floratou, Fotis Psallidas, Subru
Krishnan, Ashvin Agrawal, Yinghui Wu, Yiwen Zhu and Markus Weimer
- Abstract summary: We introduce the ML provenance tracking problem.
We discuss the challenges in capturing such information in the context of Python.
We present Vamsa, a modular system that extracts provenance from Python scripts without requiring any changes to the users' code.
- Score: 17.53546311589593
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: There has recently been a lot of ongoing research in the areas of fairness,
bias and explainability of machine learning (ML) models due to the self-evident
or regulatory requirements of various ML applications. We make the following
observation: All of these approaches require a robust understanding of the
relationship between ML models and the data used to train them. In this work,
we introduce the ML provenance tracking problem: the fundamental idea is to
automatically track which columns in a dataset have been used to derive the
features/labels of an ML model. We discuss the challenges in capturing such
information in the context of Python, the most common language used by data
scientists. We then present Vamsa, a modular system that extracts provenance
from Python scripts without requiring any changes to the users' code. Using 26K
real data science scripts, we verify the effectiveness of Vamsa in terms of
coverage, and performance. We also evaluate Vamsa's accuracy on a smaller
subset of manually labeled data. Our analysis shows that Vamsa's precision and
recall range from 90.4% to 99.1% and its latency is in the order of
milliseconds for average size scripts. Drawing from our experience in deploying
ML models in production, we also present an example in which Vamsa helps
automatically identify models that are affected by data corruption issues.
Related papers
- Advancing Anomaly Detection: Non-Semantic Financial Data Encoding with LLMs [49.57641083688934]
We introduce a novel approach to anomaly detection in financial data using Large Language Models (LLMs) embeddings.
Our experiments demonstrate that LLMs contribute valuable information to anomaly detection as our models outperform the baselines.
arXiv Detail & Related papers (2024-06-05T20:19:09Z) - Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z) - Accelerated Cloud for Artificial Intelligence (ACAI) [24.40451195277244]
We propose an end-to-end cloud-based machine learning platform, Accelerated Cloud for AI (ACAI)
ACAI enables cloud-based storage of indexed, labeled, and searchable data, as well as automatic resource provisioning, job scheduling, and experiment tracking.
We show that our auto-provisioner produces a 1.7x speed-up and 39% cost reduction, and our system reduces experiment time for ML scientists by 20% on typical ML use cases.
arXiv Detail & Related papers (2024-01-30T07:09:48Z) - Towards Automatic Translation of Machine Learning Visual Insights to
Analytical Assertions [23.535630175567146]
We present our vision for developing an automated tool capable of translating visual properties observed in Machine Learning (ML) visualisations into Python assertions.
The tool aims to streamline the process of manually verifying these visualisations in the ML development cycle, which is critical as real-world data and assumptions often change post-deployment.
arXiv Detail & Related papers (2024-01-15T14:11:59Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z) - MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation.
Within each loop, the MLLM-DataEngine first analyze the weakness of the model based on the evaluation results.
For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data.
For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z) - The Devil is in the Errors: Leveraging Large Language Models for
Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv Detail & Related papers (2023-08-14T17:17:21Z) - AI Model Disgorgement: Methods and Choices [127.54319351058167]
We introduce a taxonomy of possible disgorgement methods that are applicable to modern machine learning systems.
We investigate the meaning of "removing the effects" of data in the trained model in a way that does not require retraining from scratch.
arXiv Detail & Related papers (2023-04-07T08:50:18Z) - Tribuo: Machine Learning with Provenance in Java [0.0]
We introduce Tribuo, a Java ML library that integrates training, type-safety, runtime checking, and automatic recording into a single framework.
All Tribuo's models and evaluations record the full processing pipeline for input data, along with the training algorithms.
arXiv Detail & Related papers (2021-10-06T19:10:50Z) - FreaAI: Automated extraction of data slices to test machine learning
models [2.475112368179548]
We show the feasibility of automatically extracting feature models that result in explainable data slices over which the ML solution under-performs.
Our novel technique, IBM FreaAI aka FreaAI, extracts such slices from structured ML test data or any other labeled data.
arXiv Detail & Related papers (2021-08-12T09:21:16Z) - Did the Model Change? Efficiently Assessing Machine Learning API Shifts [24.342984907651505]
Machine learning (ML) prediction APIs are increasingly widely used.
They can change over time due to model updates or retraining.
It is often not clear to the user if and how the ML model has changed.
arXiv Detail & Related papers (2021-07-29T17:41:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.