DetAIL : A Tool to Automatically Detect and Analyze Drift In Language
- URL: http://arxiv.org/abs/2211.04250v1
- Date: Thu, 3 Nov 2022 19:50:12 GMT
- Title: DetAIL : A Tool to Automatically Detect and Analyze Drift In Language
- Authors: Nishtha Madaan, Adithya Manjunatha, Hrithik Nambiar, Aviral Kumar
Goel, Harivansh Kumar, Diptikalyan Saha, Srikanta Bedathur
- Abstract summary: This work aims to ensure that machine learning and deep learning-based systems are as trusted as traditional software.
Current systems rely on scheduled re-training of these models as new data arrives.
We propose to measure the data drift that occurs as new data arrives, so that models can be adaptively re-trained only when re-training is actually required.
- Score: 8.968228078113189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning and deep learning-based decision making has become part of
today's software. The goal of this work is to ensure that machine learning and
deep learning-based systems are as trusted as traditional software. Traditional
software is made dependable by following rigorous practices such as static
analysis, testing, debugging, verification, and repair throughout the
development and maintenance life-cycle. Similarly, for machine learning systems,
we need to keep these models up to date so that their performance is not
compromised. For this, current systems rely on scheduled re-training of these
models as new data arrives. In this work, we propose to measure the data drift
that takes place when new data arrives, so that one can adaptively re-train the
models whenever re-training is actually required, irrespective of schedules. In
addition, we generate explanations at the sentence level and the dataset level
to capture why a given payload text has drifted.
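To make the premise concrete: drift in incoming payload text is quantified against the data the model was trained on, and re-training is triggered only when that drift is large. The sketch below is a minimal illustration of such a check, not the authors' implementation; the TF-IDF/SVD embedding, the per-dimension Kolmogorov-Smirnov score, and the threshold value are all assumptions standing in for whatever DetAIL actually uses.

```python
# Minimal drift-check sketch (not the DetAIL implementation): compare embeddings of a
# reference corpus against a new payload batch and flag when re-training looks warranted.
# The embedding, the score, and the threshold are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def embed(reference_texts, payload_texts, dims=20):
    """Project both corpora into a shared low-dimensional space (stand-in for a real encoder)."""
    vec = TfidfVectorizer().fit(reference_texts + payload_texts)
    X = vec.transform(reference_texts + payload_texts)
    svd = TruncatedSVD(n_components=min(dims, X.shape[1] - 1), random_state=0)
    emb = svd.fit_transform(X)
    return emb[:len(reference_texts)], emb[len(reference_texts):]

def drift_score(ref_emb, new_emb):
    """Mean Kolmogorov-Smirnov statistic across embedding dimensions (higher = larger shift)."""
    return float(np.mean([ks_2samp(ref_emb[:, d], new_emb[:, d]).statistic
                          for d in range(ref_emb.shape[1])]))

def should_retrain(reference_texts, payload_texts, threshold=0.3):
    """Adaptive trigger: re-train only when the measured drift crosses the threshold."""
    ref_emb, new_emb = embed(reference_texts, payload_texts)
    return drift_score(ref_emb, new_emb) > threshold
```

A dataset-level explanation could then rank the embedding dimensions (or their top terms) by their individual KS statistics to indicate where the shift is concentrated, in the spirit of the sentence-level and dataset-level explanations the abstract describes.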
Related papers
- RESTOR: Knowledge Recovery through Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can memorize undesirable datapoints.
Many machine unlearning methods have been proposed that aim to 'erase' these datapoints from trained models.
We propose the RESTOR framework for machine unlearning, organized around several dimensions.
arXiv Detail & Related papers (2024-10-31T20:54:35Z) - Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning, the problem of efficiently removing the effect of a small "forget set" of training data from a pre-trained machine learning model, has recently attracted interest.
Recent research shows that existing machine unlearning techniques do not hold up in challenging settings.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - How to unlearn a learned Machine Learning model ? [0.0]
I will present an elegant algorithm for unlearning a machine learning model and visualize its abilities.
I will elucidate the underlying mathematical theory and establish specific metrics to evaluate both the unlearned model's performance on desired data and its level of ignorance regarding unwanted data.
arXiv Detail & Related papers (2024-10-13T17:38:09Z) - Robust Machine Learning by Transforming and Augmenting Imperfect
Training Data [6.928276018602774]
This thesis explores several data sensitivities of modern machine learning.
We first discuss how to prevent ML from codifying prior human discrimination measured in the training data.
We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment.
arXiv Detail & Related papers (2023-12-19T20:49:28Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence to demonstrate that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
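The summary names the projection mechanism but not its details; the following is a hedged sketch of the general gradient-projection idea (ascend on the forget-set loss, but only in directions orthogonal to a subspace important for the retained data), not the paper's actual algorithm or code. The QR-based projection and the learning rate are illustrative assumptions.

```python
# Hedged sketch of a gradient-projection unlearning step (not the PGU authors' code):
# ascend on the forget-set loss, but first remove the component of that update lying
# in the subspace spanned by retained-data gradient directions.
import numpy as np

def projected_unlearning_step(params, forget_grad, retain_grads, lr=0.01):
    """params       : flat parameter vector, shape (n_params,)
       forget_grad  : gradient of the loss on the forget set, shape (n_params,)
       retain_grads : rows spanning directions important for retained data, shape (k, n_params)
    """
    basis, _ = np.linalg.qr(retain_grads.T)                    # orthonormal basis, shape (n_params, k)
    projected = forget_grad - basis @ (basis.T @ forget_grad)  # keep only the orthogonal component
    return params + lr * projected                             # gradient ascent away from the forget data
```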
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - On the Costs and Benefits of Adopting Lifelong Learning for Software
Analytics -- Empirical Study on Brown Build and Risk Prediction [17.502553991799832]
This paper evaluates the use of lifelong learning (LL) for industrial use cases at Ubisoft.
LL is used to continuously build and maintain ML-based software analytics tools using an incremental learner that progressively updates the old model using new data.
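As a rough illustration of the incremental-update pattern such pipelines rely on (not the paper's tooling), the snippet below folds each new labelled batch into an existing scikit-learn model with partial_fit instead of retraining from scratch; the vectorizer and the label set are placeholders.

```python
# Illustrative incremental-learning loop (not the paper's tooling): update an existing
# scikit-learn model in place on each new batch instead of retraining from scratch.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16)   # stateless, so safe for streaming text
model = SGDClassifier()                            # any estimator exposing partial_fit works
CLASSES = [0, 1]                                   # assumed label set, declared up front

def update_on_new_batch(texts, labels):
    """Fold a new labelled batch into the old model without a full re-training run."""
    X = vectorizer.transform(texts)
    model.partial_fit(X, labels, classes=CLASSES)
```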
arXiv Detail & Related papers (2023-05-16T21:57:16Z) - Continual-Learning-as-a-Service (CLaaS): On-Demand Efficient Adaptation
of Predictive Models [17.83007940710455]
Two main future trends for companies that want to build machine learning-based applications are real-time inference and continual updating.
This paper defines a novel software service and model delivery infrastructure termed Continual-Learning-as-a-Service (CLaaS) to address these issues.
It provides support for model updating and validation tools for data scientists without an on-premise solution and in an efficient, stateful and easy-to-use manner.
arXiv Detail & Related papers (2022-06-14T16:22:54Z) - Can Bad Teaching Induce Forgetting? Unlearning in Deep Networks using an
Incompetent Teacher [6.884272840652062]
We propose a novel machine unlearning method by exploring the utility of competent and incompetent teachers in a student-teacher framework to induce forgetfulness.
The knowledge from the competent and incompetent teachers is selectively transferred to the student to obtain a model that doesn't contain any information about the forget data.
We introduce the Zero Retrain Forgetting (ZRF) metric to evaluate any unlearning method.
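A hedged sketch of the selective-distillation idea described above (the loss shape and temperature are assumptions, not the paper's exact formulation): the student is pushed toward a competent teacher's outputs on data to be remembered and toward a randomly initialised, incompetent teacher's outputs on data to be forgotten.

```python
# Rough sketch of selective distillation from competent/incompetent teachers
# (loss shape and temperature are assumptions, not the paper's exact formulation).
import torch.nn.functional as F

def unlearning_loss(student_retain, competent_retain, student_forget, incompetent_forget, T=1.0):
    """Match the competent teacher on retained data and the incompetent teacher on forget data."""
    def kl(student_logits, teacher_logits):
        return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean")
    return kl(student_retain, competent_retain) + kl(student_forget, incompetent_forget)
```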
arXiv Detail & Related papers (2022-05-17T05:13:17Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose a first method for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
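The closed-form flavour of such updates can be sketched as a Newton-style correction: nudge the parameters by the inverse-Hessian-weighted gradients of the removed points. The snippet below is a generic illustration under that reading, with an assumed damping term; it is not the authors' estimator.

```python
# Generic Newton-style unlearning correction in the influence-function spirit
# (an illustrative reading, not the authors' estimator); damping is an assumption.
import numpy as np

def unlearn_points(theta, hessian, removed_grads, damping=1e-3):
    """theta         : current parameter vector, shape (n_params,)
       hessian       : Hessian of the training loss at theta, shape (n_params, n_params)
       removed_grads : per-example loss gradients of the removed points, shape (k, n_params)
    """
    H = hessian + damping * np.eye(hessian.shape[0])            # ridge term keeps H invertible
    correction = np.linalg.solve(H, removed_grads.sum(axis=0))
    return theta + correction                                   # undo the removed points' pull on the optimum
```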
arXiv Detail & Related papers (2021-08-26T04:42:24Z) - Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain the predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
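As a minimal illustration of the kind of drift detection such a streaming pipeline needs (an assumed baseline, not one of the surveyed AutoML tools), the monitor below compares streaming accuracy in a recent window against a reference window and signals drift when the drop exceeds a tolerance.

```python
# Minimal accuracy-drop drift monitor (an assumed baseline, not a surveyed AutoML tool).
from collections import deque

class AccuracyDriftMonitor:
    def __init__(self, window=500, tolerance=0.10):
        self.reference = deque(maxlen=window)   # outcomes on earlier, "stable" data
        self.recent = deque(maxlen=window)      # outcomes on the newest data
        self.tolerance = tolerance

    def update(self, correct):
        """Feed one prediction outcome (True/False); return True when drift is suspected."""
        if len(self.reference) < self.reference.maxlen:
            self.reference.append(correct)
            return False
        self.recent.append(correct)
        if len(self.recent) < self.recent.maxlen:
            return False
        ref_acc = sum(self.reference) / len(self.reference)
        new_acc = sum(self.recent) / len(self.recent)
        return (ref_acc - new_acc) > self.tolerance
```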
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.