Rethinking Streaming Machine Learning Evaluation
 - URL: http://arxiv.org/abs/2205.11473v1
 - Date: Mon, 23 May 2022 17:21:43 GMT
 - Title: Rethinking Streaming Machine Learning Evaluation
 - Authors: Shreya Shankar, Bernease Herman, Aditya G. Parameswaran
 - Abstract summary: We discuss how the nature of streaming ML problems introduces new real-world challenges (e.g., delayed arrival of labels) and recommend additional metrics to assess streaming ML performance.
 - Score: 9.69979862225396
 - License: http://creativecommons.org/licenses/by/4.0/
 - Abstract:   While most work on evaluating machine learning (ML) models focuses on
computing accuracy on batches of data, tracking accuracy alone in a streaming
setting (i.e., unbounded, timestamp-ordered datasets) fails to appropriately
identify when models are performing unexpectedly. In this position paper, we
discuss how the nature of streaming ML problems introduces new real-world
challenges (e.g., delayed arrival of labels) and recommend additional metrics
to assess streaming ML performance.
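To make the position concrete, here is a minimal sketch of one such additional metric: a sliding-window accuracy tracker that tolerates delayed labels and also reports how many predictions are still waiting for a label. The class name `WindowedAccuracy` and all parameters are illustrative assumptions, not an implementation from the paper.

```python
import collections

class WindowedAccuracy:
    """Sliding-window accuracy that tolerates delayed labels.

    Predictions are buffered by example id; accuracy is only updated
    when the corresponding label eventually arrives. Illustrative
    sketch, not the paper's implementation.
    """

    def __init__(self, window_size=1000):
        self.pending = {}                               # example_id -> prediction
        self.window = collections.deque(maxlen=window_size)

    def add_prediction(self, example_id, y_pred):
        self.pending[example_id] = y_pred

    def add_label(self, example_id, y_true):
        # Labels may arrive long after the prediction was served.
        y_pred = self.pending.pop(example_id, None)
        if y_pred is not None:
            self.window.append(y_pred == y_true)

    def accuracy(self):
        if not self.window:
            return float("nan")
        return sum(self.window) / len(self.window)

    def label_lag(self):
        # Share of tracked predictions still awaiting labels: a
        # complementary signal the accuracy number alone hides.
        total = len(self.pending) + len(self.window)
        return len(self.pending) / total if total else 0.0

m = WindowedAccuracy()
m.add_prediction("req-1", 1)
m.add_label("req-1", 0)       # label arrives later, possibly much later
print(m.accuracy(), m.label_lag())
```

Tracking `label_lag()` alongside `accuracy()` surfaces cases where the accuracy number is computed over a stale or biased subset of the stream.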
 
Related papers
- Protecting multimodal large language models against misleading visualizations [94.71976205962527]
We find that MLLM question-answering accuracy on misleading visualizations drops on average to the level of a random baseline.
We introduce the first inference-time methods to improve performance on such inputs.
arXiv  Detail & Related papers  (2025-02-27T20:22:34Z)
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" of training data from a pre-trained machine learning model -- has recently attracted interest.
Recent research shows that existing machine unlearning techniques do not hold up in more challenging evaluation settings.
arXiv  Detail & Related papers  (2024-10-30T17:20:10Z)
- A Systematic Review of Machine Learning Approaches for Detecting Deceptive Activities on Social Media: Methods, Challenges, and Biases [0.037693031068634524]
This systematic review evaluates studies that apply machine learning (ML) and deep learning (DL) models to detect fake news, spam, and fake accounts on social media.
arXiv  Detail & Related papers  (2024-10-26T23:55:50Z)
- Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning [0.0]
This paper addresses data leakage, a critical issue in machine learning (ML) in which unintended information contaminates the training data and distorts model performance evaluation.
The resulting discrepancy between evaluated performance and actual performance on new data is a significant concern.
It explores how data leakage depends on the specific task being addressed, investigates its occurrence in transfer learning, and compares standard inductive ML with transductive ML frameworks.
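As a concrete illustration of the kind of leakage the paper warns about, the hedged sketch below shows the classic preprocessing mistake: fitting a scaler on the full dataset before splitting. scikit-learn and the synthetic data are illustrative choices, not prescribed by the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = np.random.randn(500, 20), np.random.randint(0, 2, 500)

# Leaky: the scaler is fit on ALL rows, so statistics of the test
# rows bleed into the features the model trains on.
X_all = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_all, y, random_state=0)
leaky = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)

# Sound: fit preprocessing on the training split only.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
clean = (LogisticRegression()
         .fit(scaler.transform(X_tr), y_tr)
         .score(scaler.transform(X_te), y_te))
print(leaky, clean)  # on real data the leaky estimate tends to be optimistic
```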
arXiv  Detail & Related papers  (2024-01-24T20:30:52Z)
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation [93.01964988474755]
AutoMQM is a prompting technique which asks large language models to identify and categorize errors in translations.
We study the impact of labeled data through in-context learning and finetuning.
We then evaluate AutoMQM with PaLM-2 models, and we find that it improves performance compared to just prompting for scores.
arXiv  Detail & Related papers  (2023-08-14T17:17:21Z)
- Mitigating ML Model Decay in Continuous Integration with Data Drift Detection: An Empirical Study [7.394099294390271]
This study investigates the performance of data drift detection techniques for automatically identifying retraining points for ML models used for test case prioritization (TCP) in continuous integration (CI) environments.
We employed the Hellinger distance to identify changes in both the values and distribution of input data and leveraged these changes as retraining points for the ML model.
Our experimental evaluation of the Hellinger distance-based method demonstrated its efficacy and efficiency in detecting retraining points and reducing the associated costs.
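A minimal sketch of the underlying mechanism, assuming histogram-based density estimates over a reference window and a current window; the study's exact windowing and thresholds are not reproduced here, and `threshold=0.1` is an illustrative value.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 to 1)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def needs_retraining(reference, current, bins=20, threshold=0.1):
    """Flag a retraining point when a feature's distribution drifts.

    Histograms share bin edges so the two distributions are
    directly comparable.
    """
    edges = np.histogram_bin_edges(
        np.concatenate([reference, current]), bins=bins)
    ref_hist, _ = np.histogram(reference, bins=edges)
    cur_hist, _ = np.histogram(current, bins=edges)
    return hellinger(ref_hist, cur_hist) > threshold
```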
arXiv  Detail & Related papers  (2023-05-22T05:55:23Z)
- A hybrid feature learning approach based on convolutional kernels for ATM fault prediction using event-log data [5.859431341476405]
We present a predictive model based on convolutional kernel methods (MiniROCKET and HYDRA) that extract features from event-log data.
The proposed methodology is applied to a large, real-world dataset.
The model was integrated into a container-based decision support system to support operators in the timely maintenance of ATMs.
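MiniROCKET and HYDRA involve carefully designed kernel sets; the sketch below shows only the generic ROCKET-style idea they build on: convolving a series with random kernels and pooling with the proportion of positive values (PPV). All names and parameters are illustrative assumptions.

```python
import numpy as np

def random_kernel_features(series, n_kernels=100, seed=0):
    """ROCKET-style features: convolve with random kernels and pool.

    A simplified stand-in for MiniROCKET/HYDRA: each random kernel
    yields one PPV (proportion of positive values) feature.
    """
    rng = np.random.default_rng(seed)
    features = []
    for _ in range(n_kernels):
        length = rng.choice([7, 9, 11])
        weights = rng.normal(size=length)
        bias = rng.uniform(-1, 1)
        conv = np.convolve(series, weights, mode="valid") + bias
        features.append((conv > 0).mean())   # PPV pooling
    return np.array(features)

# Features for a window of event-log counts, fed to any downstream classifier.
x = random_kernel_features(np.sin(np.linspace(0, 20, 500)))
```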
arXiv  Detail & Related papers  (2023-05-17T08:55:53Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for NLP classification tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
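Edisa's learned approach is not reproduced here; the following is a deliberately crude stand-in that conveys what slice detection computes: grouping evaluation examples by a metadata key and surfacing groups whose error rate is markedly worse than the overall rate. Parameter values are illustrative.

```python
import collections

def find_underperforming_slices(examples, min_size=20, margin=0.10):
    """Surface slices whose accuracy falls well below the overall rate.

    `examples` is an iterable of (slice_key, correct) pairs, e.g.
    (input-length bucket, whether the model was right). A crude
    stand-in for learned slice detection models such as Edisa.
    """
    by_slice = collections.defaultdict(list)
    for key, correct in examples:
        by_slice[key].append(correct)
    total = sum(len(v) for v in by_slice.values())
    overall = sum(c for v in by_slice.values() for c in v) / total
    return {k: sum(v) / len(v)
            for k, v in by_slice.items()
            if len(v) >= min_size and sum(v) / len(v) < overall - margin}
```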
arXiv  Detail & Related papers  (2022-11-08T19:00:00Z)
- AI Total: Analyzing Security ML Models with Imperfect Data in Production [2.629585075202626]
Development of new machine learning models is typically done on manually curated data sets.
We develop a web-based visualization system that allows users to quickly gather headline performance numbers.
It also enables users to immediately observe the root cause of an issue when something goes wrong.
arXiv  Detail & Related papers  (2021-10-13T20:56:05Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models to perform inference directly on inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv  Detail & Related papers  (2021-07-05T12:44:39Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, drift detection or adaptation techniques must be applied to maintain predictive accuracy over time.
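The survey does not prescribe a particular detector; as one hedged example of the adaptation techniques it refers to, the sketch below implements a minimal DDM-style monitor that signals a retraining point when the running error rate rises well above the best rate seen so far. All thresholds are illustrative.

```python
class DDM:
    """Minimal DDM-style drift detector over a 0/1 error stream.

    Signals drift when the running error rate rises well above the
    best (lowest) rate seen so far -- a common trigger for
    retraining a stream model on recent data.
    """

    def __init__(self, drift_level=3.0):
        self.n, self.p = 0, 1.0
        self.p_min, self.s_min = float("inf"), float("inf")
        self.drift_level = drift_level

    def update(self, error):
        """error: 1 if the model misclassified, else 0. Returns True on drift."""
        self.n += 1
        self.p += (error - self.p) / self.n           # running error rate
        s = (self.p * (1 - self.p) / self.n) ** 0.5   # its std estimate
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        return self.p + s > self.p_min + self.drift_level * self.s_min
```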
arXiv  Detail & Related papers  (2021-06-14T11:42:46Z)
- Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources [78.72922528736011]
We propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box machine learning model.
Using zeroth order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses.
BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method.
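The heart of that claim is gradient estimation from input-output queries alone. Below is a hedged sketch of a standard two-point zeroth-order estimator of the kind BAR-style methods rely on; the loss function and step size are toy placeholders, not BAR's actual setup.

```python
import numpy as np

def zeroth_order_grad(f, x, mu=0.01, n_samples=20, rng=None):
    """Two-point zeroth-order gradient estimate of a black-box loss f.

    Only queries f's value, never its gradient -- the mechanism that
    lets BAR-style reprogramming treat the model as a black box.
    """
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.normal(size=x.shape)                    # random direction
        grad += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return grad / n_samples

# Gradient-free update of an input perturbation against a scalar loss.
theta = np.zeros(8)
loss = lambda z: np.sum((z - 1.0) ** 2)                 # stand-in black-box loss
for _ in range(100):
    theta -= 0.05 * zeroth_order_grad(loss, theta)
```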
arXiv  Detail & Related papers  (2020-07-17T01:52:34Z) 
This list is automatically generated from the titles and abstracts of the papers on this site.