Data Quality in Edge Machine Learning: A State-of-the-Art Survey
- URL: http://arxiv.org/abs/2406.02600v1
- Date: Sat, 1 Jun 2024 23:07:05 GMT
- Title: Data Quality in Edge Machine Learning: A State-of-the-Art Survey
- Authors: Mohammed Djameleddine Belgoumri, Mohamed Reda Bouadjenek, Sunil Aryal, Hakim Hacid,
- Abstract summary: Data-driven Artificial Intelligence (AI) systems trained using Machine Learning (ML) are shaping an ever-increasing portion of our lives.
On the one hand, the outsized influence of these systems imposes a high standard of quality, particularly in the data used to train them.
On the other hand, establishing and maintaining standards of Data Quality (DQ) becomes more challenging due to the proliferation of Edge Computing and Internet of Things devices.
- Score: 2.8449839307925955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-driven Artificial Intelligence (AI) systems trained using Machine Learning (ML) are shaping an ever-increasing (in size and importance) portion of our lives, including, but not limited to, recommendation systems, autonomous driving technologies, healthcare diagnostics, financial services, and personalized marketing. On the one hand, the outsized influence of these systems imposes a high standard of quality, particularly in the data used to train them. On the other hand, establishing and maintaining standards of Data Quality (DQ) becomes more challenging due to the proliferation of Edge Computing and Internet of Things devices, along with their increasing adoption for training and deploying ML models. The nature of the edge environment -- characterized by limited resources, decentralized data storage, and processing -- exacerbates data-related issues, making them more frequent, severe, and difficult to detect and mitigate. From these observations, it follows that DQ research for edge ML is a critical and urgent exploration track for the safety and robust usefulness of present and future AI systems. Despite this fact, DQ research for edge ML is still in its infancy. The literature on this subject remains fragmented and scattered across different research communities, with no comprehensive survey to date. Hence, this paper aims to fill this gap by providing a global view of the existing literature from multiple disciplines that can be grouped under the umbrella of DQ for edge ML. Specifically, we present a tentative definition of data quality in Edge computing, which we use to establish a set of DQ dimensions. We explore each dimension in detail, including existing solutions for mitigation.
Related papers
- A Theoretical Framework for AI-driven data quality monitoring in high-volume data environments [1.2753215270475886]
This paper presents a theoretical framework for an AI-driven data quality monitoring system designed to address the challenges of maintaining data quality in high-volume environments.
We examine the limitations of traditional methods in managing the scale, velocity, and variety of big data and propose a conceptual approach leveraging advanced machine learning techniques.
Key components include an intelligent data ingestion layer, adaptive preprocessing mechanisms, context-aware feature extraction, and AI-based quality assessment modules.
arXiv Detail & Related papers (2024-10-11T07:06:36Z) - A Survey on Data Quality Dimensions and Tools for Machine Learning [1.6078134198754157]
Machine learning (ML) technologies have become substantial in practically all aspects of our society.
Traditional methods like exploratory data analysis (EDA) and cross-validation (CV) face challenges in data-centric AI.
In this survey, we review 17 DQ evaluation and improvement tools in the last 5 years.
arXiv Detail & Related papers (2024-06-28T02:41:33Z) - AI-Driven Frameworks for Enhancing Data Quality in Big Data Ecosystems: Error_Detection, Correction, and Metadata Integration [0.0]
This thesis proposes a novel set of interconnected frameworks aimed at enhancing big data quality comprehensively.
Firstly, we introduce new quality metrics and a weighted scoring system for precise data quality assessment.
Thirdly, we present a generic framework for detecting various quality anomalies using AI models.
arXiv Detail & Related papers (2024-05-06T21:36:45Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z) - UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - Deep Transfer Learning for Automatic Speech Recognition: Towards Better
Generalization [3.6393183544320236]
Speech recognition has become an important challenge when using deep learning (DL)
It requires large-scale training datasets and high computational and storage resources.
Deep transfer learning (DTL) has been introduced to overcome these issues.
arXiv Detail & Related papers (2023-04-27T21:08:05Z) - Privacy Adhering Machine Un-learning in NLP [66.17039929803933]
In real world industry use Machine Learning to build models on user data.
Such mandates require effort both in terms of data as well as model retraining.
continuous removal of data and model retraining steps do not scale.
We propose textitMachine Unlearning to tackle this challenge.
arXiv Detail & Related papers (2022-12-19T16:06:45Z) - A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z) - Federated Edge Learning : Design Issues and Challenges [1.916348196696894]
Federated Learning (FL) is a distributed machine learning technique, where each device contributes to the learning model by independently computing the gradient based on its local training data.
implementing FL at the network edge is challenging due to system and data heterogeneity and resources constraints.
This article proposes a general framework for the data-aware scheduling as a guideline for future research directions.
arXiv Detail & Related papers (2020-08-31T19:56:36Z) - Data Mining with Big Data in Intrusion Detection Systems: A Systematic
Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation has begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.