On the Use of Interpretable Machine Learning for the Management of Data
Quality
- URL: http://arxiv.org/abs/2007.14677v1
- Date: Wed, 29 Jul 2020 08:49:32 GMT
- Title: On the Use of Interpretable Machine Learning for the Management of Data
Quality
- Authors: Anna Karanika, Panagiotis Oikonomou, Kostas Kolomvatsos, Christos
Anagnostopoulos
- Abstract summary: We propose the use of interpretable machine learning to deliver the features that are important to be based for any data processing activity.
Our aim is to secure data quality, at least, for those features that are detected as significant in the collected datasets.
- Score: 13.075880857448059
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data quality is a significant issue for any application that requests for
analytics to support decision making. It becomes very important when we focus
on Internet of Things (IoT) where numerous devices can interact to exchange and
process data. IoT devices are connected to Edge Computing (EC) nodes to report
the collected data, thus, we have to secure data quality not only at the IoT
but also at the edge of the network. In this paper, we focus on the specific
problem and propose the use of interpretable machine learning to deliver the
features that are important to be based for any data processing activity. Our
aim is to secure data quality, at least, for those features that are detected
as significant in the collected datasets. We have to notice that the selected
features depict the highest correlation with the remaining in every dataset,
thus, they can be adopted for dimensionality reduction. We focus on multiple
methodologies for having interpretability in our learning models and adopt an
ensemble scheme for the final decision. Our scheme is capable of timely
retrieving the final result and efficiently select the appropriate features. We
evaluate our model through extensive simulations and present numerical results.
Our aim is to reveal its performance under various experimental scenarios that
we create varying a set of parameters adopted in our mechanism.
Related papers
- Speech Emotion Recognition under Resource Constraints with Data Distillation [64.36799373890916]
Speech emotion recognition (SER) plays a crucial role in human-computer interaction.
The emergence of edge devices in the Internet of Things presents challenges in constructing intricate deep learning models.
We propose a data distillation framework to facilitate efficient development of SER models in IoT applications.
arXiv Detail & Related papers (2024-06-21T13:10:46Z) - Efficient Network Traffic Feature Sets for IoT Intrusion Detection [0.0]
This work evaluates the feature sets provided by a combination of different feature selection methods, namely Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio, in multiple IoT network datasets.
The influence of the smaller feature sets on both the classification performance and the training time of ML models is compared, with the aim of increasing the computational efficiency of IoT intrusion detection.
arXiv Detail & Related papers (2024-06-12T09:51:29Z) - Energy-Efficient Edge Learning via Joint Data Deepening-and-Prefetching [9.468399367975984]
We propose a novel offloading architecture called joint data deepening-and-prefetching (JD2P)
JD2P is feature-by-feature offloading comprising two key techniques.
We evaluate the effectiveness of JD2P through experiments using the MNIST dataset.
arXiv Detail & Related papers (2024-02-19T08:12:47Z) - LESS: Selecting Influential Data for Targeted Instruction Tuning [64.78894228923619]
We propose LESS, an efficient algorithm to estimate data influences and perform Low-rank gradiEnt Similarity Search for instruction data selection.
We show that training on a LESS-selected 5% of the data can often outperform training on the full dataset across diverse downstream tasks.
Our method goes beyond surface form cues to identify data that the necessary reasoning skills for the intended downstream application.
arXiv Detail & Related papers (2024-02-06T19:18:04Z) - Ensemble Learning based Anomaly Detection for IoT Cybersecurity via
Bayesian Hyperparameters Sensitivity Analysis [2.3226893628361682]
Internet of Things (IoT) integrates more than billions of intelligent devices over the globe with the capability of communicating with other connected devices.
Data collected by IoT contain a tremendous amount of information for anomaly detection.
In this paper, we present a study on using ensemble machine learning methods for enhancing IoT cybersecurity via anomaly detection.
arXiv Detail & Related papers (2023-07-20T05:23:49Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z) - A Survey of Learning on Small Data: Generalization, Optimization, and
Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z) - MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic
Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z) - Federated Feature Selection for Cyber-Physical Systems of Systems [0.3609538870261841]
A fleet of autonomous vehicles finds a consensus on the optimal set of features that they exploit to reduce data transmission up to 99% with negligible information loss.
Our results show that a fleet of autonomous vehicles finds a consensus on the optimal set of features that they exploit to reduce data transmission up to 99% with negligible information loss.
arXiv Detail & Related papers (2021-09-23T12:16:50Z) - Feature Extraction for Machine Learning-based Intrusion Detection in IoT
Networks [6.6147550436077776]
This paper aims to discover whether Feature Reduction (FR) and Machine Learning (ML) techniques can be generalised across various datasets.
The detection accuracy of three Feature Extraction (FE) algorithms; Principal Component Analysis (PCA), Auto-encoder (AE), and Linear Discriminant Analysis (LDA) is evaluated.
arXiv Detail & Related papers (2021-08-28T23:52:18Z) - Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantize interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.