ECS -- an Interactive Tool for Data Quality Assurance
- URL: http://arxiv.org/abs/2307.04368v2
- Date: Mon, 17 Jul 2023 05:34:47 GMT
- Title: ECS -- an Interactive Tool for Data Quality Assurance
- Authors: Christian Sieberichs, Simon Geerkens, Alexander Braun, Thomas Waschulzik
- Abstract summary: We present a novel approach for the assurance of data quality.
For this purpose, the mathematical basics are first discussed and the approach is presented using multiple examples.
This results in the detection of data points with potentially harmful properties for the use in safety-critical systems.
- Score: 63.379471124899915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing capabilities of machine learning systems and their
potential use in safety-critical systems, ensuring high-quality data is
becoming increasingly important. In this paper we present a novel approach for
the assurance of data quality. For this purpose, the mathematical basics are
first discussed and the approach is presented using multiple examples. This
results in the detection of data points with potentially harmful properties for
the use in safety-critical systems.
Related papers
- Semi-Supervised Multi-Task Learning Based Framework for Power System Security Assessment [0.0]
This paper develops a novel machine learning-based framework using Semi-Supervised Multi-Task Learning (SS-MTL) for power system dynamic security assessment.
The learning algorithm underlying the proposed framework integrates conditional masked encoders and employs multi-task learning for classification-aware feature representation.
Various experiments on the IEEE 68-bus system were conducted to validate the proposed method.
arXiv Detail & Related papers (2024-07-11T22:42:53Z)
- Machine Learning and Feature Ranking for Impact Fall Detection Event Using Multisensor Data [1.9731252964716424]
We employ a feature selection process to identify the most relevant features from the multisensor UP-FALL dataset.
We then evaluate the efficiency of various machine learning models in detecting the impact moment.
Our results achieve high accuracy rates in impact detection, showcasing the power of leveraging multisensor data for fall detection tasks.
arXiv Detail & Related papers (2023-12-21T01:05:44Z)
- QI2 -- an Interactive Tool for Data Quality Assurance [63.379471124899915]
The planned AI Act from the European Commission defines challenging legal requirements for data quality.
We introduce a novel approach that supports the data quality assurance process of multiple data quality aspects.
arXiv Detail & Related papers (2023-07-07T07:06:38Z)
- Towards Generalizable Data Protection With Transferable Unlearnable Examples [50.628011208660645]
We present a novel, generalizable data protection method by generating transferable unlearnable examples.
To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution.
arXiv Detail & Related papers (2023-05-18T04:17:01Z)
- Multi Agent System for Machine Learning Under Uncertainty in Cyber Physical Manufacturing System [78.60415450507706]
Recent advancements in predictive machine learning have led to its application in various use cases in manufacturing.
Most research focused on maximising predictive accuracy without addressing the uncertainty associated with it.
In this paper, we determine the sources of uncertainty in machine learning and establish the success criteria of a machine learning system to function well under uncertainty.
arXiv Detail & Related papers (2021-07-28T10:28:05Z)
- Data Curation and Quality Assurance for Machine Learning-based Cyber Intrusion Detection [1.0276024900942873]
This article first summarizes existing machine learning-based intrusion detection systems and the datasets used for building these systems.
The experimental results show that BERT and GPT were the best algorithms for HIDS on all of the datasets.
We then evaluate the data quality of the 11 datasets based on quality dimensions proposed in this paper to determine the best characteristics that a HIDS dataset should possess in order to yield the best possible result.
arXiv Detail & Related papers (2021-05-20T21:31:46Z)
- Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data [0.15229257192293197]
We propose two data quality measures that can compute class separability and in-class variability, the two important aspects of data quality, for a given dataset.
We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data.
arXiv Detail & Related papers (2021-01-05T10:23:08Z)
- Dos and Don'ts of Machine Learning in Computer Security [74.1816306998445]
Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance.
We identify common pitfalls in the design, implementation, and evaluation of learning-based security systems.
We propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible.
arXiv Detail & Related papers (2020-10-19T13:09:31Z)
- Data Mining with Big Data in Intrusion Detection Systems: A Systematic Literature Review [68.15472610671748]
Cloud computing has become a powerful and indispensable technology for complex, high performance and scalable computation.
The rapid rate and volume of data creation has begun to pose significant challenges for data management and security.
The design and deployment of intrusion detection systems (IDS) in the big data setting has, therefore, become a topic of importance.
arXiv Detail & Related papers (2020-05-23T20:57:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.