Open Data Quality Evaluation: A Comparative Analysis of Open Data in Latvia
- URL: http://arxiv.org/abs/2007.04697v2
- Date: Wed, 15 Jun 2022 08:10:51 GMT
- Title: Open Data Quality Evaluation: A Comparative Analysis of Open Data in Latvia
- Authors: Anastasija Nikiforova
- Abstract summary: The research discusses how (open) data quality could be assessed.
One specific approach is applied to several Latvian open data sets.
Common data quality problems detected in Latvian open data and in the open data of three European countries are also highlighted.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Nowadays open data is entering the mainstream: it is freely
available to every stakeholder and is often used in business
decision-making. It is important to be sure that data is trustworthy and
error-free, as its quality problems can lead to huge losses. The research
discusses how (open) data quality could be assessed. It also covers the
main points that should be considered when developing a data quality
management solution. One specific approach is applied to several Latvian
open data sets. The research provides a step-by-step open data set
analysis guide and summarizes its results. It is also shown that data
quality can differ depending on the data supplier (centralized versus
decentralized data releases) and that, unfortunately, a trusted data
supplier cannot guarantee the absence of data quality problems. Common
data quality problems detected not only in Latvian open data but also in
the open data of three other European countries are also highlighted.
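The paper describes its method only in prose; as a rough illustration of the kinds of checks such a step-by-step analysis involves, here is a minimal sketch in Python. The file name, column names, and validation rules (registration-number pattern, date plausibility) are hypothetical assumptions for illustration, not taken from the paper.

```python
import pandas as pd

# Hypothetical open data set; the file and column names are illustrative.
df = pd.read_csv("company_register.csv")
report = {}

# Completeness: count missing values per column.
report["missing_per_column"] = df.isna().sum().to_dict()

# Uniqueness: fully duplicated rows and duplicated primary keys.
report["duplicate_rows"] = int(df.duplicated().sum())
report["duplicate_ids"] = int(df["reg_number"].duplicated().sum())

# Syntactic validity: values must match an assumed format
# (here, an 11-digit registration number).
valid = df["reg_number"].astype(str).str.match(r"^\d{11}$")
report["invalid_reg_numbers"] = int((~valid).sum())

# Plausibility: registration dates must parse and must not lie in the future.
dates = pd.to_datetime(df["registration_date"], errors="coerce")
report["unparseable_dates"] = int(dates.isna().sum())
report["future_dates"] = int((dates > pd.Timestamp.now()).sum())

for check, result in report.items():
    print(f"{check}: {result}")
```

Checks of this shape can be run uniformly across data sets from different suppliers, which is what makes a centralized-versus-decentralized comparison like the paper's possible.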
Related papers
- Principles for Open Data Curation: A Case Study with the New York City 311 Service Request Data [2.3464946883680864]
The City of New York (NYC) has been at the forefront of this movement since the enactment of the Open Data Law in 2012.
The portal currently hosts 2,700 datasets, serving as a crucial resource for research across various domains.
The effective use of open data relies heavily on data quality and usability, challenges that remain insufficiently addressed in the literature.
arXiv Detail & Related papers (2025-01-14T12:06:20Z)
- A Survey on Data Markets [73.07800441775814]
The growing trend of trading data for greater welfare has led to the emergence of data markets.
A data market is any mechanism whereby the exchange of data products including datasets and data derivatives takes place.
It serves as a coordinating mechanism by which several functions, including the pricing and the distribution of data, interact.
arXiv Detail & Related papers (2024-11-09T15:09:24Z)
- A Guide to Misinformation Detection Datasets [5.673951146506489]
This guide aims to provide a roadmap for obtaining higher quality data and conducting more effective evaluations.
All datasets and other artifacts are available at https://misinfo-datasets.complexdatalab.com/.
arXiv Detail & Related papers (2024-11-07T18:47:39Z)
- AI-Driven Frameworks for Enhancing Data Quality in Big Data Ecosystems: Error Detection, Correction, and Metadata Integration [0.0]
This thesis proposes a novel set of interconnected frameworks aimed at enhancing big data quality comprehensively.
Firstly, we introduce new quality metrics and a weighted scoring system for precise data quality assessment.
Thirdly, we present a generic framework for detecting various quality anomalies using AI models.
arXiv Detail & Related papers (2024-05-06T21:36:45Z)
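The abstract does not define the metrics or weights behind the scoring system it mentions; the following is a minimal sketch of what a weighted scoring over quality dimensions could look like. The dimension names, per-dimension scores, and weights are made up for illustration.

```python
# Toy weighted quality score; dimensions, scores (in [0, 1]), and
# weights are illustrative assumptions, not the thesis's actual metrics.
scores = {"completeness": 0.95, "validity": 0.88, "uniqueness": 0.99, "timeliness": 0.70}
weights = {"completeness": 0.40, "validity": 0.30, "uniqueness": 0.20, "timeliness": 0.10}

assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights form a convex combination

overall = sum(scores[d] * weights[d] for d in scores)
print(f"overall quality score: {overall:.3f}")  # 0.95*0.4 + 0.88*0.3 + 0.99*0.2 + 0.70*0.1 = 0.912
```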
- Enhancing Data Quality in Federated Fine-Tuning of Foundation Models [54.757324343062734]
We propose a data quality control pipeline for federated fine-tuning of foundation models.
This pipeline computes scores reflecting the quality of training data and determines a global threshold for a unified standard.
Our experiments show that the proposed quality control pipeline improves the effectiveness and reliability of model training, leading to better performance.
arXiv Detail & Related papers (2024-03-07T14:28:04Z)
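The abstract names only the two steps (per-example quality scores, then one global threshold shared by all clients); a minimal sketch under those assumptions follows. The scoring function is a stand-in, since the pipeline's actual quality model is not described here.

```python
import statistics

def quality_score(example: str) -> float:
    # Stand-in scorer: longer, non-trivial texts score higher. The real
    # pipeline's quality model is not described in the abstract.
    return min(len(example.split()) / 50.0, 1.0)

# Each client scores its own local data; only scores need to be shared.
client_data = {
    "client_a": ["short", "a much longer and more informative training example ..."],
    "client_b": ["another local example of moderate length for fine-tuning"],
}
client_scores = {c: [quality_score(x) for x in xs] for c, xs in client_data.items()}

# The server derives one global threshold (here simply the median of all
# scores) so every client filters against the same unified standard.
all_scores = [s for ss in client_scores.values() for s in ss]
threshold = statistics.median(all_scores)

kept = {
    c: [x for x, s in zip(xs, client_scores[c]) if s >= threshold]
    for c, xs in client_data.items()
}
print(f"global threshold: {threshold:.3f}")
for c, xs in kept.items():
    print(f"{c}: keeps {len(xs)} of {len(client_data[c])} examples")
```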
- Data Acquisition: A New Frontier in Data-centric AI [65.90972015426274]
We first present an investigation of current data marketplaces, revealing a lack of platforms offering detailed information about datasets.
We then introduce the DAM challenge, a benchmark to model the interaction between the data providers and acquirers.
Our evaluation of the submitted strategies underlines the need for effective data acquisition strategies in Machine Learning.
arXiv Detail & Related papers (2023-11-22T22:15:17Z)
- QI2 -- an Interactive Tool for Data Quality Assurance [63.379471124899915]
The planned AI Act from the European Commission defines challenging legal requirements for data quality.
We introduce a novel approach that supports the data quality assurance process of multiple data quality aspects.
arXiv Detail & Related papers (2023-07-07T07:06:38Z)
- Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks [70.39633252935445]
Data contamination has become prevalent and challenging with the rise of models pretrained on large automatically-crawled corpora.
For closed models, the training data becomes a trade secret, and even for open models, it is not trivial to detect contamination.
We propose three strategies that can make a difference: (1) test data made public should be encrypted with a public key and licensed to disallow derivative distribution; (2) demand training exclusion controls from closed API holders, and protect your test data by refusing to evaluate without them; and (3) avoid data which appears with its solution on the internet, and release the web-page context of internet-derived data.
arXiv Detail & Related papers (2023-05-17T12:23:38Z)
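Strategy (1) can be prototyped with standard hybrid encryption, since RSA alone cannot encrypt a large file. The sketch below uses the Python `cryptography` package; the file names and key size are illustrative, who holds the private key is a policy choice outside this sketch, and the no-derivatives requirement still has to be stated in the data's license text.

```python
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Hybrid scheme: a symmetric Fernet key encrypts the test set, and an
# RSA public key wraps that symmetric key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
public_key = private_key.public_key()  # in practice, a published evaluator key

test_data = open("benchmark_test_set.jsonl", "rb").read()  # illustrative file name
sym_key = Fernet.generate_key()
ciphertext = Fernet(sym_key).encrypt(test_data)
wrapped_key = public_key.encrypt(
    sym_key,
    padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                 algorithm=hashes.SHA256(), label=None),
)

# Publish only opaque bytes, so automatic crawlers cannot ingest plaintext.
open("test_set.enc", "wb").write(ciphertext)
open("test_key.enc", "wb").write(wrapped_key)
```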
- Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- Detecting Quality Problems in Data Models by Clustering Heterogeneous Data Values [1.143020642249583]
We propose a bottom-up approach to detecting quality problems in data models that manifest in heterogeneous data values.
All values of a selected data field are clustered by syntactic similarity.
This shall help domain experts to understand how the data model is used in practice and to derive potential quality problems of the data model.
arXiv Detail & Related papers (2021-11-12T11:05:18Z)
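The abstract does not say which similarity measure or clustering algorithm is used; the stdlib-only sketch below uses difflib's character-level ratio and a greedy grouping with an arbitrary threshold as stand-ins, just to show how heterogeneous value formats surface as separate clusters.

```python
from difflib import SequenceMatcher

def syntactic_similarity(a: str, b: str) -> float:
    # Character-level similarity in [0, 1]; a stand-in for the paper's
    # (unspecified) syntactic similarity measure.
    return SequenceMatcher(None, a, b).ratio()

def greedy_cluster(values, threshold=0.6):
    clusters = []  # each cluster is a list; clusters[i][0] is its representative
    for v in values:
        for cluster in clusters:
            if syntactic_similarity(v, cluster[0]) >= threshold:
                cluster.append(v)
                break
        else:
            clusters.append([v])
    return clusters

# Heterogeneous values of one hypothetical "phone" data field.
field_values = ["+371 67123456", "+371-67123456", "67123456", "n/a", "unknown"]
for cluster in greedy_cluster(field_values):
    print(cluster)
```

Clusters whose members look nothing like the dominant format (here the "n/a" and "unknown" values) point at places where the data field is used in unintended ways.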
- Open Data Quality [0.0]
It is important to be sure that this data is trustworthy and error-free, as its quality problems can lead to huge losses.
The proposed approach is applied to several open data sets to evaluate their quality.
arXiv Detail & Related papers (2020-07-09T11:10:22Z)