Towards High-Value Datasets determination for data-driven development: a systematic literature review
- URL: http://arxiv.org/abs/2305.10234v1
- Date: Wed, 17 May 2023 14:22:02 GMT
- Authors: Anastasija Nikiforova, Nina Rizun, Magdalena Ciesielska, Charalampos Alexopoulos, Andrea Miletič
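- Abstract summary: The 'high-value dataset' (HVD) was recognized by the European Data Portal as a key trend in the open government data (OGD) area in 2022.
There is no standardized approach to assist chief data officers in determining which datasets are of high value for a particular country.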
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Open government data (OGD) is seen as a political and socio-economic
phenomenon that promises to promote civic engagement and stimulate public
sector innovation in various areas of public life. To bring the expected
benefits, data must be reused and transformed into value-added products or
services. This, in turn, sets a further precondition: data are expected not
only to be available and comply with open data principles, but also to be of
value, i.e., of interest for reuse by the end user. This is captured by the
notion of the 'high-value dataset' (HVD), recognized by the European Data
Portal as a key trend in the OGD area in 2022.
While there is progress in this direction, e.g., the Open Data Directive, which
identifies 6 key categories, a list of HVDs, and arrangements for their
publication and re-use, these can be seen as 'core' / 'base' datasets aimed at
increasing the interoperability of public sector data with a high priority,
contributing to the development of a more mature OGD initiative. Depending on
the specifics of a region and country (geographical location; social,
environmental, and economic issues; cultural characteristics; (under)developed
sectors; and market specificities), more datasets can be recognized as being of
high value for a particular country. However, there is no standardized approach
to assist chief data officers in determining them. In this paper, we present a
systematic review of existing literature on HVD determination, which is
expected to form an initial knowledge base for this process, including the
approaches and indicators used to determine HVDs, the data, and the
stakeholders involved.
Related papers
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
(2024-06-20T16:34:07Z)
- Automating the Identification of High-Value Datasets in Open Government Data Portals [0.0]
High-Value datasets (HVDs) play a crucial role in the broader Open Government Data (OGD) movement.
Identifying HVDs on OGD portals presents a resource-intensive and complex challenge due to the nuanced nature of data value.
Our proposal aims to automate the identification of HVDs on OGD portals using a quantitative approach based on a detailed analysis of user interest.
(2024-06-15T07:54:37Z)
- From an Integrated Usability Framework to Lessons on Usability and Performance of Open Government Data Portals: A Comparative Study of European Union and Gulf Cooperation Council Countries [0.0]
This study proposes an integrated usability framework for evaluating Open Government Data (OGD) portals.
The framework is developed and applied to 33 OGD portals from the European Union (EU) and Gulf Cooperation Council (GCC) countries.
(2024-06-13T03:05:36Z)
- Exploring Estonia's Open Government Data Development as a Journey towards Excellence: Unveiling the Progress of Local Governments in Open Data Provision [0.0]
Estonia has a global reputation as a digital state, or e-country.
Despite its success in digital governance, the country has faced challenges in the realm of Open Government Data (OGD).
This paper aims to explore the evolution and positioning of Estonia's OGD development, encompassing national and local levels.
(2024-03-18T16:50:05Z)
- Unlocking the Potential of Open Government Data: Exploring the Strategic, Technical, and Application Perspectives of High-Value Datasets Opening in Taiwan [0.0]
The aim of the paper is to understand and evaluate the lifecycle of high-value dataset publishing in one of the world's leading producers of information and communication technology (ICT) products - Taiwan.
(2024-03-14T09:31:20Z)
- When is Off-Policy Evaluation (Reward Modeling) Useful in Contextual Bandits? A Data-Centric Perspective [64.73162159837956]
Evaluating the value of a hypothetical target policy with only a logged dataset is important but challenging.
We propose DataCOPE, a data-centric framework for evaluating a target policy given a dataset.
Our empirical analysis of DataCOPE in the logged contextual bandit settings using healthcare datasets confirms its ability to evaluate both machine-learning and human expert policies.
(2023-11-23T17:13:37Z)
- A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
(2023-09-27T14:38:16Z)
- Data Innovation in Demography, Migration and Human Mobility [0.0]
Data innovation has led to new challenges (ethics, privacy, data governance models, data quality) for citizens, statistical offices, policymakers and the private sector.
This study has reviewed more than 300 articles and scientific reports, as well as numerous tools, that employed non-traditional data sources to measure vital population events.
(2022-09-05T07:55:07Z)
- DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
(2022-07-20T17:47:54Z)
- Domain Generalization: A Survey [146.68420112164577]
Domain generalization (DG) aims to achieve out-of-distribution (OOD) generalization by using only source domain data for model learning.
For the first time, a comprehensive literature review is provided to summarize the ten-year development in DG.
(2021-03-03T16:12:22Z)
- Towards Inheritable Models for Open-Set Domain Adaptation [56.930641754944915]
We introduce a practical domain adaptation paradigm in which a source-trained model is used to facilitate future adaptation in the absence of the source dataset.
We present an objective way to quantify inheritability to enable the selection of the most suitable source model for a given target domain, even in the absence of the source data.
(2020-04-09T07:16:30Z)