Automating the Identification of High-Value Datasets in Open Government Data Portals
- URL: http://arxiv.org/abs/2406.10541v1
- Date: Sat, 15 Jun 2024 07:54:37 GMT
- Title: Automating the Identification of High-Value Datasets in Open Government Data Portals
- Authors: Alfonso Quarati, Anastasija Nikiforova
- Abstract summary: High-Value datasets (HVDs) play a crucial role in the broader Open Government Data (OGD) movement.
Identifying HVDs on OGD portals presents a resource-intensive and complex challenge due to the nuanced nature of data value.
Our proposal aims to automate the identification of HVDs on OGD portals using a quantitative approach based on a detailed analysis of user interest.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recognized for fostering innovation and transparency, driving economic growth, enhancing public services, supporting research, empowering citizens, and promoting environmental sustainability, High-Value Datasets (HVDs) play a crucial role in the broader Open Government Data (OGD) movement. However, identifying HVDs presents a resource-intensive and complex challenge due to the nuanced nature of data value. Our proposal aims to automate the identification of HVDs on OGD portals using a quantitative approach based on a detailed analysis of user interest derived from data usage statistics, thereby minimizing the need for human intervention. The proposed method involves extracting download data, analyzing metrics to identify high-value categories, and comparing HVDs across different portals. This automated process provides valuable insights into trends in dataset usage, reflecting citizens' needs and preferences. The effectiveness of our approach is demonstrated through its application to a sample of US OGD city portals. The practical implications of this study include contributing to the understanding of HVDs at both local and national levels. By providing a systematic and efficient means of identifying HVDs, our approach aims to inform open governance initiatives and practices, aiding OGD portal managers and public authorities in their efforts to optimize data dissemination and utilization.
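The three steps named in the abstract (extract download data, analyze metrics to find high-value categories, compare across portals) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `(category, downloads)` record shape, the use of download share as the metric, the top-`k` cutoff, the function names, and the made-up portal data.

```python
from collections import defaultdict


def category_download_share(datasets):
    """Return each category's share of total downloads for one portal.

    `datasets` is an iterable of (category, downloads) pairs. This record
    shape is a hypothetical simplification, not the paper's data model.
    """
    totals = defaultdict(int)
    for category, downloads in datasets:
        totals[category] += downloads
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {cat: count / grand_total for cat, count in totals.items()}


def top_categories(datasets, k=2):
    """Rank categories by download share (ties broken by name), keep the top k."""
    shares = category_download_share(datasets)
    ranked = sorted(shares.items(), key=lambda item: (-item[1], item[0]))
    return [cat for cat, _ in ranked[:k]]


def shared_top_categories(portals, k=2):
    """Return the categories that appear in the top k of every portal."""
    top_sets = [set(top_categories(data, k)) for data in portals.values()]
    return set.intersection(*top_sets) if top_sets else set()


# Made-up download counts for two hypothetical city portals.
portals = {
    "city_a": [("transport", 900), ("health", 300), ("budget", 100), ("transport", 200)],
    "city_b": [("transport", 500), ("budget", 450), ("health", 50)],
}

if __name__ == "__main__":
    for name, data in portals.items():
        print(name, top_categories(data))
    print("shared:", shared_top_categories(portals))
```

Download share is only one possible usage metric. A real analysis would likely also need to normalize for dataset age and harmonize category labels across portals, since each portal may use its own taxonomy.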
Related papers
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- From an Integrated Usability Framework to Lessons on Usability and Performance of Open Government Data Portals: A Comparative Study of European Union and Gulf Cooperation Council Countries [0.0]
This study proposes an integrated usability framework for evaluating Open Government Data (OGD) portals.
The framework is developed and applied to 33 OGD portals from the European Union (EU) and Gulf Cooperation Council (GCC) countries.
arXiv Detail & Related papers (2024-06-13T03:05:36Z)
- Unlocking the Potential of Open Government Data: Exploring the Strategic, Technical, and Application Perspectives of High-Value Datasets Opening in Taiwan [0.0]
The aim of the paper is to understand and evaluate the lifecycle of high-value dataset publishing in one of the world's leading producers of information and communication technology (ICT) products - Taiwan.
arXiv Detail & Related papers (2024-03-14T09:31:20Z)
- An Integrated Usability Framework for Evaluating Open Government Data Portals: Comparative Analysis of EU and GCC Countries [0.0]
This study explores the critical role of open government data (OGD) portals in fostering transparency and collaboration between diverse stakeholders.
Recognizing the challenges of usability, communication with diverse populations, and strategic value creation, this paper develops an integrated framework for evaluating OGD portal effectiveness.
arXiv Detail & Related papers (2024-03-13T12:06:42Z)
- Transcending Traditional Boundaries: Leveraging Inter-Annotator Agreement (IAA) for Enhancing Data Management Operations (DMOps) [4.413246337852144]
We advocate for the use of IAA in predicting the labeling quality of individual annotators, leading to cost and time efficiency in data production.
This research underscores IAA's broader application potential in data-driven research optimization.
arXiv Detail & Related papers (2023-06-26T01:33:58Z)
- Your Room is not Private: Gradient Inversion Attack on Reinforcement Learning [47.96266341738642]
Privacy emerges as a pivotal concern within the realm of embodied AI, as the robot accesses substantial personal information.
This paper proposes an attack on the value-based algorithm and the gradient-based algorithm, utilizing gradient inversion to reconstruct states, actions, and supervision signals.
arXiv Detail & Related papers (2023-06-15T16:53:26Z)
- Towards High-Value Datasets determination for data-driven development: a systematic literature review [0.0]
The 'high-value dataset' (HVD) was recognized as a key trend in the Open Data Directive area in 2022.
There is no standardized approach to assist chief data officers in this.
arXiv Detail & Related papers (2023-05-17T14:22:02Z)
- Gradient Imitation Reinforcement Learning for General Low-Resource Information Extraction [80.64518530825801]
We develop a Gradient Imitation Reinforcement Learning (GIRL) method to encourage pseudo-labeled data to imitate the gradient descent direction on labeled data.
We also leverage GIRL to solve all IE sub-tasks (named entity recognition, relation extraction, and event extraction) in low-resource settings.
arXiv Detail & Related papers (2022-11-11T05:37:19Z)
- Explainable Patterns: Going from Findings to Insights to Support Data Analytics Democratization [60.18814584837969]
We present Explainable Patterns (ExPatt), a new framework to support lay users in exploring and creating data stories.
ExPatt automatically generates plausible explanations for observed or selected findings using an external (textual) source of information.
arXiv Detail & Related papers (2021-01-19T16:13:44Z)
- Towards Inheritable Models for Open-Set Domain Adaptation [56.930641754944915]
We introduce a practical Domain Adaptation paradigm where a source-trained model is used to facilitate adaptation in the future absence of the source dataset.
We present an objective way to quantify inheritability to enable the selection of the most suitable source model for a given target domain, even in the absence of the source data.
arXiv Detail & Related papers (2020-04-09T07:16:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.