Methods to Assess the UK Government's Current Role as a Data Provider for AI
- URL: http://arxiv.org/abs/2412.09632v2
- Date: Wed, 18 Dec 2024 15:55:28 GMT
- Title: Methods to Assess the UK Government's Current Role as a Data Provider for AI
- Authors: Neil Majithia, Elena Simperl
- Abstract summary: This paper serves as a technical report, explaining in-depth the designs, mechanics, and limitations of the above experiments. It is accompanied by a complementary non-technical report on the ODI website in which we summarise the experiments and key findings, interpret them, and build a set of actionable recommendations for the UK government to take forward as it seeks to design AI policy.
- Score: 2.9712266483979346
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Governments typically collect and steward a vast amount of high-quality data on their citizens and institutions, and the UK government is exploring how it can better publish and provision this data to the benefit of the AI landscape. However, the compositions of generative AI training corpora remain closely guarded secrets, making the planning of data sharing initiatives difficult. To address this, we devise two methods to assess UK government data usage for the training of Large Language Models (LLMs) and 'peek behind the curtain' in order to observe the UK government's current contributions as a data provider for AI. The first method, an ablation study that utilises LLM 'unlearning', seeks to examine the importance of the information held on UK government websites for LLMs and their performance in citizen query tasks. The second method, an information leakage study, seeks to ascertain whether LLMs are aware of the information held in the datasets published on the UK government's open data initiative data.gov.uk. Our findings indicate that UK government websites are important data sources for AI (heterogeneously across subject matters) while data.gov.uk is not. This paper serves as a technical report, explaining in-depth the designs, mechanics, and limitations of the above experiments. It is accompanied by a complementary non-technical report on the ODI website in which we summarise the experiments and key findings, interpret them, and build a set of actionable recommendations for the UK government to take forward as it seeks to design AI policy. While we focus on UK open government data, we believe that the methods introduced in this paper present a reproducible approach to tackle the opaqueness of AI training corpora and provide organisations with a framework to evaluate and maximise their contributions to AI development.
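The two methods described in the abstract lend themselves to small reproducibility sketches. Below is a minimal sketch of the 'unlearning' ablation idea, assuming a small open-weights causal LM and a toy forget set; the model name, the example text, and the plain gradient-ascent update are illustrative assumptions, not the authors' exact procedure.

```python
# Minimal sketch: 'unlearn' a toy forget set via gradient ascent, then
# re-evaluate. Model, data, and update rule are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # assumption: any small open-weights causal LM works here
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.train()

# Hypothetical 'forget set': text of the kind held on UK government websites.
forget_texts = [
    "To renew your passport online you need a digital photo and a credit card.",
]

opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
for text in forget_texts:
    batch = tok(text, return_tensors="pt", truncation=True)
    loss = model(**batch, labels=batch["input_ids"]).loss
    (-loss).backward()  # gradient *ascent*: push the model away from this text
    opt.step()
    opt.zero_grad()

# The ablated model would then be scored on citizen-query questions and
# compared with the unmodified model to estimate the data's importance.
```

The information-leakage probe can be sketched even more simply: prompt a model with a dataset title from data.gov.uk and measure how closely its answer reconstructs the published metadata. The `ask_llm` stub, the example record, and the `SequenceMatcher` similarity below are stand-ins for whichever model API and scoring metric are actually used.

```python
from difflib import SequenceMatcher

def ask_llm(prompt: str) -> str:
    """Stub for whichever model API is being audited; replace with a real call."""
    return "A dataset of personal-injury road accidents reported to the police."

def leakage_score(published: str, answer: str) -> float:
    """Crude textual similarity; higher suggests the record was seen in training."""
    return SequenceMatcher(None, published.lower(), answer.lower()).ratio()

# Hypothetical data.gov.uk record (title and published description).
title = "Road Safety Data"
published = "Data about personal-injury road accidents reported to the police."

answer = ask_llm(f"Describe the dataset '{title}' published on data.gov.uk.")
print(f"leakage score: {leakage_score(published, answer):.2f}")
```

Aggregating such scores over many published records would give the kind of corpus-level signal the paper reports, i.e. whether a model appears 'aware' of data.gov.uk content at all.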
Related papers
- Do Responsible AI Artifacts Advance Stakeholder Goals? Four Key Barriers Perceived by Legal and Civil Stakeholders [59.17981603969404]
The responsible AI (RAI) community has introduced numerous processes and artifacts to facilitate transparency and support the governance of AI systems.
We conduct semi-structured interviews with 19 government, legal, and civil society stakeholders who inform policy and advocacy around responsible AI efforts.
We organize these beliefs into four barriers that help explain how RAI artifacts may (inadvertently) reconfigure power relations across civil society, government, and industry.
arXiv Detail & Related papers (2024-08-22T00:14:37Z)
- Future and AI-Ready Data Strategies: Response to DOC RFI on AI and Open Government Data Assets [6.659894897434807]
The following is a response to the US Department of Commerce's Request for Information (RFI) regarding AI and Open Government Data Assets. We commend the Department for its initiative in seeking public insights on the organization and sharing of data. In our response, we outline best practices and key considerations for AI and the Department of Commerce's Open Government Data Assets.
arXiv Detail & Related papers (2024-07-26T07:31:32Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- Federated Learning Priorities Under the European Union Artificial Intelligence Act [68.44894319552114]
We perform a first-of-its-kind interdisciplinary analysis (legal and ML) of the impact the AI Act may have on Federated Learning.
We explore data governance issues and the concern for privacy.
Most noteworthy are the opportunities to defend against data bias and enhance private and secure computation.
arXiv Detail & Related papers (2024-02-05T19:52:19Z)
- Trust, Accountability, and Autonomy in Knowledge Graph-based AI for Self-determination [1.4305544869388402]
Knowledge Graphs (KGs) have emerged as fundamental platforms for powering intelligent decision-making.
The integration of KGs with neuronal learning is currently a topic of active research.
This paper conceptualises the foundational topics and research pillars to support KG-based AI for self-determination.
arXiv Detail & Related papers (2023-10-30T12:51:52Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z)
- Explainable Patterns: Going from Findings to Insights to Support Data Analytics Democratization [60.18814584837969]
We present Explainable Patterns (ExPatt), a new framework to support lay users in exploring data and creating data stories.
ExPatt automatically generates plausible explanations for observed or selected findings using an external (textual) source of information.
arXiv Detail & Related papers (2021-01-19T16:13:44Z)
- AI Governance for Businesses [2.072259480917207]
AI governance aims at leveraging AI through effective use of data and minimisation of AI-related cost and risk.
This work views AI products as systems, where key functionality is delivered by machine learning (ML) models leveraging (training) data.
Our framework decomposes AI governance into governance of data, (ML) models and (AI) systems along four dimensions.
arXiv Detail & Related papers (2020-11-20T22:31:37Z)
- Montreal AI Ethics Institute's Response to Scotland's AI Strategy [0.0]
In January and February 2020, the Scottish Government released two documents for review by the public regarding their artificial intelligence (AI) strategy.
The Montreal AI Ethics Institute (MAIEI) reviewed these documents and published a response on 4 June 2020.
arXiv Detail & Related papers (2020-06-11T10:08:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.