Reproducibility Needs Reshape Scientific Data Governance
- URL: http://arxiv.org/abs/2410.12800v1
- Date: Sun, 29 Sep 2024 22:13:19 GMT
- Title: Reproducibility Needs Reshape Scientific Data Governance
- Authors: Paul Meijer, Yousef Aggoune, Madeline Ambrose, Aldan Beaubien, James Harvey, Nicole Howard, Neelima Inala, Ed Johnson, Autumn Kelsey, Melissa Kinsey, Jessica Liang, Paul Mariz, Stark Pister, Sathya Subramanian, Vitalii Tereshchenko, Anne Vetto
- Abstract summary: Data governance should prioritize maximizing the utility of data throughout the research lifecycle.
Proactive analysis reproducibility and data governance are integral and interconnected components of research lifecycle management.
- Score: 0.0
- Abstract: Scientific data governance should prioritize maximizing the utility of data throughout the research lifecycle. Research software systems that enable analysis reproducibility inform data governance policies and assist administrators in setting clear guidelines for data reuse, data retention, and the management of scientific computing needs. Proactive analysis reproducibility and data governance are integral and interconnected components of research lifecycle management.
Related papers
- The Landscape of Data Reuse in Interactive Information Retrieval: Motivations, Sources, and Evaluation of Reusability [5.257245308437576]
This study investigated the data reuse practices of experienced researchers from the area of Interactive Information Retrieval (IIR) studies.
We conducted 21 semi-structured in-depth interviews with IIR researchers from varying demographic backgrounds, institutions, and career stages about their motivations, experiences, and concerns regarding data reuse.
arXiv Detail & Related papers (2024-11-23T03:15:31Z)
- Continuous Analysis: Evolution of Software Engineering and Reproducibility for Science [0.0]
This paper introduces the concept of Continuous Analysis (CA) to address the challenges in scientific research.
By adopting CA, the scientific community can ensure the validity and generalizability of research outcomes.
arXiv Detail & Related papers (2024-11-04T17:11:08Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- Data Management For Training Large Language Models: A Survey [64.18200694790787]
Data plays a fundamental role in training Large Language Models (LLMs).
This survey aims to provide a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs.
arXiv Detail & Related papers (2023-12-04T07:42:16Z)
- Transforming Agriculture with Intelligent Data Management and Insights [3.027257459810039]
Modern agriculture faces grand challenges to meet increased demands for food, fuel, feed, and fiber under the constraints of climate change and dwindling natural resources.
Data innovation is urgently required to secure and improve the productivity, sustainability, and resilience of our agroecosystems.
arXiv Detail & Related papers (2023-11-07T22:02:54Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- Towards Data-centric Graph Machine Learning: Review and Outlook [120.64417630324378]
We introduce a systematic framework, Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of the graph data lifecycle.
A thorough taxonomy of each stage is presented to answer three critical graph-centric questions.
We pinpoint the future prospects of the DC-GML domain, providing insights to navigate its advancements and applications.
arXiv Detail & Related papers (2023-09-20T00:40:13Z)
- Mapping and Comparing Data Governance Frameworks: A benchmarking exercise to inform global data governance deliberations [0.0]
This article explores the increasing importance of global data governance due to the rapid growth of data and the need for responsible data use and protection.
The report highlights the need for a more holistic, coordinated approach to data governance to manage the global flow of data responsibly and for the public interest.
arXiv Detail & Related papers (2023-02-27T12:56:25Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Nine Best Practices for Research Software Registries and Repositories: A Concise Guide [63.52960372153386]
We present a set of nine best practices that can help managers define the scope, practices, and rules that govern individual registries and repositories.
These best practices were distilled from the experiences of the creators of existing resources, convened by a Task Force of the FORCE11 Software Citation Implementation Working Group during the years 2019-2020.
arXiv Detail & Related papers (2020-12-24T05:37:54Z)
- DataFed: Towards Reproducible Research via Federated Data Management [0.0]
DataFed is a lightweight, distributed scientific data management system.
It spans a federation of storage systems within a loosely-coupled network of scientific facilities.
arXiv Detail & Related papers (2020-04-07T21:05:22Z)