Assessing Scientific Contributions in Data Sharing Spaces
- URL: http://arxiv.org/abs/2303.10476v1
- Date: Sat, 18 Mar 2023 19:17:47 GMT
- Title: Assessing Scientific Contributions in Data Sharing Spaces
- Authors: Kacy Adams and Fernando Spadea and Conor Flynn and Oshani Seneviratne
- Abstract summary: This paper introduces the SCIENCE-index, a blockchain-based metric measuring a researcher's scientific contributions.
To incentivize researchers to share their data, the SCIENCE-index is augmented to include a data-sharing parameter.
Our model is evaluated by comparing the distribution of its output for geographically diverse researchers to that of the h-index.
- Score: 64.16762375635842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the present academic landscape, the process of collecting data is slow,
and the lax infrastructures for data collaborations lead to significant delays
in coming up with and disseminating conclusive findings. Therefore, there is an
increasing need for a secure, scalable, and trustworthy data-sharing ecosystem
that promotes and rewards collaborative data-sharing efforts among researchers,
and a robust incentive mechanism is required to achieve this objective.
Reputation-based incentives, such as the h-index, have historically played a
pivotal role in the academic community. However, the h-index suffers from
several limitations. This paper introduces the SCIENCE-index, a
blockchain-based metric measuring a researcher's scientific contributions.
Utilizing the Microsoft Academic Graph and machine learning techniques, the
SCIENCE-index predicts the progress made by a researcher over their career and
provides a soft incentive for sharing their datasets with peer researchers. To
incentivize researchers to share their data, the SCIENCE-index is augmented to
include a data-sharing parameter. DataCite, a database of openly available
datasets, proxies this parameter, which is further enhanced by including a
researcher's data-sharing activity. Our model is evaluated by comparing the
distribution of its output for geographically diverse researchers to that of
the h-index. We observe that it results in a much more even spread of
evaluations. The SCIENCE-index is a crucial component in constructing a
decentralized protocol that promotes trust-based data sharing, addressing the
current inequity in dataset sharing. The work outlined in this paper provides
the foundation for assessing scientific contributions in future data-sharing
spaces powered by decentralized applications.
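For readers unfamiliar with the baseline the abstract compares against, the h-index is the largest number h such that a researcher has at least h papers with at least h citations each. The following is a minimal sketch of that standard definition only. It does not implement the SCIENCE-index, whose formula is not given in the abstract; the function name `h_index` and the citation counts are hypothetical and chosen for illustration.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The paper at this rank still has enough citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


if __name__ == "__main__":
    # Hypothetical citation counts for two researchers (illustrative only).
    print(h_index([10, 8, 5, 4, 3]))
    print(h_index([25, 8, 5, 3, 3]))
```

Note that the second hypothetical researcher has a far more cited top paper but a lower h-index, which is one example of the kind of limitation the abstract alludes to.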
Related papers
- Pennsieve: A Collaborative Platform for Translational Neuroscience and Beyond [0.5130659559809153]
Pennsieve is an open-source, cloud-based scientific data management platform.
It supports complex multimodal datasets and provides tools for data visualization and analyses.
Pennsieve stores over 125 TB of scientific data, with 35 TB of data publicly available across more than 350 high-impact datasets.
arXiv Detail & Related papers (2024-09-16T17:55:58Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- DataPerf: Benchmarks for Data-Centric AI Development [81.03754002516862]
DataPerf is a community-led benchmark suite for evaluating ML datasets and data-centric algorithms.
We provide an open, online platform with multiple rounds of challenges to support this iterative development.
The benchmarks, online evaluation platform, and baseline implementations are open source.
arXiv Detail & Related papers (2022-07-20T17:47:54Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Subdivisions and Crossroads: Identifying Hidden Community Structures in a Data Archive's Citation Network [1.6631602844999724]
This paper analyzes the community structure of an authoritative network of datasets cited in academic publications.
We identify communities of social science datasets and fields of research connected through shared data use.
Our research reveals the hidden structure of data reuse and demonstrates how interdisciplinary research communities organize around datasets as shared scientific inputs.
arXiv Detail & Related papers (2022-05-17T14:18:49Z)
- DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance [48.55345030503826]
Geoscientists need to read a huge amount of literature to locate, extract, and aggregate relevant results and data.
DeepShovel is a publicly-available AI-assisted data extraction system to support their needs.
A follow-up user evaluation with 14 researchers suggested DeepShovel improved users' efficiency of data extraction for building scientific databases.
arXiv Detail & Related papers (2022-02-21T12:18:08Z)
- Yes-Yes-Yes: Donation-based Peer Reviewing Data Collection for ACL Rolling Review and Beyond [58.71736531356398]
We present an in-depth discussion of peer reviewing data, outline the ethical and legal desiderata for peer reviewing data collection, and propose the first continuous, donation-based data collection workflow.
We report on the ongoing implementation of this workflow at the ACL Rolling Review and deliver the first insights obtained with the newly collected data.
arXiv Detail & Related papers (2022-01-27T11:02:43Z)
- Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research [3.536605202672355]
We study how dataset usage patterns differ across machine learning subcommunities and across time from 2015-2020.
We find increasing concentration on fewer and fewer datasets within task communities, significant adoption of datasets from other tasks, and concentration across the field on datasets that have been introduced by researchers situated within a small number of elite institutions.
arXiv Detail & Related papers (2021-12-03T05:01:47Z)