Subdivisions and Crossroads: Identifying Hidden Community Structures in
a Data Archive's Citation Network
- URL: http://arxiv.org/abs/2205.08395v1
- Date: Tue, 17 May 2022 14:18:49 GMT
- Authors: Sara Lafia, Lizhou Fan, Andrea Thomer, Libby Hemphill
- Abstract summary: This paper analyzes the community structure of an authoritative network of datasets cited in academic publications.
We identify communities of social science datasets and fields of research connected through shared data use.
Our research reveals the hidden structure of data reuse and demonstrates how interdisciplinary research communities organize around datasets as shared scientific inputs.
- Score: 1.6631602844999724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data archives are an important source of high-quality data in many fields,
making them ideal sites to study data reuse. By studying data reuse through
citation networks, we are able to learn how hidden research communities - those
that use the same scientific datasets - are organized. This paper analyzes the
community structure of an authoritative network of datasets cited in academic
publications, which have been collected by a large, social science data
archive: the Inter-university Consortium for Political and Social Research
(ICPSR). Through network analysis, we identified communities of social science
datasets and fields of research connected through shared data use. We argue
that communities of exclusive data reuse form subdivisions that contain
valuable disciplinary resources, while datasets at a "crossroads" broadly
connect research communities. Our research reveals the hidden structure of data
reuse and demonstrates how interdisciplinary research communities organize
around datasets as shared scientific inputs. These findings contribute new ways
of describing scientific communities in order to understand the impacts of
research data reuse.
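
The distinction the abstract draws between "subdivisions" and "crossroads" can be illustrated with a small sketch. This is not the paper's method: the abstract does not specify the network analysis used, so the rule below (a dataset counts as a crossroads when publications from at least `min_fields` fields cite it) is a simplifying assumption, and the data, field names, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper): each pair records that a
# publication in a given field of research cited a given dataset.
CITATIONS = [
    ("sociology", "dataset_A"),
    ("sociology", "dataset_B"),
    ("criminology", "dataset_C"),
    ("criminology", "dataset_D"),
    ("sociology", "dataset_D"),
    ("public_health", "dataset_D"),
]


def fields_by_dataset(pairs):
    """Map each dataset to the set of research fields whose publications cite it."""
    fields = defaultdict(set)
    for field, dataset in pairs:
        fields[dataset].add(field)
    return dict(fields)


def classify_datasets(pairs, min_fields=2):
    """Label each dataset by how many distinct fields reuse it.

    Assumed rule for illustration only: a dataset cited from at least
    `min_fields` fields is a "crossroads"; otherwise it is a "subdivision",
    reused within a single field.
    """
    labels = {}
    for dataset, fields in fields_by_dataset(pairs).items():
        labels[dataset] = "crossroads" if len(fields) >= min_fields else "subdivision"
    return labels


if __name__ == "__main__":
    for dataset, label in sorted(classify_datasets(CITATIONS).items()):
        print(dataset, label)
```

A fuller analysis would build the dataset-publication citation network and run a community detection algorithm on it; the counting rule here only shows the intuition that some datasets stay within one community while others connect several.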
Related papers
- The Landscape of Data Reuse in Interactive Information Retrieval: Motivations, Sources, and Evaluation of Reusability [5.257245308437576]
This study investigated the data reuse practices of experienced researchers from the area of Interactive Information Retrieval (IIR) studies.
We conducted 21 semi-structured in-depth interviews with IIR researchers from varying demographic backgrounds, institutions, and stages of careers on their motivations, experiences, and concerns over data reuse.
arXiv Detail & Related papers (2024-11-23T03:15:31Z)
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
arXiv Detail & Related papers (2024-10-28T15:56:49Z)
- From Data Creator to Data Reuser: Distance Matters [0.847136673632881]
Open science policies focus more heavily on data sharing than on reuse.
The value of data reuse lies in relationships between creators and reusers.
We develop the theoretical construct of distance between data creator and data reuser.
arXiv Detail & Related papers (2024-02-05T18:16:04Z)
- Assessing Scientific Contributions in Data Sharing Spaces [64.16762375635842]
This paper introduces the SCIENCE-index, a blockchain-based metric measuring a researcher's scientific contributions.
To incentivize researchers to share their data, the SCIENCE-index is augmented to include a data-sharing parameter.
Our model is evaluated by comparing the distribution of its output for geographically diverse researchers to that of the h-index.
arXiv Detail & Related papers (2023-03-18T19:17:47Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- Librarian-in-the-Loop: A Natural Language Processing Paradigm for Detecting Informal Mentions of Research Data in Academic Literature [1.4190701053683017]
We propose a natural language processing paradigm to support the human task of identifying informal mentions made to research datasets.
The work of discovering informal data mentions is currently performed by librarians and their staff in the Inter-university Consortium for Political and Social Research.
arXiv Detail & Related papers (2022-03-10T02:11:30Z)
- DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance [48.55345030503826]
Geoscientists need to read a huge amount of literature to locate, extract, and aggregate relevant results and data.
DeepShovel is a publicly-available AI-assisted data extraction system to support their needs.
A follow-up user evaluation with 14 researchers suggested DeepShovel improved users' efficiency of data extraction for building scientific databases.
arXiv Detail & Related papers (2022-02-21T12:18:08Z)
- Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research [3.536605202672355]
We study how dataset usage patterns differ across machine learning subcommunities and across time from 2015-2020.
We find increasing concentration on fewer and fewer datasets within task communities, significant adoption of datasets from other tasks, and concentration across the field on datasets that have been introduced by researchers situated within a small number of elite institutions.
arXiv Detail & Related papers (2021-12-03T05:01:47Z)
- A Comprehensive Survey on Community Detection with Deep Learning [93.40332347374712]
A community reveals the features and connections of its members that are different from those in other communities in a network.
This survey devises and proposes a new taxonomy covering different categories of the state-of-the-art methods.
The main category, i.e., deep neural networks, is further divided into convolutional networks, graph attention networks, generative adversarial networks and autoencoders.
arXiv Detail & Related papers (2021-05-26T14:37:07Z)
- Evaluating the state-of-the-art in mapping research spaces: a Brazilian case study [0.0]
Two recent works propose methods for creating research maps from scientists' publication records.
We evaluate these models' ability to predict whether a given entity will enter a new field.
We conduct a case study to showcase how these models can be used to characterize science dynamics in the context of Brazil.
arXiv Detail & Related papers (2021-04-07T18:14:41Z)
- Deep Learning for Community Detection: Progress, Challenges and Opportunities [79.26787486888549]
This article summarizes the contributions of the various frameworks, models, and algorithms in deep neural networks.
arXiv Detail & Related papers (2020-05-17T11:22:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.