A dataset of mentorship in science with semantic and demographic
estimations
- URL: http://arxiv.org/abs/2106.06487v1
- Date: Fri, 11 Jun 2021 16:12:15 GMT
- Title: A dataset of mentorship in science with semantic and demographic
estimations
- Authors: Qing Ke, Lizhen Liang, Ying Ding, Stephen V. David, Daniel E. Acuna
- Abstract summary: We describe a crowdsourced dataset of 743176 mentorship relationships among 738989 scientists across 112 fields.
We enrich the scientists' profiles with publication data from the Microsoft Academic Graph and "semantic" representations of research using deep learning content analysis.
We perform extensive validations of the profile-publication matching, semantic content, and demographic inferences.
- Score: 4.317131795436002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mentorship in science is crucial for topic choice, career decisions, and the
success of mentees and mentors. Typically, researchers who study mentorship use
article co-authorship and doctoral dissertation datasets. However, available
datasets of this type focus on narrow selections of fields and miss out on
early career and non-publication-related interactions. Here, we describe
MENTORSHIP, a crowdsourced dataset of 743176 mentorship relationships among
738989 scientists across 112 fields that avoids these shortcomings. We enrich
the scientists' profiles with publication data from the Microsoft Academic
Graph and "semantic" representations of research using deep learning content
analysis. Because gender and race have become critical dimensions when
analyzing mentorship and disparities in science, we also provide estimations of
these factors. We perform extensive validations of the profile-publication
matching, semantic content, and demographic inferences. We anticipate this
dataset will spur the study of mentorship in science and deepen our
understanding of its role in scientists' career outcomes.
Related papers
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- Getting aligned on representational alignment [89.81370730647467]
We examine the study of representational alignment in cognitive science, neuroscience, and machine learning.
There is limited knowledge transfer between research communities interested in representational alignment.
We propose a unifying framework that can serve as a common language between researchers studying representational alignment.
arXiv Detail & Related papers (2023-10-18T17:47:58Z)
- A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques.
We define three variables to encompass diverse facets of the evolution of research topics within NLP.
We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
arXiv Detail & Related papers (2023-05-22T11:08:00Z)
- Assessing Scientific Contributions in Data Sharing Spaces [64.16762375635842]
This paper introduces the SCIENCE-index, a blockchain-based metric measuring a researcher's scientific contributions.
To incentivize researchers to share their data, the SCIENCE-index is augmented to include a data-sharing parameter.
Our model is evaluated by comparing the distribution of its output for geographically diverse researchers to that of the h-index.
arXiv Detail & Related papers (2023-03-18T19:17:47Z)
- How Data Scientists Review the Scholarly Literature [4.406926847270567]
We examine the literature review practices of data scientists.
Data science represents a field seeing an exponential rise in papers.
No prior work has examined the specific practices and challenges faced by these scientists.
arXiv Detail & Related papers (2023-01-10T03:53:05Z)
- SciTweets -- A Dataset and Annotation Framework for Detecting Scientific Online Discourse [2.3371548697609303]
Scientific topics, claims and resources are increasingly debated as part of online discourse.
This has led to both significant societal impact and increased interest in scientific online discourse from various disciplines.
Research across disciplines currently suffers from a lack of robust definitions of the various forms of science-relatedness.
arXiv Detail & Related papers (2022-06-15T08:14:55Z)
- Evaluating the state-of-the-art in mapping research spaces: a Brazilian case study [0.0]
Two recent works propose methods for creating research maps from scientists' publication records.
We evaluate these models' ability to predict whether a given entity will enter a new field.
We conduct a case study to showcase how these models can be used to characterize science dynamics in the context of Brazil.
arXiv Detail & Related papers (2021-04-07T18:14:41Z)
- Early Indicators of Scientific Impact: Predicting Citations with Altmetrics [0.0]
We use altmetrics to predict the short-term and long-term citations that a scholarly publication could receive.
We build various classification and regression models and evaluate their performance, finding neural networks and ensemble models to perform best for these tasks.
arXiv Detail & Related papers (2020-12-25T16:25:07Z)
- A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs [77.34726150561087]
We survey the current research landscape on word, sentence and knowledge graph embedding algorithms.
We provide a classification of the relevant alignment techniques and discuss benchmark datasets used in this field of research.
arXiv Detail & Related papers (2020-10-26T16:08:13Z)
- Biases in Data Science Lifecycle [0.0]
The aim of this study is to provide a practical guideline to data scientists and increase their awareness.
In this work, we reviewed different sources of biases and grouped them under different stages of the data science lifecycle.
arXiv Detail & Related papers (2020-09-10T13:41:48Z)
- REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.