Quantifying hierarchy in scientific teams
- URL: http://arxiv.org/abs/2210.05852v1
- Date: Wed, 12 Oct 2022 01:28:25 GMT
- Title: Quantifying hierarchy in scientific teams
- Authors: Fengli Xu, Lingfei Wu, James A. Evans
- Abstract summary: This paper provides a detailed description of the data collection and machine learning model used in our recent PNAS paper "Flat Teams Drive Scientific Innovation".
We discuss how the features of scientific publications can be used to estimate the implicit hierarchy in the corresponding author teams.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper provides a detailed description of the data collection and machine
learning model used in our recent PNAS paper "Flat Teams Drive Scientific
Innovation" Xu et al. [2022a]. Here, we discuss how the features of scientific
publications can be used to estimate the implicit hierarchy in the corresponding
author teams. We also describe our method for evaluating the impact of team
hierarchy on scientific outputs. This article will be updated continuously with
more details. Raw data and a README document are available in the GitHub
repository Xu et al. [2022b].
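To make the notion of "implicit hierarchy" concrete, a team's hierarchy can be summarized by comparing how many authors lead the work versus how many play supporting roles. The sketch below is a minimal illustration, not the authors' actual method: the `lead`/`support` labels, the `hierarchy_ratio` function, and the ratio formula are all assumptions for demonstration (the paper infers roles with a machine learning model trained on publication features).

```python
# Toy sketch: summarize team hierarchy as the ratio of support-role
# authors to lead-role authors. This is a hypothetical measure for
# illustration; the paper's features and model differ.

def hierarchy_ratio(roles):
    """roles: list of 'lead'/'support' labels, one per author.

    Returns supports-per-lead; 0.0 means a perfectly flat team
    in which every author leads.
    """
    leads = sum(r == "lead" for r in roles)
    supports = sum(r == "support" for r in roles)
    if leads == 0:
        raise ValueError("team must have at least one lead author")
    return supports / leads

# A flat three-person team: every author leads.
flat = hierarchy_ratio(["lead", "lead", "lead"])    # 0.0

# A tall five-person team: one lead, four supporting authors.
tall = hierarchy_ratio(["lead"] + ["support"] * 4)  # 4.0
```

In this toy form, a larger ratio indicates a taller (more hierarchical) team; the paper evaluates how such hierarchy relates to scientific outputs.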
Related papers
- Overview of the SciHigh Track at FIRE 2025: Research Highlight Generation from Scientific Papers [7.474480823192324]
The SciHigh track focuses on the task of automatically generating concise, informative, and meaningful bullet-point highlights from scientific papers. The track uses the MixSub dataset, which provides pairs of abstracts and corresponding author-written highlights. All submissions were evaluated using established metrics such as ROUGE, METEOR, and BERTScore to measure both alignment with author-written highlights and overall informativeness.
arXiv Detail & Related papers (2026-01-01T16:25:16Z) - SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z) - The Open Review-Based (ORB) dataset: Towards Automatic Assessment of
Scientific Papers and Experiment Proposals in High-Energy Physics [0.0]
We introduce the new comprehensive Open Review-Based dataset (ORB).
It includes a curated list of more than 36,000 scientific papers with their more than 89,000 reviews and final decisions.
This paper presents our data architecture and an overview of the collected data along with relevant statistics.
arXiv Detail & Related papers (2023-11-29T20:52:02Z) - MuLMS-AZ: An Argumentative Zoning Dataset for the Materials Science
Domain [1.209268134212644]
Classifying the Argumentative Zone (AZ) has been proposed to improve processing of scholarly documents.
We present and release a new dataset of 50 manually annotated research articles.
arXiv Detail & Related papers (2023-07-05T14:55:18Z) - Contrastive Hierarchical Discourse Graph for Scientific Document
Summarization [14.930704950433324]
CHANGES is a contrastive hierarchical graph neural network for extractive scientific paper summarization.
We also propose a graph contrastive learning module to learn global theme-aware sentence representations.
arXiv Detail & Related papers (2023-05-31T20:54:43Z) - The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z) - Hierarchical Multi-Label Classification of Scientific Documents [47.293189105900524]
We introduce a new dataset for hierarchical multi-label text classification of scientific papers called SciHTC.
This dataset contains 186,160 papers and 1,233 categories from the ACM CCS tree.
Our best model achieves a Macro-F1 score of 34.57% which shows that this dataset provides significant research opportunities.
arXiv Detail & Related papers (2022-11-05T04:12:57Z) - Enhancing Scientific Papers Summarization with Citation Graph [78.65955304229863]
We redefine the task of scientific papers summarization by utilizing their citation graph.
We construct a novel scientific papers summarization dataset Semantic Scholar Network (SSN) which contains 141K research papers in different domains.
Our model can achieve competitive performance when compared with the pretrained models.
arXiv Detail & Related papers (2021-04-07T11:13:35Z) - Fact or Fiction: Verifying Scientific Claims [53.29101835904273]
We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that SUPPORTS or REFUTES a given scientific claim.
We construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales.
We show that our system is able to verify claims related to COVID-19 by identifying evidence from the CORD-19 corpus.
arXiv Detail & Related papers (2020-04-30T17:22:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.