How Data Scientists Review the Scholarly Literature
- URL: http://arxiv.org/abs/2301.03774v1
- Date: Tue, 10 Jan 2023 03:53:05 GMT
- Title: How Data Scientists Review the Scholarly Literature
- Authors: Sheshera Mysore, Mahmood Jasim, Haoru Song, Sarah Akbar, Andre Kenneth Chase Randall, Narges Mahyar
- Abstract summary: We examine the literature review practices of data scientists.
Data science represents a field seeing an exponential rise in papers.
No prior work has examined the specific practices and challenges faced by these scientists.
- Score: 4.406926847270567
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Keeping up with the research literature plays an important role in the
workflow of scientists - allowing them to understand a field, formulate the
problems they focus on, and develop the solutions that they contribute, which
in turn shape the nature of the discipline. In this paper, we examine the
literature review practices of data scientists. Data science represents a field
seeing an exponential rise in papers, and increasingly drawing on and being
applied in numerous diverse disciplines. Recent efforts have seen the
development of several tools intended to help data scientists cope with a
deluge of research and coordinated efforts to develop AI tools intended to
uncover the research frontier. Despite these trends indicative of the
information overload faced by data scientists, no prior work has examined the
specific practices and challenges faced by these scientists in an
interdisciplinary field with evolving scholarly norms. In this paper, we close
this gap through a set of semi-structured interviews and think-aloud protocols
of industry and academic data scientists (N = 20). Our results, while
corroborating other knowledge workers' practices, uncover several novel
findings: individuals (1) are challenged in seeking and sensemaking of papers
beyond their disciplinary bubbles, (2) struggle to understand papers in the
face of missing details and mathematical content, (3) grapple with the deluge
by leveraging the knowledge context in code, blogs, and talks, and (4) lean on
their peers online and in-person. Furthermore, we outline future directions
likely to help data scientists cope with the burgeoning research literature.
Related papers
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled.
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
(2024-06-16T08:03:24Z)
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
(2024-06-10T15:19:09Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is a large language model-powered research idea writing agent.
It generates problems, methods, and experiment designs while iteratively refining them based on scientific literature.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
(2024-04-11T13:36:29Z)
- Knowledge Graphs for the Life Sciences: Recent Developments, Challenges and Opportunities [11.35513523308132]
We discuss developments and advances in the use of graph-based technologies in life sciences.
We focus on three broad topics: the construction and management of Knowledge Graphs (KGs), the use of KGs and associated technologies in the discovery of new knowledge, and the use of KGs in artificial intelligence applications to support explanations.
(2023-09-29T14:03:34Z)
- A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques.
We define three variables to encompass diverse facets of the evolution of research topics within NLP.
We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
(2023-05-22T11:08:00Z)
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
(2022-10-24T07:44:38Z)
- A Search Engine for Discovery of Biomedical Challenges and Directions [38.72769142277108]
We construct and release an expert-annotated corpus of texts sampled from full-length papers.
We focus on a large corpus of interdisciplinary work relating to the COVID-19 pandemic.
We apply a model trained on our data to identify challenges and directions across the corpus and build a dedicated search engine for this information.
(2021-08-31T11:08:20Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
(2021-06-03T03:00:12Z)
- A field guide to cultivating computational biology [1.040598660564506]
Biomedical research centers can empower basic discovery and novel therapeutic strategies by leveraging their large-scale datasets from experiments and patients.
This data, together with new technologies to create and analyze it, has ushered in an era of data-driven discovery which requires moving beyond the traditional individual, single-discipline investigator research model.
We propose solutions for individual scientists, institutions, journal publishers, funding agencies, and educators.
(2021-04-23T01:24:21Z)
- Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation [4.178929174617172]
We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs).
Our results show that using NLP we can encode a semantic space of articles, while with GNN we are able to build a relational space where the social practices of a research community are also encoded.
(2020-11-05T14:57:41Z)
- Generating Knowledge Graphs by Employing Natural Language Processing and Machine Learning Techniques within the Scholarly Domain [1.9004296236396943]
We present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications.
Within this research work, we tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools.
We generated a scientific knowledge graph including 109,105 triples, extracted from 26,827 abstracts of papers within the Semantic Web domain.
(2020-10-28T08:31:40Z)