SciTweets -- A Dataset and Annotation Framework for Detecting Scientific
Online Discourse
- URL: http://arxiv.org/abs/2206.07360v2
- Date: Wed, 6 Jul 2022 11:32:07 GMT
- Authors: Salim Hafid, Sebastian Schellhammer, Sandra Bringay, Konstantin
Todorov, Stefan Dietze
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific topics, claims and resources are increasingly debated as part of
online discourse, where prominent examples include discourse related to
COVID-19 or climate change. This has led to both significant societal impact
and increased interest in scientific online discourse from various disciplines.
For instance, communication studies aim at a deeper understanding of biases,
quality, or spreading patterns of scientific information, whereas computational
methods have been proposed to extract, classify, or verify scientific claims
using NLP and IR techniques. However, research across disciplines currently
suffers from both a lack of robust definitions of the various forms of
science-relatedness as well as appropriate ground truth data for distinguishing
them. In this work, we contribute (a) an annotation framework and corresponding
definitions for different forms of scientific relatedness of online discourse
in Tweets, (b) an expert-annotated dataset of 1261 tweets obtained through our
labeling framework, reaching an average Fleiss' kappa ($\kappa$) of 0.63, (c) a
multi-label classifier trained on our data able to detect science-relatedness
with 89% F1 and also able to detect distinct forms of scientific knowledge
(claims, references). With this work we aim to lay the foundation for
developing and evaluating robust methods for analysing science as part of
large-scale online discourse.
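The agreement figure above is Fleiss' kappa, which corrects raw annotator agreement for agreement expected by chance. As a minimal sketch of how such a score is computed, the following self-contained implementation applies the standard Fleiss' kappa formula; the rating matrix is a hypothetical toy example, not data from the SciTweets corpus:

```python
# Sketch of Fleiss' kappa, the inter-annotator agreement statistic
# reported for the SciTweets dataset (kappa = 0.63 in the abstract).

def fleiss_kappa(ratings):
    """ratings[i][j] = number of raters assigning item i to category j.
    Every row must sum to the same number of raters n (n >= 2)."""
    N = len(ratings)        # number of items
    n = sum(ratings[0])     # raters per item
    k = len(ratings[0])     # number of categories

    # Observed agreement: mean over items of pairwise rater agreement.
    P_bar = sum(
        (sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings
    ) / N

    # Chance agreement: from the marginal category proportions.
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)

    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 4 tweets, 3 annotators, 2 labels
# (science-related vs. not science-related).
matrix = [
    [3, 0],  # all three annotators agree: science-related
    [0, 3],  # all agree: not science-related
    [2, 1],  # partial agreement
    [3, 0],
]
print(round(fleiss_kappa(matrix), 3))  # prints 0.625
```

Note that one row of partial agreement out of four is enough to pull kappa well below the raw agreement rate, since chance agreement is subtracted from both numerator and denominator.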
Related papers
- Detecting text level intellectual influence with knowledge graph embeddings (2024-10-31)
  We collect a corpus of open-source journal articles and generate Knowledge Graph representations using the Gemini LLM. We attempt to predict the existence of citations between sampled pairs of articles using previously published methods and a novel Graph Neural Network based embedding model.
- SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions (2024-06-20)
  We present SciDMT, an enhanced and expanded corpus for scientific mention detection. The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated mentions in the form of in-text spans, and 2) an evaluation set of 100 scientific articles manually annotated for evaluation purposes.
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature (2024-06-10)
  We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks. SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
- A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? (2023-05-22)
  We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques. We define three variables to encompass diverse facets of the evolution of research topics within NLP, and utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
- How Data Scientists Review the Scholarly Literature (2023-01-10)
  We examine the literature review practices of data scientists. Data science is a field seeing an exponential rise in papers, yet no prior work has examined the specific practices and challenges faced by these scientists.
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases (2022-10-24)
  SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change. It contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and the full texts of original papers. Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
- SciLander: Mapping the Scientific News Landscape (2022-05-16)
  We introduce SciLander, a method for learning representations of news sources reporting on science-based topics. We evaluate our method on a novel COVID-19 dataset containing nearly 1M news articles from 500 sources, spanning 18 months from the beginning of the pandemic in 2020.
- Measuring Disagreement in Science (2021-07-30)
  We use a cue-phrase based approach to identify instances of disagreement citations across more than four million scientific articles. We reveal a disciplinary spectrum of disagreement, with higher disagreement in the social sciences and lower disagreement in physics and mathematics.
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles (2021-04-26)
  This paper presents a novel method for vector representation of text meaning based on information theory. We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus, and that the informational approach to representing the meaning of a text offers a way to effectively predict the scientific impact of research papers.
- Extracting a Knowledge Base of Mechanisms from COVID-19 Papers (2020-10-08)
  We pursue the construction of a knowledge base (KB) of mechanisms, developing a broad, unified schema that strikes a balance between relevance and breadth. Experiments demonstrate the utility of our KB in supporting interdisciplinary scientific search over COVID-19 literature.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.