Related papers: Measuring and Analyzing Subjective Uncertainty in Scientific Communications

URL: http://arxiv.org/abs/2503.21114v1
Date: Thu, 27 Mar 2025 03:12:50 GMT
Title: Measuring and Analyzing Subjective Uncertainty in Scientific Communications
Authors: Jamshid Sourati, Grace Shao
Abstract summary: This work measured/analyzed the subjective uncertainty and its impact within scientific communities across different disciplines. We showed that the level of this type of uncertainty varies significantly across different fields, years of publication and geographical locations. We also studied the correlation between subjective uncertainty and several metrics, such as number/gender of authors, centrality of the field's community, citation count, etc.
Score: 1.3154296174423619
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Uncertainty of scientific findings is typically reported through statistical metrics such as $p$-values, confidence intervals, etc. The magnitude of this objective uncertainty is reflected in the language used by the authors to report their findings, primarily through expressions carrying uncertainty-inducing terms or phrases. This language uncertainty is a subjective concept and is highly dependent on the writing style of the authors. There is evidence that such subjective uncertainty influences the impact of science on public audiences. In this work, we turned our focus to scientists themselves, and measured/analyzed the subjective uncertainty and its impact within scientific communities across different disciplines. We showed that the level of this type of uncertainty varies significantly across different fields, years of publication and geographical locations. We also studied the correlation between subjective uncertainty and several bibliographical metrics, such as number/gender of authors, centrality of the field's community, citation count, etc. The underlying patterns identified in this work are useful in the identification and documentation of linguistic norms in scientific communication in different communities/societies.

Related papers

Disparities in Peer Review Tone and the Role of Reviewer Anonymity [0.0]
This study examines more than 80,000 reviews in two major journals. It uncovers how review tone, sentiment, and supportive language vary across author demographics.
arXiv Detail & Related papers (2025-07-19T20:19:21Z)
Anthropomimetic Uncertainty: What Verbalized Uncertainty in Language Models is Missing [66.04926909181653]
We argue for anthropomimetic uncertainty, meaning that intuitive and trustworthy uncertainty communication requires a degree of linguistic authenticity and personalization to the user. We conclude by pointing out unique factors in human-machine communication of uncertainty and deconstruct the data biases that influence machine uncertainty communication.
arXiv Detail & Related papers (2025-07-11T14:07:22Z)
Large Language Models Often Say One Thing and Do Another [49.22262396351797]
We develop a novel evaluation benchmark called the Words and Deeds Consistency Test (WDCT). The benchmark establishes a strict correspondence between word-based and deed-based questions across different domains. The evaluation results reveal a widespread inconsistency between words and deeds across different LLMs and domains.
arXiv Detail & Related papers (2025-03-10T07:34:54Z)
Causal Language in Observational Studies: Sociocultural Backgrounds and Team Composition [10.71018453873532]
We show that causal language is more common in work by less experienced authors, smaller research teams, male last authors, and researchers from countries with higher uncertainty avoidance indices. Our findings suggest that the use of causal language is not solely driven by the strength of evidence, but also by the sociocultural backgrounds of authors and their team composition.
arXiv Detail & Related papers (2025-02-04T02:00:10Z)
Understanding Fine-grained Distortions in Reports of Scientific Findings [46.96512578511154]
Distorted science communication harms individuals and society as it can lead to unhealthy behavior change and decrease trust in scientific institutions. Given the rapidly increasing volume of science communication in recent years, a fine-grained understanding of how findings from scientific publications are reported to the general public is crucial.
arXiv Detail & Related papers (2024-02-19T19:00:01Z)
Semantic Properties of cosine based bias scores for word embeddings [48.0753688775574]
We propose requirements for bias scores to be considered meaningful for quantifying biases. We analyze cosine based scores from the literature with regard to these requirements. We underline these findings with experiments to show that the bias scores' limitations have an impact in the application case.
arXiv Detail & Related papers (2024-01-27T20:31:10Z)
A Content-Based Novelty Measure for Scholarly Publications: A Proof of Concept [9.148691357200216]
We introduce an information-theoretic measure of novelty in scholarly publications. This measure quantifies the degree of 'surprise' perceived by a language model that represents the word distribution of scholarly discourse.
arXiv Detail & Related papers (2024-01-08T03:14:24Z)
Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
False perspectives on human language: why statistics needs linguistics [0.8699677835130408]
We show that statistical measures can be defined on the basis of either structural or non-structural models. Only models of surprisal that reflect syntactic structure are able to account for language regularities.
arXiv Detail & Related papers (2023-02-17T11:40:32Z)
An Informational Space Based Semantic Analysis for Scientific Texts [62.997667081978825]
This paper introduces computational methods for semantic analysis and for quantifying the meaning of short scientific texts. The representation of scientific-specific meaning is standardised by replacing the situation representations, rather than psychological properties. The research in this paper lays the basis for the geometric representation of the meaning of texts.
arXiv Detail & Related papers (2022-05-31T11:19:32Z)
Measuring Sentence-Level and Aspect-Level (Un)certainty in Science Communications [9.36599317326032]
We introduce a new study of certainty that models both the level and the aspects of certainty in scientific findings. We show that both the overall certainty and individual aspects can be predicted with pre-trained language models.
arXiv Detail & Related papers (2021-09-30T00:50:51Z)
Annotation Uncertainty in the Context of Grammatical Change [0.05249805590164901]
This paper elaborates on the notion of uncertainty in the context of annotation in large text corpora. By examining annotation uncertainty in more detail, we identify the sources and deepen our understanding of the nature and different types of uncertainty encountered in daily annotation practice. This article can be seen as an attempt to reconcile the perspectives of the main scientific disciplines involved in corpus projects, linguistics and computer science.
arXiv Detail & Related papers (2021-05-15T17:45:29Z)
Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory. We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus. We show that an informational approach to representing the meaning of a text has offered a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
Embracing Uncertainty: Decoupling and De-bias for Robust Temporal Grounding [23.571580627202405]
Temporal grounding aims to localize temporal boundaries within untrimmed videos by language queries. It faces the challenge of two types of inevitable human uncertainties: query uncertainty and label uncertainty. We propose a novel DeNet (Decoupling and De-bias) to embrace human uncertainty.
arXiv Detail & Related papers (2021-03-31T07:00:56Z)
Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods [51.34667808471513]
We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm. We show that both factors are predictive of word emergence, although we find more support for the latter hypothesis.
arXiv Detail & Related papers (2020-01-21T19:09:49Z)