Annotating Scientific Uncertainty: A comprehensive model using linguistic patterns and comparison with existing approaches
- URL: http://arxiv.org/abs/2503.11376v1
- Date: Fri, 14 Mar 2025 13:21:59 GMT
- Title: Annotating Scientific Uncertainty: A comprehensive model using linguistic patterns and comparison with existing approaches
- Authors: Panggih Kusuma Ningrum, Philipp Mayr, Nina Smirnova, Iana Atanassova
- Abstract summary: UnScientify is a system designed to detect scientific uncertainty in scholarly full text. The core methodology of UnScientify is based on a multi-faceted pipeline that integrates span pattern matching, complex sentence analysis and author reference checking. The evaluation results highlight the trade-offs between modern large language models (LLMs) and the UnScientify system.
- Score: 1.9627519910539217
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: UnScientify is a system designed to detect scientific uncertainty in scholarly full text. The system utilizes a weakly supervised technique to identify verbally expressed uncertainty in scientific texts and their authorial references. The core methodology of UnScientify is based on a multi-faceted pipeline that integrates span pattern matching, complex sentence analysis and author reference checking. This approach streamlines the labeling and annotation processes essential for identifying scientific uncertainty, covering a variety of uncertainty expression types to support diverse applications including information retrieval, text mining and scientific document processing. The evaluation results highlight the trade-offs between modern large language models (LLMs) and the UnScientify system. UnScientify, which employs more traditional techniques, achieved superior performance in the scientific uncertainty detection task, attaining an accuracy score of 0.808. This finding underscores the continued relevance and efficiency of UnScientify's simple rule-based, pattern-matching strategy for this specific application. The results demonstrate that in scenarios where resource efficiency, interpretability, and domain-specific adaptability are critical, traditional methods can still offer significant advantages.
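The paper does not include code; the sketch below only illustrates the general idea of rule-based span pattern matching combined with an author-reference check. Every pattern, function name, and output field here is an illustrative assumption, not UnScientify's actual rule set or pipeline.

```python
import re

# Illustrative hedging/uncertainty cues; the real UnScientify pattern set,
# complex-sentence analysis, and author-reference checks are richer than this.
UNCERTAINTY_PATTERNS = [
    r"\bmay\b", r"\bmight\b", r"\bcould\b", r"\bpossibl[ye]\b",
    r"\bsuggests?\b", r"\bit is (?:un)?likely\b", r"\bwe hypothesi[sz]e\b",
]
AUTHOR_REFERENCE = re.compile(r"\b(?:we|our|this (?:paper|study|work))\b", re.I)

def detect_uncertainty(sentence: str) -> dict:
    """Return matched uncertainty spans and whether the sentence references the authors."""
    spans = []
    for pat in UNCERTAINTY_PATTERNS:
        for m in re.finditer(pat, sentence, re.I):
            spans.append((m.start(), m.end(), m.group()))
    return {
        "uncertain": bool(spans),
        "spans": spans,
        "author_reference": bool(AUTHOR_REFERENCE.search(sentence)),
    }

print(detect_uncertainty("We hypothesize that the effect may be weaker in larger samples."))
```

Because the matched spans are returned verbatim, a matcher of this kind is interpretable by construction, which is the property the abstract credits for the approach's continued relevance.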
Related papers
- Detecting Statements in Text: A Domain-Agnostic Few-Shot Solution [1.3654846342364308]
State-of-the-art approaches usually involve fine-tuning models on large annotated datasets, which are costly to produce.
We propose and release a qualitative and versatile few-shot learning methodology as a common paradigm for any claim-based textual classification task.
We illustrate this methodology in the context of three tasks: climate change contrarianism detection, topic/stance classification, and depression-related symptom detection.
arXiv Detail & Related papers (2024-05-09T12:03:38Z)
- Towards Controlled Table-to-Text Generation with Scientific Reasoning [46.87189607486007]
We present a new task for generating fluent and logical descriptions that match user preferences over scientific data, aiming to automate scientific document analysis.
We construct a new challenging dataset, SciTab, consisting of table-description pairs extracted from the scientific literature, with highlighted cells and a corresponding domain-specific knowledge base.
The results showed that large models struggle to produce accurate content that aligns with user preferences. As the first of its kind, our work should motivate further research in scientific domains.
arXiv Detail & Related papers (2023-12-08T22:57:35Z)
- Testing the Consistency of Performance Scores Reported for Binary Classification Problems [0.0]
We introduce numerical techniques to assess the consistency of reported performance scores and the assumed experimental setup.
We demonstrate how the proposed techniques can effectively detect inconsistencies, thereby safeguarding the integrity of research fields.
To benefit the scientific community, the consistency tests are available in an open-source Python package (see the sketch after this entry).
arXiv Detail & Related papers (2023-10-19T07:04:29Z)
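The package itself is not reproduced here; the following self-contained sketch, written from first principles rather than from the paper, shows the core idea behind such a consistency test: with p positive and n negative test samples, a reported accuracy is attainable only if some integer number of correct predictions reproduces it within rounding. The function name and rounding convention are assumptions.

```python
def accuracy_is_consistent(acc: float, p: int, n: int, digits: int = 3) -> bool:
    """Check whether a reported accuracy (rounded to `digits`) is attainable
    on a test set with p positive and n negative samples, i.e. whether some
    integer count of correct predictions reproduces it within rounding."""
    total = p + n
    eps = 0.5 / 10**digits  # half a unit in the last reported digit
    # Accuracy depends only on the number of correct predictions, so scan it directly.
    return any(abs(correct / total - acc) <= eps for correct in range(total + 1))

# 0.808 cannot arise from 100 samples (no k/100 rounds to 0.808),
# but it can from 1000 samples (808/1000):
print(accuracy_is_consistent(0.808, p=50, n=50))    # False
print(accuracy_is_consistent(0.808, p=500, n=500))  # True
```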
- UnScientify: Detecting Scientific Uncertainty in Scholarly Full Text [5.318135784473086]
UnScientify is an interactive system designed to detect scientific uncertainty in scholarly full text.
The pipeline for the system includes a combination of pattern matching, complex sentence checking, and authorial reference checking.
UnScientify provides interpretable results, aiding in the comprehension of identified instances of scientific uncertainty in text.
arXiv Detail & Related papers (2023-07-26T15:04:24Z)
- Cognitive Semantic Communication Systems Driven by Knowledge Graph: Principle, Implementation, and Performance Evaluation [74.38561925376996]
Two cognitive semantic communication frameworks are proposed for the single-user and multiple-user communication scenarios.
An effective semantic correction algorithm is proposed by mining the inference rule from the knowledge graph.
For the multi-user cognitive semantic communication system, a message recovery algorithm is proposed to distinguish messages of different users.
arXiv Detail & Related papers (2023-03-15T12:01:43Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair (see the sketch after this entry).
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
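As a hedged illustration of that feature, reconstructed from the abstract rather than the authors' code, the element-wise Manhattan distance vector is simply |u - v| computed per dimension over a pair of sentence embeddings; the encoder is abstracted away, and `text_vec`/`hyp_vec` are toy stand-ins.

```python
import numpy as np

def manhattan_feature(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Element-wise Manhattan (L1) distance vector between two sentence
    embeddings; each dimension becomes one feature for an entailment classifier."""
    return np.abs(u - v)

# Toy embeddings standing in for encoder outputs (the paper's encoder is not assumed here).
text_vec = np.array([0.2, -0.5, 0.9])
hyp_vec  = np.array([0.1, -0.4, 0.1])
print(manhattan_feature(text_vec, hyp_vec))  # [0.1 0.1 0.8]
```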
- Detecting Text Formality: A Study of Text Classification Approaches [78.11745751651708]
This work presents, to our knowledge, the first systematic study of formality detection methods based on statistical, neural, and Transformer-based machine learning approaches.
We conducted three types of experiments -- monolingual, multilingual, and cross-lingual.
The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks.
arXiv Detail & Related papers (2022-04-19T16:23:07Z)
- A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes.
We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z)
- Expressing High-Level Scientific Claims with Formal Semantics [0.8258451067861932]
We analyze the main claims from a sample of scientific articles from all disciplines.
We find that their semantics are more complex than what a straightforward application of formalisms like RDF or OWL accounts for.
We propose a super-pattern with five slots and show how its instantiation leads to a strictly defined statement in higher-order logic (see the sketch after this entry).
arXiv Detail & Related papers (2021-09-27T09:52:49Z)
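The slot names below are assumptions drawn from the published super-pattern (context, subject, qualifier, relation, object); the rendering into a logic-style formula is only a hedged sketch, not the paper's formalization.

```python
from dataclasses import dataclass

@dataclass
class SuperPattern:
    """Hypothetical container for the five slots of a scientific-claim super-pattern.
    Slot names follow the published pattern; the formula rendering is illustrative."""
    context: str    # class of situations in which the claim holds
    subject: str    # class the claim is about
    qualifier: str  # strength of the claim, e.g. "generally" or "always"
    relation: str   # relation linking the subject class to the object class
    object: str     # class the subject is related to

    def to_formula(self) -> str:
        # Sketch of a higher-order reading: within the context, instances of the
        # subject class stand (qualified) in the relation to the object class.
        return (f"∀x ∈ {self.subject}: in-context({self.context}, x) → "
                f"{self.qualifier}({self.relation}(x, {self.object}))")

claim = SuperPattern(context="humans", subject="eating-vegetables",
                     qualifier="generally", relation="has-quality",
                     object="healthiness")
print(claim.to_formula())
```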
- Semantic Analysis for Automated Evaluation of the Potential Impact of Research Articles [62.997667081978825]
This paper presents a novel method for vector representation of text meaning based on information theory.
We show how this informational semantics is used for text classification on the basis of the Leicester Scientific Corpus.
We show that this informational approach to representing the meaning of a text offers a way to effectively predict the scientific impact of research papers.
arXiv Detail & Related papers (2021-04-26T20:37:13Z)
- Improving Scholarly Knowledge Representation: Evaluating BERT-based Models for Scientific Relation Classification [5.8962650619804755]
We show that a domain-specific pre-training corpus helps BERT-based classification models identify the type of scientific relations.
Although predicting a single relation at a time achieves higher classification accuracy, the alternative strategy of predicting multiple relations at once performs more consistently on corpora with either large or small amounts of annotations.
arXiv Detail & Related papers (2020-04-13T18:46:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.