Related papers: KnowledgeShovel: An AI-in-the-Loop Document Annotation System for Scientific Knowledge Base Construction

URL: http://arxiv.org/abs/2210.02830v1
Date: Thu, 6 Oct 2022 11:38:18 GMT
Title: KnowledgeShovel: An AI-in-the-Loop Document Annotation System for Scientific Knowledge Base Construction
Authors: Shao Zhang, Yuting Jia, Hui Xu, Dakuo Wang, Toby Jia-jun Li, Ying Wen, Xinbing Wang, Chenghu Zhou
Abstract summary: KnowledgeShovel is an AI-in-the-Loop document annotation system for researchers to construct scientific knowledge bases. The design of KnowledgeShovel introduces a multi-step multi-modal human-AI collaboration pipeline to improve data accuracy while reducing the human burden. A follow-up user evaluation with 7 geoscience researchers shows that KnowledgeShovel can enable efficient construction of scientific knowledge bases with satisfactory accuracy.
Score: 46.56643271476249
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Constructing a comprehensive, accurate, and useful scientific knowledge base is crucial for human researchers synthesizing scientific knowledge and for enabling AI-driven scientific discovery. However, the current process is difficult, error-prone, and laborious due to (1) the enormous amount of scientific literature available; (2) the highly-specialized scientific domains; (3) the diverse modalities of information (text, figure, table); and, (4) the silos of scientific knowledge in different publications with inconsistent formats and structures. Informed by a formative study and iterated with participatory design workshops, we designed and developed KnowledgeShovel, an AI-in-the-Loop document annotation system for researchers to construct scientific knowledge bases. The design of KnowledgeShovel introduces a multi-step multi-modal human-AI collaboration pipeline that aligns with users' existing workflows to improve data accuracy while reducing the human burden. A follow-up user evaluation with 7 geoscience researchers shows that KnowledgeShovel can enable efficient construction of scientific knowledge bases with satisfactory accuracy.

Related papers

SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence [99.30934038146965]
SciEvalKit focuses on the core competencies of scientific intelligence. It supports six major scientific domains, spanning from physics and chemistry to astronomy and materials science. The toolkit is open-sourced and actively maintained to foster community-driven development and progress in AI4Science.
arXiv Detail & Related papers (2025-12-26T17:36:02Z)
OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists [47.41269933143946]
We introduce OmniScientist, a framework that encodes the underlying mechanisms of human research into the AI scientific workflow. OmniScientist achieves end-to-end automation across data foundation, literature review, research ideation, experiment automation, scientific writing, and peer review. This infrastructure empowers agents to not only comprehend and leverage human knowledge systems but also to collaborate and co-evolve.
arXiv Detail & Related papers (2025-11-21T03:55:19Z)
Advancing Scientific Knowledge Retrieval and Reuse with a Novel Digital Library for Machine-Readable Knowledge [4.450387519903374]
ORKG reborn is an emerging digital library that supports finding, accessing, and reusing accurate, fine-grained, and reproducible machine-readable expressions of scientific knowledge. We describe the proposed system and demonstrate its practical viability and potential for information retrieval in contrast to state-of-the-art digital libraries and document-centric scholarly communication.
arXiv Detail & Related papers (2025-11-11T17:20:02Z)
SciGPT: A Large Language Model for Scientific Literature Understanding and Knowledge Discovery [3.779883844533933]
This paper presents SciGPT, a domain-adapted model for scientific literature understanding and ScienceBench, an open source benchmark tailored to evaluate scientific LLMs. Built on the Qwen3 architecture, SciGPT incorporates three key innovations: (1) low-cost domain distillation via a two-stage pipeline to balance performance and efficiency; (2) a Sparse Mixture-of-Experts attention mechanism that cuts memory consumption by 55% for 32,000 long-token reasoning; and (3) knowledge-aware adaptation integrating domain-specific nuances. Experimental results on ScienceBench show that SciGPT outperforms GPT-4o in core scientific tasks including sequence
arXiv Detail & Related papers (2025-09-09T16:09:19Z)
ScienceMeter: Tracking Scientific Knowledge Updates in Language Models [79.33626657942169]
Large Language Models (LLMs) are increasingly used to support scientific research, but their knowledge of scientific advancements can quickly become outdated. We introduce ScienceMeter, a new framework for evaluating scientific knowledge update methods over scientific knowledge spanning the past, present, and future.
arXiv Detail & Related papers (2025-05-30T07:28:20Z)
Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery [35.888956949646]
Large Language Models (LLMs) are transforming scientific research by reshaping the scientific method. LLMs are involved in experimental design, data analysis, and enhancing productivity, particularly in chemistry and biology. Transition to AI-driven science raises ethical questions about creativity, oversight, and responsibility.
arXiv Detail & Related papers (2025-05-22T10:05:48Z)
SciMantify -- A Hybrid Approach for the Evolving Semantification of Scientific Knowledge [0.4499833362998487]
We propose an evolution model of knowledge representation, inspired by the 5-star Linked Open Data (LOD) model. We develop a hybrid approach, called SciMantify, to support its evolving semantification. We implement the approach in the Open Research Knowledge Graph (ORKG), an established platform for improving the findability, accessibility, interoperability, and reusability of scientific knowledge.
arXiv Detail & Related papers (2025-04-14T07:57:55Z)
Scaling Laws in Scientific Discovery with AI and Robot Scientists [72.3420699173245]
An autonomous generalist scientist (AGS) concept combines agentic AI and embodied robotics to automate the entire research lifecycle. AGS aims to significantly reduce the time and resources needed for scientific discovery. As these autonomous systems become increasingly integrated into the research process, we hypothesize that scientific discovery might adhere to new scaling laws.
arXiv Detail & Related papers (2025-03-28T14:00:27Z)
Large Language Models: New Opportunities for Access to Science [0.0]
The uptake of Retrieval Augmented Generation-enhanced chat applications in the construction of the open science environment of the KM3NeT neutrino detectors serves as a focus point to explore and exemplify prospects for the wider application of Large Language Models for our science.
arXiv Detail & Related papers (2025-01-13T11:58:27Z)
Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation [48.29699224989952]
VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. We show that this multi-agent approach outperforms the state-of-the-art method in producing novel and impactful scientific ideas.
arXiv Detail & Related papers (2024-10-12T07:16:22Z)
Fine-tuning and Prompt Engineering with Cognitive Knowledge Graphs for Scholarly Knowledge Organization [0.14999444543328289]
This research focuses on effectively conveying structured scholarly knowledge by utilizing large language models (LLMs). LLMs categorize scholarly articles and describe their contributions in a structured and comparable manner. Our methodology involves harnessing LLM knowledge, and complementing it with domain expert-verified scholarly data sourced from a CKG.
arXiv Detail & Related papers (2024-09-10T11:31:02Z)
SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions [52.35520385083425]
We present SciDMT, an enhanced and expanded corpus for scientific mention detection. The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated mention annotations in the format of in-text span, and 2) an evaluation set, which comprises 100 scientific articles manually annotated for evaluation purposes.
arXiv Detail & Related papers (2024-06-20T22:03:21Z)
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models [35.98892300665275]
We introduce the SciKnowEval benchmark, a framework that evaluates large language models (LLMs) across five progressive levels of scientific knowledge. These levels aim to assess the breadth and depth of scientific knowledge in LLMs, including memory, comprehension, reasoning, discernment, and application. We benchmark 26 advanced open-source and proprietary LLMs using zero-shot and few-shot prompting strategies.
arXiv Detail & Related papers (2024-06-13T13:27:52Z)
Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators [78.63553017938911]
Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks. However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge. We introduce CONNER, designed to evaluate generated knowledge from six important perspectives.
arXiv Detail & Related papers (2023-10-11T08:22:37Z)
Modeling Information Change in Science Communication with Semantically Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change. SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers. Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv Detail & Related papers (2022-10-24T07:44:38Z)
Retrieval of Scientific and Technological Resources for Experts and Scholars [20.89926457148302]
The scientific and technological resources of experts and scholars are mainly composed of basic attributes and scientific research achievements. Due to information asymmetry and other reasons, the scientific and technological resources of experts and scholars cannot be connected with the society in a timely manner. This paper sorts out the related research work in this field from four aspects: text relation extraction, text knowledge representation learning, text vector retrieval and visualization system.
arXiv Detail & Related papers (2022-04-13T02:32:09Z)
Integration of knowledge and data in machine learning [0.456877715768796]
Through knowledge embedding, barriers between knowledge and data can be broken, and machine learning models with physical common sense can be formed. Knowledge discovery takes advantage of machine learning to extract new knowledge from observations. This study not only summarizes and analyzes the existing literature, but also proposes research gaps and future opportunities.
arXiv Detail & Related papers (2022-02-15T10:35:53Z)
CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers. We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.