KnowledgeShovel: An AI-in-the-Loop Document Annotation System for Scientific Knowledge Base Construction
- URL: http://arxiv.org/abs/2210.02830v1
- Date: Thu, 6 Oct 2022 11:38:18 GMT
- Title: KnowledgeShovel: An AI-in-the-Loop Document Annotation System for Scientific Knowledge Base Construction
- Authors: Shao Zhang, Yuting Jia, Hui Xu, Dakuo Wang, Toby Jia-jun Li, Ying Wen, Xinbing Wang, Chenghu Zhou
- Abstract summary: KnowledgeShovel is an AI-in-the-Loop document annotation system for researchers to construct scientific knowledge bases.
The design of KnowledgeShovel introduces a multi-step multi-modal human-AI collaboration pipeline to improve data accuracy while reducing the human burden.
A follow-up user evaluation with 7 geoscience researchers shows that KnowledgeShovel can enable efficient construction of scientific knowledge bases with satisfactory accuracy.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Constructing a comprehensive, accurate, and useful scientific knowledge base
is crucial for human researchers synthesizing scientific knowledge and for
enabling AI-driven scientific discovery. However, the current process is
difficult, error-prone, and laborious due to (1) the enormous amount of
scientific literature available; (2) the highly specialized scientific domains;
(3) the diverse modalities of information (text, figure, table); and, (4) the
silos of scientific knowledge in different publications with inconsistent
formats and structures. Informed by a formative study and iterated with
participatory design workshops, we designed and developed KnowledgeShovel, an
AI-in-the-Loop document annotation system for researchers to construct
scientific knowledge bases. The design of KnowledgeShovel introduces a
multi-step multi-modal human-AI collaboration pipeline that aligns with users'
existing workflows to improve data accuracy while reducing the human burden. A
follow-up user evaluation with 7 geoscience researchers shows that
KnowledgeShovel can enable efficient construction of scientific knowledge bases
with satisfactory accuracy.
Related papers
- Two Heads Are Better Than One: A Multi-Agent System Has the Potential to Improve Scientific Idea Generation
VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas.
We show that this multi-agent approach outperforms the state-of-the-art method in producing novel and impactful scientific ideas.
Date: 2024-10-12T07:16:22Z
- Fine-tuning and Prompt Engineering with Cognitive Knowledge Graphs for Scholarly Knowledge Organization
This research focuses on effectively conveying structured scholarly knowledge by utilizing large language models (LLMs).
LLMs categorize scholarly articles and describe their contributions in a structured and comparable manner.
Our methodology involves harnessing LLM knowledge and complementing it with domain expert-verified scholarly data sourced from a CKG.
Date: 2024-09-10T11:31:02Z
- SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions
We present SciDMT, an enhanced and expanded corpus for scientific mention detection.
The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated mentions in the form of in-text spans, and 2) an evaluation set, which comprises 100 scientific articles manually annotated for evaluation purposes.
Date: 2024-06-20T22:03:21Z
- SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
We introduce the SciKnowEval benchmark, a framework that evaluates large language models (LLMs) across five progressive levels of scientific knowledge.
These levels aim to assess the breadth and depth of scientific knowledge in LLMs, including memory, comprehension, reasoning, discernment, and application.
We benchmark 26 advanced open-source and proprietary LLMs using zero-shot and few-shot prompting strategies.
Date: 2024-06-13T13:27:52Z
- Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators
Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks.
However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge.
We introduce CONNER, a framework designed to evaluate generated knowledge from six important perspectives.
Date: 2023-10-11T08:22:37Z
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
Date: 2022-10-24T07:44:38Z
- Retrieval of Scientific and Technological Resources for Experts and Scholars
The scientific and technological resources of experts and scholars are mainly composed of basic attributes and scientific research achievements.
Due to information asymmetry and other factors, the scientific and technological resources of experts and scholars cannot be connected with society in a timely manner.
This paper reviews related research in this field from four perspectives: text relation extraction, text knowledge representation learning, text vector retrieval, and visualization systems.
Date: 2022-04-13T02:32:09Z
- Integration of knowledge and data in machine learning
Through knowledge embedding, barriers between knowledge and data can be broken, and machine learning models with physical common sense can be formed.
Knowledge discovery takes advantage of machine learning to extract new knowledge from observations.
This study not only summarizes and analyzes the existing literature, but also proposes research gaps and future opportunities.
Date: 2022-02-15T10:35:53Z
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
Date: 2021-06-03T03:00:12Z