KnowledgeShovel: An AI-in-the-Loop Document Annotation System for
Scientific Knowledge Base Construction
- URL: http://arxiv.org/abs/2210.02830v1
- Date: Thu, 6 Oct 2022 11:38:18 GMT
- Title: KnowledgeShovel: An AI-in-the-Loop Document Annotation System for
Scientific Knowledge Base Construction
- Authors: Shao Zhang, Yuting Jia, Hui Xu, Dakuo Wang, Toby Jia-jun Li, Ying Wen,
Xinbing Wang, Chenghu Zhou
- Abstract summary: KnowledgeShovel is an AI-in-the-Loop document annotation system for researchers to construct scientific knowledge bases.
The design of KnowledgeShovel introduces a multi-step multi-modal human-AI collaboration pipeline to improve data accuracy while reducing the human burden.
A follow-up user evaluation with 7 geoscience researchers shows that KnowledgeShovel can enable efficient construction of scientific knowledge bases with satisfactory accuracy.
- Score: 46.56643271476249
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Constructing a comprehensive, accurate, and useful scientific knowledge base
is crucial for human researchers synthesizing scientific knowledge and for
enabling AI-driven scientific discovery. However, the current process is
difficult, error-prone, and laborious due to (1) the enormous amount of
scientific literature available; (2) the highly-specialized scientific domains;
(3) the diverse modalities of information (text, figure, table); and, (4) the
silos of scientific knowledge in different publications with inconsistent
formats and structures. Informed by a formative study and iterated with
participatory design workshops, we designed and developed KnowledgeShovel, an
AI-in-the-Loop document annotation system for researchers to construct
scientific knowledge bases. The design of KnowledgeShovel introduces a
multi-step multi-modal human-AI collaboration pipeline that aligns with users'
existing workflows to improve data accuracy while reducing the human burden. A
follow-up user evaluation with 7 geoscience researchers shows that
KnowledgeShovel can enable efficient construction of scientific knowledge bases
with satisfactory accuracy.
Related papers
- SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions [52.35520385083425]
We present SciDMT, an enhanced and expanded corpus for scientific mention detection.
The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated mentions in the form of in-text spans, and 2) an evaluation set, which comprises 100 scientific articles manually annotated for evaluation purposes.
arXiv Detail & Related papers (2024-06-20T22:03:21Z)
- SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models [35.98892300665275]
SciKnowEval is a framework that evaluates Large Language Models (LLMs) across five progressive levels of scientific knowledge.
We benchmark 20 leading open-source and proprietary LLMs using zero-shot and few-shot prompting strategies.
The results reveal that despite achieving state-of-the-art performance, the proprietary LLMs still have considerable room for improvement.
arXiv Detail & Related papers (2024-06-13T13:27:52Z)
- Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators [78.63553017938911]
Large language models (LLMs) outperform information retrieval techniques for downstream knowledge-intensive tasks.
However, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge.
We introduce CONNER, designed to evaluate generated knowledge from six important perspectives.
arXiv Detail & Related papers (2023-10-11T08:22:37Z)
- CLAIMED -- the open source framework for building coarse-grained operators for accelerated discovery in science [0.0]
CLAIMED is a framework for building reusable operators and scalable scientific workflows by helping scientists draw on previous work through re-composing scientific operators.
CLAIMED is programming language, scientific library, and execution environment agnostic.
arXiv Detail & Related papers (2023-07-12T11:54:39Z)
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv Detail & Related papers (2022-10-24T07:44:38Z)
- Retrieval of Scientific and Technological Resources for Experts and Scholars [20.89926457148302]
The scientific and technological resources of experts and scholars are mainly composed of basic attributes and scientific research achievements.
Due to information asymmetry and other reasons, the scientific and technological resources of experts and scholars cannot be connected with society in a timely manner.
This paper surveys the related research work in this field from four aspects: text relation extraction, text knowledge representation learning, text vector retrieval, and visualization systems.
arXiv Detail & Related papers (2022-04-13T02:32:09Z)
- DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance [48.55345030503826]
Geoscientists need to read a huge amount of literature to locate, extract, and aggregate relevant results and data.
DeepShovel is a publicly-available AI-assisted data extraction system to support their needs.
A follow-up user evaluation with 14 researchers suggested DeepShovel improved users' efficiency of data extraction for building scientific databases.
arXiv Detail & Related papers (2022-02-21T12:18:08Z)
- Integration of knowledge and data in machine learning [0.456877715768796]
Through knowledge embedding, barriers between knowledge and data can be broken, and machine learning models with physical common sense can be formed.
Knowledge discovery takes advantage of machine learning to extract new knowledge from observations.
This study not only summarizes and analyzes the existing literature, but also proposes research gaps and future opportunities.
arXiv Detail & Related papers (2022-02-15T10:35:53Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- A user-centered approach to designing an experimental laboratory data platform [0.0]
We take a user-centered approach to understand what essential elements of design and functionality researchers want in an experimental data platform.
We find that having the capability to contextualize rich, complex experimental datasets is the primary user requirement.
arXiv Detail & Related papers (2020-07-28T19:26:28Z)