OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System
- URL: http://arxiv.org/abs/2412.20005v2
- Date: Thu, 06 Feb 2025 10:37:17 GMT
- Title: OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System
- Authors: Yujie Luo, Xiangyuan Ru, Kangwei Liu, Lin Yuan, Mengshu Sun, Ningyu Zhang, Lei Liang, Zhiqiang Zhang, Jun Zhou, Lanning Wei, Da Zheng, Haofen Wang, Huajun Chen
- Abstract summary: OneKE is a dockerized schema-guided knowledge extraction system. It can extract knowledge from the Web and raw PDF Books. It supports various domains (science, news, etc.).
- Score: 41.0804067287909
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce OneKE, a dockerized schema-guided knowledge extraction system, which can extract knowledge from the Web and raw PDF Books, and support various domains (science, news, etc.). Specifically, we design OneKE with multiple agents and a configure knowledge base. Different agents perform their respective roles, enabling support for various extraction scenarios. The configure knowledge base facilitates schema configuration, error case debugging and correction, further improving the performance. Empirical evaluations on benchmark datasets demonstrate OneKE's efficacy, while case studies further elucidate its adaptability to diverse tasks across multiple domains, highlighting its potential for broad applications. We have open-sourced the Code at https://github.com/zjunlp/OneKE and released a Video at http://oneke.openkg.cn/demo.mp4.
Related papers
- UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents [65.14244917622881]
Recent Large Multimodal Models have shown promising potential for performing end-to-end KIE directly from document images. We introduce UNIKIE-BENCH, a benchmark designed to rigorously evaluate the KIE capabilities of LMMs. Experiments on 15 state-of-the-art LMMs reveal substantial performance degradation under diverse schema definitions, long-tail key fields, and complex layouts.
arXiv Detail & Related papers (2026-02-03T12:04:56Z)
- KnowledgeSmith: Uncovering Knowledge Updating in LLMs with Model Editing and Unlearning [23.5611669268224]
Knowledge editing and machine unlearning are popular approaches for large language models (LLMs) to stay up-to-date. This paper proposes KnowledgeSmith, a unified framework to systematically understand the updating mechanism of LLMs.
arXiv Detail & Related papers (2025-10-01T00:15:25Z)
- DeepSieve: Information Sieving via LLM-as-a-Knowledge-Router [57.28685457991806]
DeepSieve is an agentic RAG framework that incorporates information sieving via LLM-as-a-knowledge-router. Our design emphasizes modularity, transparency, and adaptability, leveraging recent advances in agentic system design.
arXiv Detail & Related papers (2025-07-29T17:55:23Z)
- Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction [80.88654868264645]
The Arranged and Organized Extraction (AOE) benchmark is designed to evaluate the ability of large language models to comprehend fragmented documents. AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schema tailored to varied input queries. Results show that even the most advanced models struggled significantly.
arXiv Detail & Related papers (2025-07-22T06:37:51Z)
- LiPost: Improved Content Understanding With Effective Use of Multi-task Contrastive Learning [2.611731148829789]
We fine-tune a pre-trained, transformer-based LLM using multi-task contrastive learning with data from a diverse set of semantic labeling tasks.
Our model outperforms the baseline on zero shot learning and offers improved multilingual support.
This work provides a robust foundation for vertical teams across LinkedIn to customize and fine-tune the LLM to their specific applications.
arXiv Detail & Related papers (2024-05-18T17:28:29Z)
- Generic Multi-modal Representation Learning for Network Traffic Analysis [6.372999570085887]
Network traffic analysis is fundamental for network management, troubleshooting, and security.
We propose a flexible Multi-modal Autoencoder (MAE) pipeline that can solve different use cases.
We argue that the MAE architecture is generic and can be used to learn representations useful in multiple scenarios.
arXiv Detail & Related papers (2024-05-04T12:24:29Z)
- KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction [59.039355258637315]
We propose KnowCoder, a Large Language Model (LLM) to conduct Universal Information Extraction (UIE) via code generation.
KnowCoder introduces a code-style schema representation method to uniformly transform different schemas into Python classes.
KnowCoder contains a two-phase learning framework that enhances its schema understanding ability via code pretraining and its schema following ability via instruction tuning.
arXiv Detail & Related papers (2024-03-12T14:56:34Z)
- DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent) [73.10899129264375]
This paper explores DoraemonGPT, a comprehensive and conceptually elegant system driven by LLMs to understand dynamic scenes.
Given a video with a question/task, DoraemonGPT begins by converting the input video into a symbolic memory that stores task-related attributes.
We extensively evaluate DoraemonGPT's effectiveness on three benchmarks and several in-the-wild scenarios.
arXiv Detail & Related papers (2024-01-16T14:33:09Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- Deep learning for table detection and structure recognition: A survey [49.09628624903334]
The goal of this survey is to provide a profound comprehension of the major developments in the field of Table Detection.
We provide an analysis of both classic and new applications in the field.
The datasets and source code of the existing models are organized to provide the reader with a compass on this vast literature.
arXiv Detail & Related papers (2022-11-15T19:42:27Z)
- DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population [95.0099875111663]
DeepKE implements various information extraction tasks, including named entity recognition, relation extraction and attribute extraction.
DeepKE allows developers and researchers to customize datasets and models to extract information from unstructured data according to their requirements.
arXiv Detail & Related papers (2022-01-10T13:29:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.