Related papers: LLM-IE: A Python Package for Generative Information Extraction with Large Language Models

URL: http://arxiv.org/abs/2411.11779v1
Date: Mon, 18 Nov 2024 17:56:13 GMT
Title: LLM-IE: A Python Package for Generative Information Extraction with Large Language Models
Authors: Enshuo Hsu, Kirk Roberts
Abstract summary: LLM-IE is a Python package for building complete information extraction pipelines. Its key innovation is an interactive LLM agent that supports schema definition and prompt design. The system evaluation provided intuitive visualization.
Score: 3.7906296809297406
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Objectives: Despite the recent adoption of large language models (LLMs) for biomedical information extraction, challenges in prompt engineering and algorithms persist, with no dedicated software available. To address this, we developed LLM-IE: a Python package for building complete information extraction pipelines. Our key innovation is an interactive LLM agent to support schema definition and prompt design. Materials and Methods: The LLM-IE supports named entity recognition, entity attribute extraction, and relation extraction tasks. We benchmarked on the i2b2 datasets and conducted a system evaluation. Results: The sentence-based prompting algorithm resulted in the best performance while requiring a longer inference time. System evaluation provided intuitive visualization. Discussion: LLM-IE was designed from practical NLP experience in healthcare and has been adopted in internal projects. It should hold great value to the biomedical NLP community. Conclusion: We developed a Python package, LLM-IE, that provides building blocks for robust information extraction pipeline construction.
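As a rough illustration of the sentence-based prompting idea mentioned in the abstract, the sketch below splits a note into sentences, passes each sentence to an extractor, and maps the returned mentions back to character offsets in the full text. This is not the LLM-IE API: `split_sentences`, `extract_by_sentence`, and `toy_extractor` (a keyword lookup standing in for an LLM call) are hypothetical names, and the regex sentence splitter is a deliberate simplification.

```python
import re
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[Dict]:
    """Split text into sentences, keeping each sentence's start offset."""
    sentences = []
    # Naive splitter (assumption): a sentence is a run of characters that
    # optionally ends in '.', '!' or '?'.
    for match in re.finditer(r"[^.!?]+[.!?]?", text):
        raw = match.group()
        stripped = raw.strip()
        if stripped:
            leading_ws = len(raw) - len(raw.lstrip())
            sentences.append({"text": stripped, "start": match.start() + leading_ws})
    return sentences


def extract_by_sentence(
    text: str, extractor: Callable[[str], List[str]]
) -> List[Dict]:
    """Run the extractor once per sentence and collect document-level spans."""
    entities = []
    for sent in split_sentences(text):
        for mention in extractor(sent["text"]):
            pos = sent["text"].find(mention)
            if pos == -1:
                continue  # skip mentions that do not appear verbatim
            start = sent["start"] + pos
            entities.append(
                {"text": mention, "start": start, "end": start + len(mention)}
            )
    return entities


def toy_extractor(sentence: str) -> List[str]:
    """Hypothetical stand-in for an LLM call: simple keyword lookup."""
    vocabulary = ["chest pain", "metformin"]
    return [term for term in vocabulary if term in sentence]


note = "Patient denies chest pain. Started metformin 500 mg daily."
entities = extract_by_sentence(note, toy_extractor)
for entity in entities:
    print(entity)
```

One extractor call per sentence keeps each prompt short, which is consistent with the abstract's remark that this style of prompting takes longer at inference time: the number of calls grows with the number of sentences.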

Related papers

DELM: a Python toolkit for Data Extraction with Language Models [0.0]
DELM (Data Extraction with Language Models) is an open-source Python toolkit designed for rapid experimental iteration of data extraction pipelines. It minimizes boilerplate code and offers a modular framework with structured outputs, built-in validation, flexible data-loading and scoring strategies, and efficient batch processing. It also includes robust support for working with LLM APIs, featuring retry logic, result caching, detailed cost tracking, and comprehensive configuration management.
arXiv Detail & Related papers (2025-09-24T23:47:55Z)
A Study of Large Language Models for Patient Information Extraction: Model Architecture, Fine-Tuning Strategy, and Multi-task Instruction Tuning [10.007149687726773]
The rapid development of large language models (LLMs) has revolutionized many NLP tasks in the clinical domain. This study focuses on LLM architectures, fine-tuning strategies, and multi-task instruction tuning techniques for developing robust and generalizable patient information extraction systems.
arXiv Detail & Related papers (2025-09-05T02:07:40Z)
An Evaluation of Large Language Models on Text Summarization Tasks Using Prompt Engineering Techniques [0.0]
Large Language Models (LLMs) continue to advance natural language processing with their ability to generate human-like text. We present a systematic evaluation of six LLMs across four datasets: CNN/Daily Mail and NewsRoom (news), SAMSum (dialog), and ArXiv (scientific). Our study evaluates the performance using the ROUGE and BERTScore metrics. For long documents, we introduce a sentence-based chunking strategy that enables LLMs with shorter context windows to summarize extended inputs in multiple stages.
arXiv Detail & Related papers (2025-07-07T15:34:05Z)
LLM Enhancer: Merged Approach using Vector Embedding for Reducing Large Language Model Hallucinations with External Knowledge [0.0]
Large Language Models (LLMs) have demonstrated the capability to generate human-like, natural responses across a range of tasks. This paper introduces the LLM ENHANCER system, designed to integrate multiple online sources such as Google, Wikipedia, and DuckDuckGo to enhance data accuracy.
arXiv Detail & Related papers (2025-04-29T19:27:04Z)
LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation [50.375567142250446]
Large language models (LLMs), trained on vast corpora, have shown strong potential in data generation. We propose a novel framework, LLM-Forest, which introduces a "forest" of few-shot prompt learning LLM "trees" with their outputs aggregated via confidence-based weighted voting. This framework is established on a new concept of bipartite information graphs to identify high-quality relevant neighboring entries with both feature and value granularity.
arXiv Detail & Related papers (2024-10-28T20:42:46Z)
SwiftDossier: Tailored Automatic Dossier for Drug Discovery with LLMs and Agents [0.0]
We show how an advanced RAG system can help the Large Language Models (LLMs) to generate more accurate answers to drug-discovery-related questions. Secondly, we show how to create an automatic target dossier using LLMs and incorporating them with external tools that they can use to gather data.
arXiv Detail & Related papers (2024-09-24T07:29:05Z)
A Universal Prompting Strategy for Extracting Process Model Information from Natural Language Text using Large Language Models [0.8899670429041453]
We show that generative large language models (LLMs) can solve NLP tasks with very high quality without the need for extensive data. Based on a novel prompting strategy, we show that LLMs are able to outperform state-of-the-art machine learning approaches.
arXiv Detail & Related papers (2024-07-26T06:39:35Z)
CIBench: Evaluating Your LLMs with a Code Interpreter Plugin [68.95137938214862]
We propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks. The evaluation dataset is constructed using an LLM-human cooperative approach and simulates an authentic workflow by leveraging consecutive and interactive IPython sessions. We conduct extensive experiments to analyze the ability of 24 LLMs on CIBench and provide valuable insights for future LLMs in code interpreter utilization.
arXiv Detail & Related papers (2024-07-15T07:43:55Z)
Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction [14.982446379660633]
Existing research on large language models (LLMs) shows that they can solve information extraction tasks through multi-step planning. We observe that decomposing complex extraction tasks and extracting them step by step can effectively improve LLMs' performance. This paper proposes a two-stage multi-step method for LLM-based information extraction and adopts the RL framework to execute the multi-step planning.
arXiv Detail & Related papers (2024-06-17T12:11:01Z)
ST-LLM: Large Language Models Are Effective Temporal Learners [58.79456373423189]
Large Language Models (LLMs) have showcased impressive capabilities in text comprehension and generation. How to effectively encode and understand videos in video-based dialogue systems remains to be solved. We propose ST-LLM, an effective video-LLM baseline with spatial-temporal sequence modeling inside LLM.
arXiv Detail & Related papers (2024-03-30T10:11:26Z)
From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs [62.496139001509114]
We introduce a novel tool invocation pipeline designed to control massive real-world APIs. This pipeline mirrors the human task-solving process, addressing complicated real-life user queries. Empirical evaluations of our Sum2Act pipeline on the ToolBench benchmark show significant performance improvements.
arXiv Detail & Related papers (2024-02-28T08:42:23Z)
INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [59.07490387145391]
Large language models (LLMs) have demonstrated impressive capabilities in various natural language processing tasks. Their application to information retrieval (IR) tasks is still challenging due to the infrequent occurrence of many IR-specific concepts in natural language. We introduce a novel instruction tuning dataset, INTERS, encompassing 20 tasks across three fundamental IR categories.
arXiv Detail & Related papers (2024-01-12T12:10:28Z)
Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation. We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques. We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
arXiv Detail & Related papers (2023-12-29T14:25:22Z)
Tree-GPT: Modular Large Language Model Expert System for Forest Remote Sensing Image Understanding and Interactive Analysis [4.993840366641032]
This paper introduces a novel framework, Tree-GPT, which incorporates Large Language Models (LLMs) into the forestry remote sensing data workflow. The prototype system performed well, demonstrating the potential for dynamic usage of LLMs in forestry research and environmental sciences.
arXiv Detail & Related papers (2023-10-07T06:12:39Z)
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks. This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.