Schema as Parameterized Tools for Universal Information Extraction
- URL: http://arxiv.org/abs/2506.01276v1
- Date: Mon, 02 Jun 2025 03:12:44 GMT
- Title: Schema as Parameterized Tools for Universal Information Extraction
- Authors: Sheng Liang, Yongyue Zhang, Yaxiong Wu, Ruiming Tang, Yong Liu,
- Abstract summary: Universal information extraction (UIE) primarily employs an extractive generation approach with large language models (LLMs). We propose a unified adaptive text-to-structure generation framework, called Schema as Parameterized Tools (SPT).
- Score: 27.4621163733051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Universal information extraction (UIE) primarily employs an extractive generation approach with large language models (LLMs), typically outputting structured information based on predefined schemas such as JSON or tables. UIE suffers from a lack of adaptability when selecting between predefined schemas and on-the-fly schema generation within the in-context learning paradigm, especially when there are numerous schemas to choose from. In this paper, we propose a unified adaptive text-to-structure generation framework, called Schema as Parameterized Tools (SPT), which reimagines the tool-calling capability of LLMs by treating predefined schemas as parameterized tools for tool selection and parameter filling. Specifically, our SPT method can be applied to unify closed, open, and on-demand IE tasks by adopting Schema Retrieval by fetching the relevant schemas from a predefined pool, Schema Filling by extracting information and filling slots as with tool parameters, or Schema Generation by synthesizing new schemas with uncovered cases. Experiments show that the SPT method can handle four distinct IE tasks adaptively, delivering robust schema retrieval and selection performance. SPT also achieves comparable extraction performance to LoRA baselines and current leading UIE systems with significantly fewer trainable parameters.
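The abstract frames SPT as tool calling over schemas. A matching schema is retrieved from a pool and its slots are filled like tool arguments. When no schema fits, a new schema is generated. The Python sketch below illustrates only that control flow and is not the authors' implementation. Every name in it is a hypothetical assumption for illustration. `SchemaTool` is a made-up container, and the Jaccard-overlap `retrieve` with its `min_score` threshold stands in for SPT's schema retrieval. The regex `toy_extractor` and the fixed `toy_proposer` stand in for the LLM that fills and generates schemas in the paper.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SchemaTool:
    """Hypothetical: an IE schema exposed like a tool (name, description, slot names)."""
    name: str
    description: str
    parameters: tuple[str, ...]  # slots play the role of tool arguments


# Fills the slots of one schema from text (an LLM in SPT; a toy stub here).
Extractor = Callable[[str, SchemaTool], dict[str, Optional[str]]]
# Synthesizes a new schema when nothing in the pool fits (also an LLM in SPT).
Proposer = Callable[[str], SchemaTool]


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(text: str, pool: list[SchemaTool], min_score: float = 0.2) -> Optional[SchemaTool]:
    """Schema Retrieval stand-in: best Jaccard overlap between text and schema words."""
    query = _tokens(text)
    best, best_score = None, 0.0
    for schema in pool:
        keys = _tokens(schema.description) | set(schema.parameters)
        union = query | keys
        score = len(query & keys) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = schema, score
    return best if best_score >= min_score else None


def extract(text: str, pool: list[SchemaTool], extractor: Extractor, proposer: Proposer) -> dict:
    """Retrieve a schema and fill it; fall back to generating a schema if none fits."""
    schema = retrieve(text, pool)
    mode = "retrieved"
    if schema is None:
        schema = proposer(text)  # Schema Generation for an uncovered case
        mode = "generated"
    # Schema Filling: the result mirrors a tool call (name + arguments).
    return {"mode": mode, "name": schema.name, "arguments": extractor(text, schema)}


# --- Toy demo: hypothetical schemas, a regex "extractor", and a fixed "proposer" ---
POOL = [
    SchemaTool("person_birth", "a person was born in a place", ("person", "place")),
    SchemaTool("acquisition", "a company acquired another company", ("buyer", "target")),
]
PATTERNS = {"person_birth": r"(?P<person>.+?) was born in (?P<place>[^.]+)"}


def toy_extractor(text: str, schema: SchemaTool) -> dict[str, Optional[str]]:
    pattern = PATTERNS.get(schema.name)
    match = re.search(pattern, text) if pattern else None
    return {slot: (match.group(slot) if match else None) for slot in schema.parameters}


def toy_proposer(text: str) -> SchemaTool:
    return SchemaTool("color_change", "an object changed color", ("object", "color"))


print(extract("Marie Curie was born in Warsaw.", POOL, toy_extractor, toy_proposer))
print(extract("The sky turned green.", POOL, toy_extractor, toy_proposer))
```

The first sentence is routed to the retrieved `person_birth` schema and filled from the text. The second sentence overlaps with nothing in the pool, so it falls through to the generation path. Keeping retrieval, filling, and generation behind separate callables makes it straightforward to swap the toy stubs for a real retriever or model.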
Related papers
- AI-assisted JSON Schema Creation and Mapping [0.0]
We present a hybrid approach that combines large language models (LLMs) with deterministic techniques to enable creation, modification, and schema mapping based on natural language inputs by the user. This work significantly lowers the barrier to structured data modeling and data integration for non-experts.
arXiv Detail & Related papers (2025-08-07T09:27:10Z)
- SchemaGraphSQL: Efficient Schema Linking with Pathfinding Graph Algorithms for Text-to-SQL on Large-Scale Databases [1.6544167074080365]
We present a zero-shot, training-free schema linking approach that first constructs a schema graph based on foreign key relations. We apply classical path-finding algorithms and post-processing to identify the optimal sequence of tables and columns that should be joined. Our method achieves state-of-the-art results on the BIRD benchmark, outperforming previous specialized, fine-tuned, and complex multi-step LLM-based approaches.
arXiv Detail & Related papers (2025-05-23T20:42:36Z)
- Adaptive Schema-aware Event Extraction with Retrieval-Augmented Generation [16.423791691552665]
Event extraction (EE) is a fundamental task in natural language processing (NLP) that involves identifying and extracting event information from unstructured text. Existing research exhibits two critical gaps: (1) rigid schema fixation in existing pipeline systems, and (2) the absence of benchmarks for evaluating joint schema matching and extraction. We propose Adaptive Schema-aware Event Extraction (ASEE), a novel paradigm combining paraphrasing with schema retrieval-augmented generation.
arXiv Detail & Related papers (2025-05-13T15:47:54Z)
- Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models [63.765846080050906]
This paper proposes a novel parameter-efficient action planner using large language models (PEAP-LLM) to generate a single-step instruction at each location. Experiments show the superiority of our proposed model on REVERIE compared to the previous state-of-the-art.
arXiv Detail & Related papers (2025-05-12T12:38:20Z)
- Advancing and Benchmarking Personalized Tool Invocation for LLMs [66.39214525683425]
We introduce the concept of Personalized Tool Invocation and define two key tasks: Tool Preference and Profile-dependent Query. To tackle these challenges, we propose PTool, a data synthesis framework designed for personalized tool invocation. We construct PTBench, the first benchmark for evaluating personalized tool invocation.
arXiv Detail & Related papers (2025-05-07T02:25:20Z)
- SchemaAgent: A Multi-Agents Framework for Generating Relational Database Schema [35.57815867567431]
Existing efforts are mostly based on customized rules or conventional deep learning models, often producing relational schema. We propose a unified LLM-based multi-agent framework for the automated generation of high-quality database schemas, called SchemaAgent. We incorporate dedicated roles for reflection and inspection, alongside an innovative error detection and correction mechanism to identify and rectify issues across various phases.
arXiv Detail & Related papers (2025-03-31T09:39:19Z)
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, composed of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- ReMatch: Retrieval Enhanced Schema Matching with LLMs [0.874967598360817]
We present a novel method, named ReMatch, for matching schemas using retrieval-enhanced Large Language Models (LLMs).
Our experimental results on large real-world schemas demonstrate that ReMatch is an effective matcher.
arXiv Detail & Related papers (2024-03-03T17:14:40Z)
- FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema [36.65009632307124]
We propose Free-form Instruction-oriented Prompt Optimization (FIPO) to improve the task performance of large language models (LLMs). FIPO uses a modular APO template that dynamically integrates the naive task instruction, optional instruction responses, and optional ground truth to produce finely optimized prompts. We validate the FIPO framework across five public benchmarks and six testing models.
arXiv Detail & Related papers (2024-02-19T03:56:44Z)
- Schema-adaptable Knowledge Graph Construction [47.772335354080795]
Conventional Knowledge Graph Construction (KGC) approaches typically follow the static information extraction paradigm with a closed set of pre-defined schema.
We propose a new task called schema-adaptable KGC, which aims to continually extract entity, relation, and event based on a dynamically changing schema graph without re-training.
arXiv Detail & Related papers (2023-05-15T15:06:20Z)
- RexUIE: A Recursive Method with Explicit Schema Instructor for Universal Information Extraction [47.89362854989252]
Universal Information Extraction is an area of interest due to the challenges posed by varying targets, heterogeneous structures, and demand-specific schemas.
Previous works have achieved only limited success by unifying a few tasks, such as Named Entity Recognition (NER) and Relation Extraction (RE).
In this paper, we redefine the authentic UIE with a formal formulation that encompasses almost all extraction schemas.
arXiv Detail & Related papers (2023-04-28T11:28:56Z)
- Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on the Poincaré distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
- Schema Extraction on Semi-structured Data [3.09315460664784]
Existing approaches extract schemas with tree- and graph-based methods and with statistical methods built on distributed architectures and machine learning.
Existing extraction tools are mainly used for Spark or datasets and are suitable for small or simple application environments.
The system focuses on the extraction and management of schemas in large datasets and complex application scenarios.
arXiv Detail & Related papers (2020-12-15T05:57:41Z)
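The SchemaGraphSQL entry in this list summarizes schema linking as path-finding over a graph built from foreign-key relations. A minimal Python sketch of that idea follows. It assumes breadth-first search (fewest joins) as the "classical path-finding algorithm" and uses a hypothetical three-table retail schema. The names `build_schema_graph` and `join_path` are illustrative. The paper's post-processing and its choice of which tables to connect are not reproduced here.

```python
from collections import deque
from typing import Optional

# A foreign key: (table, column, referenced_table, referenced_column).
ForeignKey = tuple[str, str, str, str]
# table -> {neighbor_table: (this_side_column, other_side_column)}
Graph = dict[str, dict[str, tuple[str, str]]]


def build_schema_graph(foreign_keys: list[ForeignKey]) -> Graph:
    """Undirected schema graph: tables are nodes, each FK is an edge labeled with join columns.

    Simplification: only one FK is kept per table pair.
    """
    graph: Graph = {}
    for table, column, ref_table, ref_column in foreign_keys:
        left, right = f"{table}.{column}", f"{ref_table}.{ref_column}"
        graph.setdefault(table, {})[ref_table] = (left, right)
        graph.setdefault(ref_table, {})[table] = (right, left)
    return graph


def join_path(graph: Graph, start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search for the fewest-join path; returns the join conditions in order."""
    parents: dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for neighbor in graph.get(table, {}):
            if neighbor not in parents:
                parents[neighbor] = table
                queue.append(neighbor)
    if goal not in parents:
        return None  # the two tables are not connected by foreign keys
    # Walk back from goal to start, then emit one join condition per edge.
    path = [goal]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    path.reverse()
    return [" = ".join(graph[a][b]) for a, b in zip(path, path[1:])]


fks = [
    ("orders", "customer_id", "customers", "id"),
    ("order_items", "order_id", "orders", "id"),
    ("order_items", "product_id", "products", "id"),
]
print(join_path(build_schema_graph(fks), "customers", "products"))
```

For the toy schema, connecting `customers` to `products` yields three join conditions through `orders` and `order_items`. Linking more than two tables could repeat the search pairwise. That is one simple extension, and the paper's own procedure may differ.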