LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models
- URL: http://arxiv.org/abs/2504.00752v1
- Date: Tue, 01 Apr 2025 13:03:33 GMT
- Title: LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models
- Authors: Sameer Sadruddin, Jennifer D'Souza, Eleni Poupaki, Alex Watkins, Hamed Babaei Giglou, Anisa Rula, Bora Karasulu, Sören Auer, Adrie Mackus, Erwin Kessels
- Abstract summary: Traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction.
- Score: 0.22470290096767
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction. Through an iterative workflow, it organizes properties from text, incorporates expert input, and integrates domain-specific ontologies for semantic depth. Applied to materials science--specifically atomic layer deposition--schema-miner demonstrates that expert-guided LLMs generate semantically rich schemas suitable for diverse real-world applications.
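The abstract describes a three-stage loop: an LLM drafts a schema from unstructured text, domain experts critique it, and the schema is refined until it stabilizes. Below is a minimal sketch of such a human-in-the-loop loop, assuming an OpenAI-style chat client; the prompts and function names are illustrative, not the schema-miner API.

```python
# A minimal sketch of the iterative workflow described in the abstract.
# Prompts, function names, and the OpenAI client usage are illustrative
# assumptions; this is not the schema-miner codebase.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask_llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def propose_schema(texts: list[str]) -> dict:
    """Stage 1: organize candidate properties from unstructured text."""
    prompt = (
        "Propose a JSON schema of properties (name, type, unit, description) "
        "describing the process in these texts:\n\n" + "\n---\n".join(texts)
        + "\n\nReturn only JSON."
    )
    return json.loads(ask_llm(prompt))

def refine_schema(schema: dict, feedback: str) -> dict:
    """Stage 2: fold domain-expert feedback back into the schema."""
    prompt = (
        f"Revise this JSON schema: {json.dumps(schema)}\n"
        f"according to this expert feedback: {feedback}\n"
        "Return only JSON."
    )
    return json.loads(ask_llm(prompt))

corpus = ["TiO2 was deposited by ALD at 150 C using TDMAT and H2O."]
schema = propose_schema(corpus)
while (feedback := input("Expert feedback (empty to accept): ")):
    schema = refine_schema(schema, feedback)
print(json.dumps(schema, indent=2))
```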
Related papers
- OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models [58.45517851437422]
Visually-situated text parsing (VsTP) has recently seen notable advancements, driven by the growing demand for automated document understanding.
Existing solutions often rely on task-specific architectures and objectives for individual tasks.
In this paper, we introduce OmniParser V2, a universal model that unifies VsTP typical tasks, including text spotting, key information extraction, table recognition, and layout analysis.
arXiv Detail & Related papers (2025-02-22T09:32:01Z)
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprising candidate generation, refinement, and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
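The three stages named in the summary compose naturally into a small program. The sketch below illustrates that composition under assumed prompts and an abstract `llm` callable; it is not Matchmaker's implementation.

```python
# A sketch of a compositional schema-matching program in the spirit of
# Matchmaker: candidate generation -> refinement -> confidence scoring.
# Prompts are assumptions; `llm` is any text-in/text-out callable.

def generate_candidates(source_attr: str, targets: list[str], llm) -> list[str]:
    prompt = (f"Which of these target attributes could '{source_attr}' map to? "
              f"Targets: {targets}. Answer with up to 5, comma-separated.")
    return [c.strip() for c in llm(prompt).split(",")]

def refine(source_attr: str, candidates: list[str], llm) -> list[str]:
    prompt = (f"Keep only the candidates semantically compatible with "
              f"'{source_attr}': {candidates}. Answer comma-separated.")
    return [c.strip() for c in llm(prompt).split(",")]

def score(source_attr: str, candidate: str, llm) -> float:
    prompt = (f"How confident are you (0 to 1) that '{source_attr}' maps to "
              f"'{candidate}'? Answer with a single number.")
    return float(llm(prompt))

def match(source_attr: str, targets: list[str], llm) -> str | None:
    candidates = refine(source_attr,
                        generate_candidates(source_attr, targets, llm), llm)
    # Keep the highest-confidence mapping; no labeled demonstrations needed.
    return max(candidates, key=lambda c: score(source_attr, c, llm), default=None)
```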
- End-to-End Ontology Learning with Large Language Models [11.755755139228219]
Large language models (LLMs) have been applied to solve various subtasks of ontology learning.
We address this gap with OLLM, a general and scalable method for building the taxonomic backbone of an ontology from scratch.
In contrast to standard metrics, our metrics use deep learning techniques to define more robust structural distance measures between graphs.
Our model can be effectively adapted to new domains, like arXiv, needing only a small number of training examples.
arXiv Detail & Related papers (2024-10-31T02:52:39Z)
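To make the "structural distance between graphs" idea concrete, here is a sketch that scores taxonomy overlap with sentence embeddings. The paper's actual metrics differ; the encoder choice and the edge-to-sentence encoding are assumptions for illustration.

```python
# A sketch of an embedding-based structural similarity between two
# taxonomies, in the spirit of OLLM's learned graph metrics.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

def edge_texts(taxonomy: list[tuple[str, str]]) -> list[str]:
    # Represent each parent->child edge as a short sentence.
    return [f"{child} is a subtopic of {parent}" for parent, child in taxonomy]

def soft_f1(pred: list[tuple[str, str]], gold: list[tuple[str, str]]) -> float:
    """Continuous F1: each edge is credited by its best cosine match."""
    e_pred = encoder.encode(edge_texts(pred), convert_to_tensor=True)
    e_gold = encoder.encode(edge_texts(gold), convert_to_tensor=True)
    sim = util.cos_sim(e_pred, e_gold)           # |pred| x |gold| similarities
    precision = sim.max(dim=1).values.mean().item()
    recall = sim.max(dim=0).values.mean().item()
    return 2 * precision * recall / (precision + recall)

pred = [("machine learning", "ontology learning")]
gold = [("artificial intelligence", "ontology learning")]
print(soft_f1(pred, gold))
```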
- LangSuitE: Planning, Controlling and Interacting with Large Language Models in Embodied Text Environments [70.91258869156353]
We introduce LangSuitE, a versatile and simulation-free testbed featuring 6 representative embodied tasks in textual embodied worlds.
Compared with previous LLM-based testbeds, LangSuitE offers adaptability to diverse environments without multiple simulation engines.
We devise a novel chain-of-thought (CoT) schema, EmMem, which summarizes embodied states with respect to history information.
arXiv Detail & Related papers (2024-06-24T03:36:29Z)
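The idea of summarizing embodied states from history can be sketched as a two-step prompt builder: compress the trajectory, then condition the next-action reasoning on the compressed state. Prompt wording and function names below are illustrative assumptions, not the LangSuitE implementation.

```python
# A sketch of a history-summarizing CoT prompt in the spirit of EmMem.
# `llm` is any text-in/text-out callable.

def build_emmem_prompt(task: str, history: list[str], observation: str, llm) -> str:
    # Step 1: summarize the trajectory so far into a compact embodied state.
    state_summary = llm(
        "Summarize the agent's situation (location, held objects, completed "
        "subgoals) from this history:\n" + "\n".join(history)
    )
    # Step 2: condition the next-action CoT on the summary, not raw history.
    return (
        f"Task: {task}\n"
        f"State summary: {state_summary}\n"
        f"Current observation: {observation}\n"
        "Think step by step, then output the next action."
    )
```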
- Meta-Task Prompting Elicits Embeddings from Large Language Models [54.757445048329735]
We introduce a new unsupervised text embedding method, Meta-Task Prompting with Explicit One-Word Limitation.
We generate high-quality sentence embeddings from Large Language Models without the need for model fine-tuning.
Our findings suggest a new scaling law, offering a versatile and resource-efficient approach for embedding generation across diverse scenarios.
arXiv Detail & Related papers (2024-02-28T16:35:52Z)
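The one-word-limitation trick can be approximated with any causal LM by reading the hidden state at the position where the single-word answer would begin. A sketch follows, using a small stand-in model; the prompt wording and model choice are assumptions.

```python
# A sketch of meta-task prompting for embeddings: ask the model to
# compress a sentence into one word and take the final hidden state as
# the sentence embedding. No fine-tuning is involved.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # small stand-in; the paper uses larger LLMs
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True)

def embed(sentence: str) -> torch.Tensor:
    prompt = f'This sentence: "{sentence}" means in one word: "'
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # Last layer, last token: where the one-word answer would start.
    return out.hidden_states[-1][0, -1]

a = embed("The cat sat on the mat.")
b = embed("A cat is sitting on a mat.")
print(torch.cosine_similarity(a, b, dim=0).item())
```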
- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models [153.14575887549088]
We introduce Generalized Instruction Tuning (called GLAN), a general and scalable method for instruction tuning of Large Language Models (LLMs).
GLAN exclusively utilizes a pre-curated taxonomy of human knowledge and capabilities as input and generates large-scale synthetic instruction data across all disciplines.
With the fine-grained key concepts detailed in every class session of the syllabus, we are able to generate diverse instructions with a broad coverage across the entire spectrum of human knowledge and skills.
arXiv Detail & Related papers (2024-02-20T15:00:35Z)
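Taxonomy-driven generation reduces to walking the hierarchy and prompting at each leaf concept. A sketch under an assumed toy taxonomy and an abstract `llm` callable; the hierarchy levels and prompt are illustrative, not GLAN's actual pipeline.

```python
# A sketch of taxonomy-driven synthetic instruction generation in the
# spirit of GLAN: iterate discipline -> subject -> concept and prompt an
# LLM at each leaf ("class session") for diverse instructions.

taxonomy = {
    "Natural Sciences": {
        "Chemistry": ["Atomic structure", "Thin-film deposition"],
    },
    "Engineering": {
        "Computer Science": ["Sorting algorithms", "Schema design"],
    },
}

def generate_instructions(llm, n_per_concept: int = 3) -> list[str]:
    instructions = []
    for discipline, subjects in taxonomy.items():
        for subject, concepts in subjects.items():
            for concept in concepts:
                prompt = (
                    f"Write {n_per_concept} diverse homework problems about "
                    f"'{concept}' ({subject}, {discipline}), one per line."
                )
                instructions += llm(prompt).splitlines()
    return instructions
```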
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
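An entity-set overlap metric of this kind can be sketched as an optimal assignment over per-entity similarities. The exact AESOP definition differs; the Jaccard similarity over property-value pairs used below is an assumption chosen to show the structure of the idea.

```python
# A sketch of an entity-set overlap score in the spirit of the paper's
# Approximate Entity Set OverlaP (AESOP) metric: optimally pair predicted
# and gold entities, then average per-pair property overlap.
import numpy as np
from scipy.optimize import linear_sum_assignment

def pair_similarity(pred: dict, gold: dict) -> float:
    """Jaccard overlap of (property, value) pairs for one entity pair."""
    p, g = set(pred.items()), set(gold.items())
    return len(p & g) / len(p | g) if p | g else 1.0

def entity_set_overlap(preds: list[dict], golds: list[dict]) -> float:
    sim = np.array([[pair_similarity(p, g) for g in golds] for p in preds])
    rows, cols = linear_sum_assignment(-sim)   # maximize total similarity
    # Unmatched entities (from a size mismatch) contribute zero.
    return sim[rows, cols].sum() / max(len(preds), len(golds))

preds = [{"name": "GPT-4", "type": "model"}]
golds = [{"name": "GPT-4", "type": "language model"}]
print(entity_set_overlap(preds, golds))
```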
- Benchmarking Large Language Models with Augmented Instructions for Fine-grained Information Extraction [46.09887436555637]
This paper introduces a fine-grained IE benchmark dataset tailored for Large Language Models (LLMs).
Through extensive evaluations, we observe that encoder-decoder models, particularly T5 and FLAN-T5, perform well in generalizing to unseen information types.
arXiv Detail & Related papers (2023-10-08T09:41:18Z)
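Prompting an encoder-decoder model such as FLAN-T5 for fine-grained extraction, the setting in which the benchmark found such models to generalize well, can be sketched as follows. The instruction format is an assumption, not the benchmark's exact schema.

```python
# A sketch of instruction-style fine-grained IE with an encoder-decoder
# model. FLAN-T5-base is used here only as a small, public example.
from transformers import pipeline

extractor = pipeline("text2text-generation", model="google/flan-t5-base")

text = ("TiO2 films were grown by atomic layer deposition at 150 C "
        "using TDMAT and water.")
prompt = (
    "Extract (entity, type) pairs from the text. "
    "Types: material, process, temperature, precursor.\n"
    f"Text: {text}"
)
print(extractor(prompt, max_new_tokens=64)[0]["generated_text"])
```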
- Local Large Language Models for Complex Structured Medical Tasks [0.0]
This paper introduces an approach that combines the language reasoning capabilities of large language models with the benefits of local training to tackle complex, domain-specific tasks.
Specifically, the authors demonstrate their approach by extracting structured condition codes from pathology reports.
arXiv Detail & Related papers (2023-08-03T12:36:13Z)
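Condition-code extraction with a locally hosted model can be sketched as a single prompted generation. The model name, prompt, and output format below are illustrative assumptions; the paper trains and serves its own local models.

```python
# A sketch of local structured extraction in the spirit of this paper:
# prompt a locally hosted model to turn a pathology report into
# condition codes. A production pipeline would need robust parsing.
from transformers import pipeline

generator = pipeline("text-generation",
                     model="meta-llama/Llama-3.2-1B-Instruct")  # assumed choice

report = "Invasive ductal carcinoma of the left breast, grade 2."
prompt = (
    "List the ICD-10 condition codes, one per line as CODE: description, "
    f"for this pathology report:\n{report}\nCodes:\n"
)
out = generator(prompt, max_new_tokens=128, return_full_text=False)
print(out[0]["generated_text"])  # e.g. lines like "C50.912: ..."
```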