AI-assisted JSON Schema Creation and Mapping
- URL: http://arxiv.org/abs/2508.05192v1
- Date: Thu, 07 Aug 2025 09:27:10 GMT
- Title: AI-assisted JSON Schema Creation and Mapping
- Authors: Felix Neubauer, Jürgen Pleiss, Benjamin Uekermann
- Abstract summary: We present a hybrid approach that combines large language models (LLMs) with deterministic techniques to enable creation, modification, and schema mapping based on natural language inputs by the user. This work significantly lowers the barrier to structured data modeling and data integration for non-experts.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-Driven Engineering (MDE) places models at the core of system and data engineering processes. In the context of research data, these models are typically expressed as schemas that define the structure and semantics of datasets. However, many domains still lack standardized models, and creating them remains a significant barrier, especially for non-experts. We present a hybrid approach that combines large language models (LLMs) with deterministic techniques to enable JSON Schema creation, modification, and schema mapping based on natural language inputs by the user. These capabilities are integrated into the open-source tool MetaConfigurator, which already provides visual model editing, validation, code generation, and form generation from models. For data integration, we generate schema mappings from heterogeneous JSON, CSV, XML, and YAML data using LLMs, while ensuring scalability and reliability through deterministic execution of generated mapping rules. The applicability of our work is demonstrated in an application example in the field of chemistry. By combining natural language interaction with deterministic safeguards, this work significantly lowers the barrier to structured data modeling and data integration for non-experts.
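The idea of pairing LLM-generated mapping rules with deterministic execution can be illustrated with a minimal sketch. It is not MetaConfigurator's actual rule format or API: the rule fields (`source`, `target`, `transform`), the transform names, and the sample chemistry records are all assumptions made for illustration. The point is only that the model would be asked once for a small set of rules, and plain code then applies those rules to every record.

```python
import json

# Hypothetical mapping rules, in the form an LLM might emit once as JSON.
# The rule format ("source", "target", "transform") is an assumption of this sketch.
RULES_JSON = """
[
  {"source": "compound.name", "target": "name", "transform": "strip"},
  {"source": "compound.mw", "target": "molarMass.value", "transform": "to_float"},
  {"source": "compound.mw_unit", "target": "molarMass.unit", "transform": "identity"}
]
"""

# Fixed whitelist of transforms: generated rules can only refer to these names.
TRANSFORMS = {
    "identity": lambda v: v,
    "strip": lambda v: str(v).strip(),
    "to_float": lambda v: float(v),
}


def get_path(doc, path):
    """Read a dotted path from nested dicts; raises KeyError if a key is missing."""
    for key in path.split("."):
        doc = doc[key]
    return doc


def set_path(doc, path, value):
    """Write a value at a dotted path, creating intermediate dicts as needed."""
    keys = path.split(".")
    for key in keys[:-1]:
        doc = doc.setdefault(key, {})
    doc[keys[-1]] = value


def apply_rules(rules, record):
    """Apply every rule to one record; no model call happens at this stage."""
    out = {}
    for rule in rules:
        transform = TRANSFORMS[rule["transform"]]  # unknown names fail loudly
        set_path(out, rule["target"], transform(get_path(record, rule["source"])))
    return out


rules = json.loads(RULES_JSON)
source_records = [
    {"compound": {"name": " ethanol ", "mw": "46.07", "mw_unit": "g/mol"}},
    {"compound": {"name": "water", "mw": "18.015", "mw_unit": "g/mol"}},
]
mapped = [apply_rules(rules, record) for record in source_records]
print(json.dumps(mapped, indent=2))
```

Because the rules are data and the executor is ordinary code, the cost of the mapping does not grow with the number of model calls, and the same input always yields the same output.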
Related papers
- OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models [57.94189874119267]
Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems.
Current graph learning-based design methodologies often adhere to a "one-for-one" paradigm.
We propose OFA-TAD, a one-for-all framework that generates adaptive collaboration graphs for any task described in natural language.
arXiv Detail & Related papers (2026-01-19T12:23:44Z)
- Meta-probabilistic Modeling [36.339664748241944]
We propose meta-probabilistic modeling (MPM), a meta-learning algorithm that learns generative model structure directly from multiple related datasets.
For learning and inference, we propose a tractable VAE-inspired surrogate objective, and optimize it through bi-level optimization.
We evaluate MPM on object-centric image modeling and sequential text modeling, demonstrating that it adapts generative models to data while recovering meaningful latent representations.
arXiv Detail & Related papers (2026-01-08T00:34:06Z)
- Affordance Representation and Recognition for Autonomous Agents [64.39018305018904]
This paper introduces a pattern language for world modeling from structured data.
The DOM Transduction Pattern addresses the challenge of web page complexity.
The Hypermedia Affordances Recognition Pattern enables the agent to dynamically enrich its world model.
arXiv Detail & Related papers (2025-10-28T14:27:28Z)
- PARSE: LLM Driven Schema Optimization for Reliable Entity Extraction [3.314906482758872]
Recent approaches apply large language models directly to extraction tasks using existing schemas, often with constrained decoding or reinforcement learning approaches to ensure syntactic validity, but treat schemas as static contracts designed for human developers, leading to suboptimal extraction performance, frequent hallucinations, and unreliable agent behavior when schemas contain ambiguous or incomplete specifications.
We develop PARSE, a novel system with two synergistic components: ARCHITECT, which autonomously optimizes schemas for consumption while maintaining backward compatibility through RELAY, and SCOPE, which implements static and LLM-based extraction with combined framework improvements reaching 10% across models, while reducing extraction errors by 92% within the first retry.
arXiv Detail & Related papers (2025-10-08T09:40:30Z)
- RouteNator: A Router-Based Multi-Modal Architecture for Generating Synthetic Training Data for Function Calling LLMs [3.41612427812159]
In digital content creation tools, users express their needs through natural language queries that must be mapped to API calls.
Existing approaches to synthetic data generation fail to replicate real-world data distributions.
We present a novel router-based architecture that generates high-quality synthetic training data.
arXiv Detail & Related papers (2025-05-15T16:53:45Z)
- LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models [0.22470290096767]
Traditional schema mining relies on semi-structured data, limiting scalability.
This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction.
arXiv Detail & Related papers (2025-04-01T13:03:33Z)
- SchemaAgent: A Multi-Agents Framework for Generating Relational Database Schema [35.57815867567431]
Existing efforts are mostly based on customized rules or conventional deep learning models, often producing relational schema.
We propose a unified LLM-based multi-agent framework for the automated generation of high-quality database schema.
We incorporate dedicated roles for reflection and inspection, alongside an innovative error detection and correction mechanism to identify and rectify issues across various phases.
arXiv Detail & Related papers (2025-03-31T09:39:19Z)
- Learning to Generate Structured Output with Schema Reinforcement Learning [83.09230124049667]
This study investigates the structured generation capabilities of large language models (LLMs).
We find that the latest LLMs are still struggling to generate a valid string.
Our models demonstrate significant improvement in both generating outputs and downstream tasks.
arXiv Detail & Related papers (2025-02-26T06:45:29Z)
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we introduce how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z)
- Metadata Representations for Queryable ML Model Zoos [73.24799582702326]
Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the models.
The metadata is currently not standardised; its expressivity is limited; and there is no way to store and query it.
In this paper, we advocate for standardized ML model metadata representation and management, proposing a toolkit to help practitioners manage and query that metadata.
arXiv Detail & Related papers (2022-07-19T15:04:14Z)
- Learning to Synthesize Data for Semantic Parsing [57.190817162674875]
We propose a generative model which models the composition of programs and maps a program to an utterance.
Due to the simplicity of PCFG and pre-trained BART, our generative model can be efficiently learned from existing data at hand.
We evaluate our method in both in-domain and out-of-domain settings of text-to-Query parsing on the standard benchmarks of GeoQuery and Spider.
arXiv Detail & Related papers (2021-04-12T21:24:02Z)
- Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding [0.0]
We demonstrate automation of this step with the help of Cognitive Database's Db2Vec embedding approach.
Apart from matching schemas, we demonstrate that it can also infer the correct ontological structure of the target data model.
arXiv Detail & Related papers (2020-10-17T02:14:15Z)
- Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
arXiv Detail & Related papers (2020-04-30T23:58:03Z)