DeepJSONEval: Benchmarking Complex Nested JSON Data Mining for Large Language Models
- URL: http://arxiv.org/abs/2509.25922v1
- Date: Tue, 30 Sep 2025 08:18:20 GMT
- Title: DeepJSONEval: Benchmarking Complex Nested JSON Data Mining for Large Language Models
- Authors: Zhicheng Zhou, Jing Li, Suming Qiu, Junjie Huang, Linyuan Qiu, Zhijie Sun,
- Abstract summary: Multi-layer nested JSON structures organize data into key-value pairs, arrays, and nested objects. For instance, in news aggregation, a JSON object can nest an article's metadata (title, author, date), content (text, multimedia), and multimedia information (multimedia type, caption) hierarchically. We introduce DeepJSONEval, a novel benchmark featuring 2100 multi-domain instances with deep nested structures, categorized by difficulty.
- Score: 6.653834890554154
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The internet is saturated with low-density, high-redundancy information, such as social media comments, repetitive news, and lengthy discussions, making it difficult to extract valuable insights efficiently. Multi-layer nested JSON structures provide an effective solution by compressing such information into semantically rich, hierarchical representations, which organize data into key-value pairs, arrays, and nested objects, preserving contextual relationships and enabling efficient storage, retrieval, and semantic querying. For instance, in news aggregation, a JSON object can nest an article's metadata (title, author, date), content (text, multimedia), and multimedia information (multimedia type, caption) hierarchically. Large Language Models (LLMs) play a transformative role in web data mining by parsing unstructured text and outputting structured results directly into complex JSON schemas. However, current benchmarks for evaluating LLMs' JSON output capabilities overemphasize pure JSON generation rather than assessing data comprehension and extraction abilities, a limitation that makes them less relevant to practical web data mining tasks. To address this, we introduce DeepJSONEval, a novel benchmark featuring 2100 multi-domain instances with deep nested structures, categorized by difficulty. Experiments show significant performance gaps among LLMs in handling such complexity. Our benchmark and datasets are open-sourced to advance research in structured JSON generation. (https://github.com/GTS-AI-Infra-Lab-SotaS/DeepJSONEval)
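The abstract's news-aggregation example can be sketched as a concrete nested JSON object. The field names below are illustrative assumptions, not taken from the benchmark itself:

```python
import json

# Hypothetical nested news-article record in the spirit of the abstract's
# example: metadata, content, and per-item multimedia information.
article = {
    "metadata": {
        "title": "Sample Headline",
        "author": "A. Writer",
        "date": "2025-09-30",
    },
    "content": {
        "text": "Body of the article...",
        "multimedia": [
            {"type": "image", "caption": "Photo accompanying the story"},
            {"type": "video", "caption": "Interview clip"},
        ],
    },
}

# Round-trip through the json module to confirm the structure is valid JSON.
encoded = json.dumps(article)
decoded = json.loads(encoded)
print(decoded["content"]["multimedia"][0]["caption"])
```

Benchmarks like DeepJSONEval ask a model to populate schemas of this shape from raw text, so both the nesting depth and the contextual links between sibling fields matter.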
Related papers
- ScrapeGraphAI-100k: A Large-Scale Dataset for LLM-Based Web Information Extraction [0.0]
We introduce ScrapeGraphAI-100k, a large-scale dataset of real-world LLM extraction events. Starting from 9M events, we deduplicate and balance by schema to produce 93,695 examples spanning diverse domains. We characterize the dataset's structural diversity and its failure modes as schema complexity grows.
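The deduplicate-and-balance-by-schema step described above might be sketched as follows. The function, field names, and cap value are assumptions for illustration, not the paper's actual pipeline:

```python
import json
from collections import defaultdict

def balance_by_schema(events, per_schema_cap=5):
    """Drop exact duplicates, then cap the number of examples per schema.

    `events` is assumed to be a list of dicts with 'schema' and 'output'
    keys; the real pipeline's record format may differ.
    """
    seen = set()
    buckets = defaultdict(list)
    for ev in events:
        key = json.dumps(ev, sort_keys=True)  # canonical key for exact dedup
        if key in seen:
            continue
        seen.add(key)
        schema_key = json.dumps(ev["schema"], sort_keys=True)
        if len(buckets[schema_key]) < per_schema_cap:
            buckets[schema_key].append(ev)
    return [ev for bucket in buckets.values() for ev in bucket]

events = [
    {"schema": {"title": "str"}, "output": {"title": "A"}},
    {"schema": {"title": "str"}, "output": {"title": "A"}},  # exact duplicate
    {"schema": {"price": "float"}, "output": {"price": 9.99}},
]
print(len(balance_by_schema(events)))  # duplicate removed, two survive
```

Serializing with `sort_keys=True` makes structurally identical schemas compare equal regardless of key order, which is why it serves as the bucket key here.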
arXiv Detail & Related papers (2026-02-16T20:56:59Z) - Skeletons Matter: Dynamic Data Augmentation for Text-to-Query [66.52311036179294]
We formally define the Text-to-Query task paradigm, unifying semantic parsing tasks across various query languages. We identify query skeletons as a shared optimization target of Text-to-Query tasks, and propose a general dynamic data augmentation framework. Experiments on four Text-to-Query benchmarks demonstrate that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-11-24T09:39:03Z) - JSON Whisperer: Efficient JSON Editing with LLMs [1.0535472555708638]
Large language models (LLMs) can modify documents through natural language commands, but current approaches regenerate entire structures for each edit, resulting in computational inefficiency. We present JSON Whisperer, a framework that enables LLMs to generate RFC 6902 diff patches, expressing only the necessary modifications, rather than complete documents.
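An RFC 6902 patch touches only the changed paths instead of regenerating the whole document. A minimal, stdlib-only sketch of applying `replace` and `add` operations on objects (a real application would use a full JSON Patch library; this is not the paper's implementation):

```python
import copy

def apply_patch(doc, patch):
    """Apply a small subset of RFC 6902: 'replace' and 'add' on objects.

    A complete implementation also handles 'remove', 'move', 'copy',
    'test', array indices, and escaped pointer tokens (~0, ~1).
    """
    doc = copy.deepcopy(doc)  # leave the caller's document untouched
    for op in patch:
        parts = op["path"].lstrip("/").split("/")
        target = doc
        for key in parts[:-1]:  # walk down to the parent container
            target = target[key]
        if op["op"] in ("replace", "add"):
            target[parts[-1]] = op["value"]
        else:
            raise NotImplementedError(op["op"])
    return doc

doc = {"title": "Old title", "meta": {"views": 10}}
patch = [
    {"op": "replace", "path": "/title", "value": "New title"},
    {"op": "add", "path": "/meta/tags", "value": ["news"]},
]
updated = apply_patch(doc, patch)
print(updated["title"], updated["meta"]["tags"])
```

Because a patch lists only the changed paths, its token count scales with the size of the edit rather than the size of the document, which is the efficiency argument behind patch-based editing.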
arXiv Detail & Related papers (2025-10-06T11:36:46Z) - Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction [80.88654868264645]
AOE (Arranged and Organized Extraction) is a benchmark designed to evaluate the ability of large language models to comprehend fragmented documents. AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schemas tailored to varied input queries. Results show that even the most advanced models struggled significantly.
arXiv Detail & Related papers (2025-07-22T06:37:51Z) - MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs [48.73595915402094]
MOLE is a framework that automatically extracts metadata attributes from scientific papers covering datasets of languages other than Arabic. Our methodology processes entire documents across multiple input formats and incorporates robust validation mechanisms for consistent output.
arXiv Detail & Related papers (2025-05-26T10:31:26Z) - NEXT-EVAL: Next Evaluation of Traditional and LLM Web Data Record Extraction [6.09502686736443]
We introduce a concrete evaluation framework for web data record extraction. Our framework generates evaluation snapshots, annotates supervision labels, and employs structure-aware metrics for consistent scoring. It also incorporates preprocessing to optimize input for Large Language Model (LLM)-based approaches.
arXiv Detail & Related papers (2025-05-21T21:03:37Z) - New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration [49.180693704510006]
Referring Expression Comprehension (REC) is a cross-modal task that evaluates the interplay of language understanding, image comprehension, and language-to-image grounding. It serves as an essential testing ground for Multimodal Large Language Models (MLLMs).
arXiv Detail & Related papers (2025-02-27T13:58:44Z) - Learning to Generate Structured Output with Schema Reinforcement Learning [83.09230124049667]
This study investigates the structured generation capabilities of large language models (LLMs). We find that the latest LLMs are still struggling to generate a valid string. Our models demonstrate significant improvement in both generating outputs and downstream tasks.
arXiv Detail & Related papers (2025-02-26T06:45:29Z) - Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z) - MatViX: Multimodal Information Extraction from Visually Rich Articles [6.349779979863784]
In materials science, extracting structured information from research articles can accelerate the discovery of new materials.
We introduce MatViX, a benchmark consisting of 324 full-length research articles and 1,688 complex structured files.
These files are extracted from text, tables, and figures in full-length documents, providing a comprehensive challenge for MIE.
arXiv Detail & Related papers (2024-10-27T16:13:58Z) - AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation [54.17246674188208]
Web scraping is a powerful technique that extracts data from websites, enabling automated data collection, enhancing data analysis capabilities, and minimizing manual data entry efforts.
Existing wrapper-based methods suffer from limited adaptability and scalability when faced with new websites.
We introduce the paradigm of generating web scrapers with large language models (LLMs) and propose AutoScraper, a two-stage framework that can handle diverse and changing web environments more efficiently.
arXiv Detail & Related papers (2024-04-19T09:59:44Z) - A Framework for End-to-End Learning on Semantic Tree-Structured Data [4.241801379755808]
A common form of structured data is what we term "semantic tree-structures".
We propose a novel framework for end-to-end learning on generic semantic tree-structured data.
arXiv Detail & Related papers (2020-02-13T18:49:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.