Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
- URL: http://arxiv.org/abs/2502.18179v1
- Date: Tue, 25 Feb 2025 13:11:53 GMT
- Title: Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
- Authors: Gaye Colakoglu, Gürkan Solmaz, Jonathan Fürst
- Abstract summary: This paper defines and explores the design space for information extraction from layout-rich documents using large language models (LLMs). Our study delves into the sub-problems within these core challenges, such as input representation, chunking, prompting, and selection of LLMs and multimodal models. It examines the outcomes of different design choices through a new layout-aware IE test suite, benchmarking against the state-of-the-art (SoA) model LayoutLMv3.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper defines and explores the design space for information extraction (IE) from layout-rich documents using large language models (LLMs). The three core challenges of layout-aware IE with LLMs are 1) data structuring, 2) model engagement, and 3) output refinement. Our study delves into the sub-problems within these core challenges, such as input representation, chunking, prompting, and selection of LLMs and multimodal models. It examines the outcomes of different design choices through a new layout-aware IE test suite, benchmarking against the state-of-the-art (SoA) model LayoutLMv3. The results show that the configuration from the one-factor-at-a-time (OFAT) trial achieves near-optimal results with a 14.1-point F1-score gain over the baseline model, while full factorial exploration yields only a slightly higher 15.1-point gain at around 36x greater token usage. We demonstrate that well-configured general-purpose LLMs can match the performance of specialized models, providing a cost-effective alternative. Our test suite is freely available at https://github.com/gayecolakoglu/LayIE-LLM.
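The contrast between OFAT and full factorial exploration can be sketched in a few lines of Python. The factor names and levels below are hypothetical stand-ins for the sub-problems named in the abstract (input representation, chunking, prompting, model choice), not the paper's actual configuration grid:

```python
# Hypothetical design space for layout-aware IE with LLMs.
# Factor names/levels are illustrative, not the paper's real grid.
DESIGN_SPACE = {
    "input_representation": ["plain_text", "markdown", "coordinates"],
    "chunking": ["none", "page", "token_window"],
    "prompting": ["zero_shot", "few_shot", "chain_of_thought"],
    "model": ["gpt-4o", "llama-3-70b"],
}

def ofat_trial_count(space):
    """One-factor-at-a-time: run a baseline configuration once,
    then vary a single factor away from its baseline level per trial."""
    baseline = 1
    extra = sum(len(levels) - 1 for levels in space.values())
    return baseline + extra

def full_factorial_count(space):
    """Full factorial: try every combination of all factor levels."""
    n = 1
    for levels in space.values():
        n *= len(levels)
    return n

print(ofat_trial_count(DESIGN_SPACE))      # 1 + (2 + 2 + 2 + 1) = 8
print(full_factorial_count(DESIGN_SPACE))  # 3 * 3 * 3 * 2 = 54
```

Because the full-factorial count grows multiplicatively with each added factor, its trial count (and hence token usage) quickly dwarfs OFAT's linear count, which is consistent with the abstract's report of roughly 36x greater token usage for only one additional F1 point.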
Related papers
- Slm-mux: Orchestrating small language models for reasoning [52.461958665375896]
We propose a three-stage approach for orchestrating small language models (SLMs). First, we introduce SLM-MUX, a multi-model architecture that effectively coordinates multiple SLMs. With just two SLMs, SLM-MUX outperforms Qwen 2.5 72B on GPQA and GSM8K, and matches its performance on MATH.
arXiv Detail & Related papers (2025-10-06T17:49:58Z)
- DaMoC: Efficiently Selecting the Optimal Large Language Model for Fine-tuning Domain Tasks Based on Data and Model Compression [7.1654056866441245]
Large language models (LLMs) excel in general tasks but struggle with domain-specific ones, requiring fine-tuning with specific data. We introduce a Data and Model Compression Framework (DaMoC) that addresses this challenge. We show that we can select the optimal LLM while saving approximately 20-fold in training time.
arXiv Detail & Related papers (2025-09-01T08:06:49Z)
- Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction [28.47810405584841]
AOE, the Arranged and Organized Extraction Benchmark, is designed to evaluate the ability of large language models to comprehend fragmented documents. AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schemas tailored to varied input queries. Results show that even the most advanced models struggle significantly.
arXiv Detail & Related papers (2025-07-22T06:37:51Z)
- Fusing LLM Capabilities with Routing Data [34.769509452692226]
FusionFactory is a systematic fusion framework with three levels: query-level fusion, thought-level fusion, and model-level fusion. Experiments show FusionFactory consistently outperforms the best individual LLM across all 14 benchmarks.
arXiv Detail & Related papers (2025-07-14T17:58:02Z)
- Tuning the Right Foundation Models is What you Need for Partial Label Learning [55.61644150441799]
Partial label learning seeks to train generalizable classifiers from datasets with inexact supervision. In this work, we empirically evaluate 11 foundation models across 13 approaches on 8 benchmark datasets under 3 scenarios. Our findings reveal that current approaches 1) tend to achieve significant performance gains when using foundation models, 2) exhibit remarkably similar performance to each other, 3) maintain stable performance across varying ambiguity levels, while 4) are susceptible to foundation model selection and adaptation strategies.
arXiv Detail & Related papers (2025-06-05T13:37:33Z)
- DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation [83.21140655248624]
Large language models (LLMs) have been introduced into recommender systems (RSs). We propose DeepRec, a novel LLM-based RS that enables autonomous multi-turn interactions between LLMs and TRMs for deep exploration of the item space. Experiments on public datasets demonstrate that DeepRec significantly outperforms both traditional and LLM-based baselines.
arXiv Detail & Related papers (2025-05-22T15:49:38Z)
- When Do LLMs Help With Node Classification? A Comprehensive Analysis [21.120619437937382]
We develop a comprehensive testbed for node classification using Large Language Models (LLMs). It includes 10 homophilic datasets, 4 heterophilic datasets, 8 LLM-based algorithms, 8 classic baselines, and 3 learning paradigms. Our findings uncover 8 insights, e.g., (1) LLM-based methods can significantly outperform traditional methods in a semi-supervised setting, while the advantage is marginal in a supervised setting.
arXiv Detail & Related papers (2025-02-02T15:56:05Z)
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [56.9361004704428]
Large Language Models (LLMs) have demonstrated remarkable proficiency across a variety of complex tasks. SWE-Fixer is a novel open-source framework designed to effectively and efficiently resolve GitHub issues. We assess our approach on the SWE-Bench Lite and Verified benchmarks, achieving state-of-the-art performance among open-source models.
arXiv Detail & Related papers (2025-01-09T07:54:24Z)
- PickLLM: Context-Aware RL-Assisted Large Language Model Routing [0.5325390073522079]
PickLLM is a lightweight framework that relies on Reinforcement Learning (RL) to route on-the-fly queries to available models. We demonstrate the speed of convergence for different learning rates and improvements in hard metrics such as cost per querying session and overall response latency.
arXiv Detail & Related papers (2024-12-12T06:27:12Z)
- Smoothie: Label Free Language Model Routing [39.88041397482366]
Large language models (LLMs) are increasingly used in applications where LLM inputs may span many different tasks. We propose Smoothie, a weak-supervision-inspired routing approach that requires no labeled data. We find that Smoothie's LLM quality scores correlate with ground-truth model quality.
arXiv Detail & Related papers (2024-12-06T01:06:37Z)
- Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs [29.735465300269993]
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they often struggle with spatial reasoning. This paper presents a novel neural-symbolic framework that enhances LLMs' spatial reasoning abilities through iterative feedback between LLMs and Answer Set Programming (ASP). We evaluate our approach on two benchmark datasets: StepGame and SparQA.
arXiv Detail & Related papers (2024-11-27T18:04:05Z)
- The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation [4.524402497958597]
This paper presents a novel methodology for generating synthetic Preference Optimization (PO) datasets using multiple models. We evaluate the effectiveness and potential of these workflows in automating and enhancing the dataset generation process.
arXiv Detail & Related papers (2024-08-16T12:01:55Z)
- Large Language Model-guided Document Selection [23.673690115025913]
Large Language Model (LLM) pre-training exhausts an ever-growing compute budget.
Recent research has demonstrated that careful document selection enables comparable model quality with only a fraction of the FLOPs.
We explore a promising direction for scalable general-domain document selection.
arXiv Detail & Related papers (2024-06-07T04:52:46Z)
- Self-Exploring Language Models: Active Preference Elicitation for Online Alignment [88.56809269990625]
We propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions.
Our experimental results demonstrate that when fine-tuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, Self-Exploring Language Models (SELM) significantly boosts the performance on instruction-following benchmarks.
arXiv Detail & Related papers (2024-05-29T17:59:07Z)
- Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph [28.13334909565348]
In this paper, we unveil that simple domain-specific graph methods outperform the model, using the intrinsic dependencies within the patent data.
We propose a novel Fine-grained cLAim depeNdency (FLAN) Graph through meticulous patent data analyses.
arXiv Detail & Related papers (2024-04-22T17:22:31Z)
- Optimizing LLM Queries in Relational Data Analytics Workloads [50.95919232839785]
Batch data analytics is a growing application for Large Language Models (LLMs).
LLMs enable users to perform a wide range of natural language tasks, such as classification, entity extraction, and translation, over large datasets.
We propose novel techniques that can significantly reduce the cost of LLM calls for relational data analytics workloads.
arXiv Detail & Related papers (2024-03-09T07:01:44Z)
- FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability [70.84333325049123]
FoFo is a pioneering benchmark for evaluating large language models' (LLMs) ability to follow complex, domain-specific formats.
arXiv Detail & Related papers (2024-02-28T19:23:27Z)
- GenSERP: Large Language Models for Whole Page Presentation [22.354349023665538]
GenSERP is a framework that leverages large language models with vision in a few-shot setting to dynamically organize intermediate search results.
Our approach has three main stages: information gathering, answer generation, and scoring.
arXiv Detail & Related papers (2024-02-22T05:41:24Z)
- Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency [127.97467912117652]
Large language models (LLMs) have exhibited remarkable ability in code generation.
However, generating the correct solution in a single attempt still remains a challenge.
We propose the Multi-Perspective Self-Consistency (MPSC) framework incorporating both inter- and intra-consistency.
arXiv Detail & Related papers (2023-09-29T14:23:26Z)
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? [49.688233418425995]
Struc-Bench is a comprehensive benchmark featuring prominent Large Language Models (LLMs). We propose two innovative metrics: P-Score (Prompting Score) and H-Score (Heuristical Score).
Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains.
arXiv Detail & Related papers (2023-09-16T11:31:58Z)
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation.
Within each loop, the MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results.
For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data.
For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
- LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z)
- ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction [56.790794611002106]
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning.
We propose a simple but effective in-context learning framework called ICL-D3IE.
Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.