Leveraging Vision Capabilities of Multimodal LLMs for Automated Data Extraction from Plots
- URL: http://arxiv.org/abs/2503.12326v1
- Date: Sun, 16 Mar 2025 02:41:43 GMT
- Title: Leveraging Vision Capabilities of Multimodal LLMs for Automated Data Extraction from Plots
- Authors: Maciej P. Polak, Dane Morgan
- Abstract summary: We show that current multimodal large language models, with proper instructions and engineered prompts, are capable of accurately extracting data from plots. This capability is inherent to the pretrained models and can be achieved with a chain-of-thought sequence of zero-shot engineered prompts.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated data extraction from research texts has been steadily improving, with the emergence of large language models (LLMs) accelerating progress even further. Extracting data from plots in research papers, however, has been such a complex task that it has predominantly been confined to manual data extraction. We show that current multimodal large language models, with proper instructions and engineered workflows, are capable of accurately extracting data from plots. This capability is inherent to the pretrained models and can be achieved with a chain-of-thought sequence of zero-shot engineered prompts we call PlotExtract, without the need to fine-tune. We demonstrate PlotExtract here and assess its performance on synthetic and published plots. We consider only plots with two axes in this analysis. For plots identified as extractable, PlotExtract finds points with over 90% precision (and around 90% recall) and errors in x and y position of around 5% or lower. These results prove that multimodal LLMs are a viable path for high-throughput data extraction for plots and in many circumstances can replace the current manual methods of data extraction.
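The abstract reports x and y position errors of around 5% or lower, which presumes a calibration step common to all plot digitization: mapping pixel coordinates to data coordinates using known axis tick positions. The sketch below illustrates that standard mapping only; it is not the paper's pipeline, and the function name and signature are hypothetical.

```python
import math

def pixel_to_data(px, px_lo, px_hi, val_lo, val_hi, log_scale=False):
    """Map a pixel coordinate to a data coordinate given two calibration
    points (e.g. the pixel positions and values of two axis ticks).
    Illustrative only; not the PlotExtract implementation."""
    frac = (px - px_lo) / (px_hi - px_lo)  # fractional position along the axis
    if log_scale:
        # logarithmic axes interpolate in log space
        return 10 ** (math.log10(val_lo) + frac * (math.log10(val_hi) - math.log10(val_lo)))
    return val_lo + frac * (val_hi - val_lo)  # linear axes interpolate directly

# A point halfway between ticks at pixels 100 and 300 (values 0 and 50)
# maps to the midpoint of the value range:
print(pixel_to_data(200, 100, 300, 0.0, 50.0))  # 25.0
```

An LLM that reads off tick labels and point positions from the image can hand its raw pixel estimates to a mapping like this, so calibration accuracy bounds the final coordinate error.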
Related papers
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation [98.92677830223786]
This work revisits scaling with synthetic data and focuses on developing video-LLMs from a data-centric perspective. We propose a data augmentation method called Sparrow, which synthesizes video-like samples from pure text instruction data. Our proposed method achieves performance comparable to or even superior to baselines trained with many more samples.
arXiv Detail & Related papers (2024-11-29T18:59:54Z)
- Measuring memorization in language models via probabilistic extraction [29.438509661725117]
Large language models (LLMs) are susceptible to memorizing training data. Discoverable extraction is the most common method for measuring this issue. We introduce probabilistic discoverable extraction, which, without additional cost, relaxes discoverable extraction by considering multiple queries.
arXiv Detail & Related papers (2024-10-25T11:37:04Z)
- Distilled Pruning: Using Synthetic Data to Win the Lottery [2.4366811507669124]
This work introduces a novel approach to pruning deep learning models by using distilled data.
Our approach can find sparse, trainable subnetworks up to 5x faster than Iterative Magnitude Pruning at comparable sparsity on CIFAR-10.
The experimental results highlight the potential of using distilled data for resource-efficient neural network pruning, model compression, and neural architecture search.
arXiv Detail & Related papers (2023-07-07T03:07:28Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Extracting Accurate Materials Data from Research Papers with Conversational Language Models and Prompt Engineering [0.0]
ChatExtract can fully automate very accurate data extraction with minimal initial effort and background.
In tests on materials data we find precision and recall both close to 90% from the best conversational LLMs.
arXiv Detail & Related papers (2023-03-07T17:54:53Z)
- Bag of Tricks for Training Data Extraction from Language Models [98.40637430115204]
We investigate and benchmark tricks for improving training data extraction using a publicly available dataset.
The experimental results show that several previously overlooked tricks can be crucial to the success of training data extraction.
arXiv Detail & Related papers (2023-02-09T06:46:42Z)
- Multimodal Approach for Metadata Extraction from German Scientific Publications [0.0]
We propose a multimodal deep learning approach for metadata extraction from scientific papers in the German language.
We consider multiple types of input data by combining natural language processing and image vision processing.
Our model for this approach was trained on a dataset consisting of around 8800 documents and is able to obtain an overall F1-score of 0.923.
arXiv Detail & Related papers (2021-11-10T15:19:04Z)
- Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning [62.47593143542552]
We describe a subset optimization approach for automatic dataset curation.
We demonstrate that our approach finds videos with high audio-visual correspondence and show that self-supervised models trained on our data, despite being automatically constructed, achieve similar downstream performances to existing video datasets with similar scales.
arXiv Detail & Related papers (2021-01-26T14:27:47Z)
- APEX-Net: Automatic Plot Extractor Network [24.299931323012757]
We propose APEX-Net, a deep learning based framework with novel loss functions for solving the plot extraction problem.
We introduce APEX-1M, a new large scale dataset which contains both the plot images and the raw data.
We show visual results of our network on unseen plot images and demonstrate that it extracts the shape of the plots to a great extent.
arXiv Detail & Related papers (2021-01-15T17:02:36Z)
- Scaling Systematic Literature Reviews with Machine Learning Pipelines [57.82662094602138]
Systematic reviews entail the extraction of data from scientific documents.
We construct a pipeline that automates each of these aspects, and experiment with many human-time vs. system quality trade-offs.
We find that we can get surprising accuracy and generalisability of the whole pipeline system with only 2 weeks of human-expert annotation.
arXiv Detail & Related papers (2020-10-09T16:19:42Z)
- At Which Level Should We Extract? An Empirical Analysis on Extractive Document Summarization [110.54963847339775]
We show that extracting full sentences introduces unnecessary and redundant content.
We propose extracting sub-sentential units based on the constituency parsing tree.
arXiv Detail & Related papers (2020-04-06T13:35:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.