PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents
- URL: http://arxiv.org/abs/2501.11233v1
- Date: Mon, 20 Jan 2025 02:31:52 GMT
- Title: PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents
- Authors: Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt
- Abstract summary: PlotEdit is a novel multi-agent framework for natural language-driven end-to-end chart image editing.
PlotEdit orchestrates five LLM agents: Chart2Table for data table extraction, Chart2Vision for style identification, Chart2Code for retrieving rendering code, Instruction Decomposition Agent for parsing user requests into executable steps, and Multimodal Editing Agent for implementing nuanced chart component modifications.
PlotEdit outperforms existing baselines on the ChartCraft dataset across style, layout, format, and data-centric edits.
- Abstract: Chart visualizations, while essential for data interpretation and communication, are predominantly accessible only as images in PDFs, lacking source data tables and stylistic information. To enable effective editing of charts in PDFs or digital scans, we present PlotEdit, a novel multi-agent framework for natural language-driven end-to-end chart image editing via self-reflective LLM agents. PlotEdit orchestrates five LLM agents: (1) Chart2Table for data table extraction, (2) Chart2Vision for style attribute identification, (3) Chart2Code for retrieving rendering code, (4) Instruction Decomposition Agent for parsing user requests into executable steps, and (5) Multimodal Editing Agent for implementing nuanced chart component modifications - all coordinated through multimodal feedback to maintain visual fidelity. PlotEdit outperforms existing baselines on the ChartCraft dataset across style, layout, format, and data-centric edits, enhancing accessibility for visually challenged users and improving novice productivity.
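The abstract describes the agent control flow precisely enough to sketch it. Below is a minimal, hypothetical rendering of that loop in Python; the `llm` interface, the `Critique` type, and all method names are assumptions for illustration, not the authors' released code.

```python
# Hypothetical sketch of PlotEdit's five-agent loop as described in the
# abstract. Every interface here (the `llm` methods, `Critique`) is an
# assumption for illustration, not the authors' implementation.

import io
from dataclasses import dataclass

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

@dataclass
class Critique:
    ok: bool          # does the rendered chart satisfy the edit step?
    feedback: str     # multimodal feedback used for self-reflection

@dataclass
class ChartState:
    table: str        # (1) Chart2Table: extracted data table
    style: dict       # (2) Chart2Vision: style attributes
    code: str         # (3) Chart2Code: matplotlib code reproducing the chart

def render(code: str) -> bytes:
    """Execute candidate matplotlib code and capture the figure as PNG."""
    exec(code, {"plt": plt})        # sketch only: never exec untrusted code
    buf = io.BytesIO()
    plt.savefig(buf, format="png")
    plt.close("all")
    return buf.getvalue()

def plotedit(chart_image: bytes, instruction: str, llm, max_rounds: int = 3) -> str:
    # Perception agents recover an editable representation of the chart.
    state = ChartState(
        table=llm.chart2table(chart_image),
        style=llm.chart2vision(chart_image),
        code=llm.chart2code(chart_image),
    )
    # (4) Instruction Decomposition: parse the request into executable steps.
    for step in llm.decompose(instruction, state):
        # (5) Multimodal Editing: apply the step, self-reflecting on renders.
        for _ in range(max_rounds):
            state.code = llm.edit(state, step)
            critique: Critique = llm.compare(chart_image, render(state.code), step)
            if critique.ok:
                break
            step += "\nFix: " + critique.feedback
    return state.code
```

The design point the abstract emphasizes is the inner loop: each rendered edit is fed back to the multimodal critic so that modifications preserve visual fidelity.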
Related papers
- ChartAdapter: Large Vision-Language Model for Chart Summarization [13.499376163294816]
ChartAdapter is a lightweight transformer module designed to bridge the gap between charts and textual summaries.
By integrating ChartAdapter with an LLM, we enable end-to-end training and efficient chart summarization.
arXiv Detail & Related papers (2024-12-30T05:07:34Z)
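The ChartAdapter entry above names only a "lightweight transformer module" bridging chart features and an LLM. A minimal PyTorch sketch of one plausible such bridge follows; the learnable-query cross-attention design and all dimensions are assumptions, not the paper's published architecture.

```python
import torch
import torch.nn as nn

class ChartAdapterSketch(nn.Module):
    """Hypothetical lightweight bridge from a frozen vision encoder to an LLM.

    A small set of learnable query tokens cross-attends over chart-patch
    features and is projected into the LLM's embedding space. Dimensions
    are illustrative assumptions, not the paper's configuration.
    """

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096,
                 num_queries: int = 32, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, vision_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(vision_dim, num_heads,
                                                batch_first=True)
        self.proj = nn.Linear(vision_dim, llm_dim)  # into the LLM token space

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, vision_dim) from a frozen encoder
        q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        fused, _ = self.cross_attn(q, patch_feats, patch_feats)
        return self.proj(fused)  # (batch, num_queries, llm_dim) soft prompts

# Usage: prepend the returned tokens to the LLM's input embeddings and train
# the adapter (and optionally the LLM) end-to-end on chart summaries.
adapter = ChartAdapterSketch()
tokens = adapter(torch.randn(2, 196, 1024))   # -> torch.Size([2, 32, 4096])
```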
- Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback [37.275533538711436]
We propose a hierarchical pipeline and a new dataset for chart generation.
Our dataset, Text2Chart31, includes 31 unique plot types drawn from the Matplotlib library.
We introduce a reinforcement learning-based instruction tuning technique for chart generation tasks without requiring human feedback.
arXiv Detail & Related papers (2024-10-05T07:25:56Z)
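The Text2Chart31 entry above mentions instruction tuning with "automatic feedback" in place of human feedback. One plausible form of such feedback is programmatic scoring of the generated plotting code; the reward below is a hypothetical sketch, not the paper's actual reward design.

```python
# Hypothetical automatic-feedback reward for chart-generation RL fine-tuning.
# Text2Chart31's actual reward may differ; this only illustrates the idea of
# scoring generated matplotlib code without human labels.

import io
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def chart_code_reward(code: str, expected_plot_call: str) -> float:
    """Score generated plotting code: does it run, and does it use the
    requested plot type (e.g. 'bar', 'scatter', 'pie')?"""
    reward = 0.0
    try:
        exec(code, {"plt": plt})          # executability check (sketch only;
        reward += 0.5                     # never exec untrusted code in prod)
        buf = io.BytesIO()
        plt.savefig(buf, format="png")    # the figure actually renders
        reward += 0.2
    except Exception:
        return 0.0
    finally:
        plt.close("all")
    if f"plt.{expected_plot_call}(" in code or f".{expected_plot_call}(" in code:
        reward += 0.3                     # crude plot-type match heuristic
    return reward

print(chart_code_reward("plt.bar(['a', 'b'], [1, 2])", "bar"))  # -> 1.0
```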
- ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs [20.690529354141116]
We leverage advancements in the field of chart analysis to generate tactile charts in an end-to-end manner.
Our three key contributions are as follows: (1) the ChartFormer model trained to convert chart images into tactile-accessible SVGs, (2) training this model on the Chart2Tactile dataset, and (3) evaluating the effectiveness of our SVGs through a pilot user study with a refreshable two-dimensional tactile display.
arXiv Detail & Related papers (2024-05-29T14:24:42Z)
- SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing [53.00272278754867]
SEED-Data-Edit is a hybrid dataset for instruction-guided image editing.
It combines three sources: high-quality editing data produced by an automated pipeline, real-world scenario data collected from the internet, and high-precision multi-turn editing data annotated by humans.
arXiv Detail & Related papers (2024-05-07T04:55:47Z)
- ChartReformer: Natural Language-Driven Chart Image Editing [0.1712670816823812]
We propose ChartReformer, a natural language-driven chart image editing solution that directly edits charts from input images according to the given instruction prompts.
To generalize ChartReformer, we define and standardize various types of chart editing, covering style, layout, format, and data-centric edits.
arXiv Detail & Related papers (2024-03-01T00:59:50Z)
- ChartLlama: A Multimodal LLM for Chart Understanding and Generation [70.1393163657813]
We first create a high-quality instruction-tuning dataset leveraging GPT-4.
Next, we introduce ChartLlama, a multimodal large language model trained on this dataset.
arXiv Detail & Related papers (2023-11-27T15:20:23Z)
- StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding [54.45681512355684]
Current chart-related tasks focus either on chart perception, which extracts information from visual charts, or on chart reasoning over the extracted data.
We introduce StructChart, a novel framework that leverages Structured Triplet Representations (STR) to achieve a unified and label-efficient approach.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)
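The StructChart entry above names Structured Triplet Representations (STR) without detailing the schema. As a purely illustrative reading, chart content might linearize into subject-relation-value triplets like those below; the exact schema is an assumption, not the paper's specification.

```python
# Hypothetical example of representing a chart's content as structured
# triplets, in the spirit of StructChart's STR; the paper's actual schema
# may differ.

from typing import NamedTuple

class Triplet(NamedTuple):
    subject: str    # data series or chart element
    relation: str   # attribute or axis binding
    value: str      # the associated value

# A two-series bar chart, "Revenue by quarter", might flatten to:
str_example = [
    Triplet("chart", "type", "bar"),
    Triplet("chart", "title", "Revenue by quarter"),
    Triplet("Product A", "Q1", "120"),
    Triplet("Product A", "Q2", "135"),
    Triplet("Product B", "Q1", "90"),
    Triplet("Product B", "Q2", "110"),
]

# A single linearized form like this could serve both perception (image -> STR)
# and reasoning (STR -> answer), which is one way a unified, label-efficient
# interface between the two tasks might work.
```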
- ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks.
Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks.
Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z)
- Graph Edit Distance Reward: Learning to Edit Scene Graph [69.39048809061714]
We propose a new method for editing a scene graph according to user instructions, a direction that has not been explored before.
Specifically, to learn to edit scene graphs in line with the semantics given by the text, we propose a Graph Edit Distance Reward.
In the context of text-editing image retrieval, we validate the effectiveness of our method on the CSS and CRIR datasets.
arXiv Detail & Related papers (2020-08-15T04:52:16Z)
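The Graph Edit Distance Reward entry above turns a graph-similarity measure into a learning signal. A minimal sketch of that idea, using networkx's built-in graph edit distance, is shown below; the normalization into a reward is an assumption about the exact formulation, not the paper's.

```python
# Sketch: using graph edit distance between the edited scene graph and the
# target scene graph as a reward signal. The normalization below is
# illustrative; the paper's exact reward shaping may differ.

import networkx as nx

def ged_reward(predicted: nx.DiGraph, target: nx.DiGraph) -> float:
    """Higher is better; 1.0 means the graphs match exactly."""
    match = lambda a, b: a.get("label") == b.get("label")
    ged = nx.graph_edit_distance(predicted, target,
                                 node_match=match, edge_match=match)
    worst = (predicted.number_of_nodes() + predicted.number_of_edges()
             + target.number_of_nodes() + target.number_of_edges())
    return 1.0 - ged / max(worst, 1)

# "blue cube left-of red ball" vs. an edit that changed the ball's color:
g1, g2 = nx.DiGraph(), nx.DiGraph()
g1.add_node("cube", label="blue cube"); g1.add_node("ball", label="red ball")
g1.add_edge("cube", "ball", label="left of")
g2.add_node("cube", label="blue cube"); g2.add_node("ball", label="green ball")
g2.add_edge("cube", "ball", label="left of")
print(ged_reward(g1, g2))  # one node substitution away from a perfect match
```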
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.