PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents
- URL: http://arxiv.org/abs/2501.11233v1
- Date: Mon, 20 Jan 2025 02:31:52 GMT
- Title: PlotEdit: Natural Language-Driven Accessible Chart Editing in PDFs via Multimodal LLM Agents
- Authors: Kanika Goswami, Puneet Mathur, Ryan Rossi, Franck Dernoncourt
- Abstract summary: PlotEdit is a novel multi-agent framework for natural language-driven end-to-end chart image editing.
PlotEdit orchestrates five LLM agents: Chart2Table for data table extraction, Chart2Vision for style identification, Chart2Code for retrieving rendering code, Instruction Decomposition Agent for parsing user requests into executable steps, and Multimodal Editing Agent for implementing nuanced chart component modifications.
PlotEdit outperforms existing baselines on the ChartCraft dataset across style, layout, format, and data-centric edits.
- Abstract: Chart visualizations, while essential for data interpretation and communication, are predominantly accessible only as images in PDFs, lacking source data tables and stylistic information. To enable effective editing of charts in PDFs or digital scans, we present PlotEdit, a novel multi-agent framework for natural language-driven end-to-end chart image editing via self-reflective LLM agents. PlotEdit orchestrates five LLM agents: (1) Chart2Table for data table extraction, (2) Chart2Vision for style attribute identification, (3) Chart2Code for retrieving rendering code, (4) Instruction Decomposition Agent for parsing user requests into executable steps, and (5) Multimodal Editing Agent for implementing nuanced chart component modifications - all coordinated through multimodal feedback to maintain visual fidelity. PlotEdit outperforms existing baselines on the ChartCraft dataset across style, layout, format, and data-centric edits, enhancing accessibility for visually challenged users and improving novice productivity.
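The abstract describes the agent control flow precisely enough to sketch it. Below is a minimal, hypothetical rendering of that loop in Python; the `llm` interface, the `Critique` type, and all method names are assumptions for illustration, not the authors' released code.

```python
# Hypothetical sketch of PlotEdit's five-agent loop as described in the
# abstract. Every interface here (the `llm` methods, `Critique`) is an
# assumption for illustration, not the authors' implementation.

import io
from dataclasses import dataclass

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

@dataclass
class Critique:
    ok: bool          # does the rendered chart satisfy the edit step?
    feedback: str     # multimodal feedback used for self-reflection

@dataclass
class ChartState:
    table: str        # (1) Chart2Table: extracted data table
    style: dict       # (2) Chart2Vision: style attributes
    code: str         # (3) Chart2Code: matplotlib code reproducing the chart

def render(code: str) -> bytes:
    """Execute candidate matplotlib code and capture the figure as PNG."""
    exec(code, {"plt": plt})        # sketch only: never exec untrusted code
    buf = io.BytesIO()
    plt.savefig(buf, format="png")
    plt.close("all")
    return buf.getvalue()

def plotedit(chart_image: bytes, instruction: str, llm, max_rounds: int = 3) -> str:
    # Perception agents recover an editable representation of the chart.
    state = ChartState(
        table=llm.chart2table(chart_image),
        style=llm.chart2vision(chart_image),
        code=llm.chart2code(chart_image),
    )
    # (4) Instruction Decomposition: parse the request into executable steps.
    for step in llm.decompose(instruction, state):
        # (5) Multimodal Editing: apply the step, self-reflecting on renders.
        for _ in range(max_rounds):
            state.code = llm.edit(state, step)
            critique: Critique = llm.compare(chart_image, render(state.code), step)
            if critique.ok:
                break
            step += "\nFix: " + critique.feedback
    return state.code
```

The design point the abstract emphasizes is the inner loop: each rendered edit is fed back to the multimodal critic so that modifications preserve visual fidelity.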
Related papers
- ChartAdapter: Large Vision-Language Model for Chart Summarization [13.499376163294816]
ChartAdapter is a lightweight transformer module designed to bridge the gap between charts and textual summaries.
By integrating ChartAdapter with an LLM, we enable end-to-end training and efficient chart summarization.
arXiv Detail & Related papers (2024-12-30T05:07:34Z)
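The ChartAdapter entry above names only a "lightweight transformer module" bridging chart features and an LLM. A minimal PyTorch sketch of one plausible such bridge follows; the learnable-query cross-attention design and all dimensions are assumptions, not the paper's published architecture.

```python
import torch
import torch.nn as nn

class ChartAdapterSketch(nn.Module):
    """Hypothetical lightweight bridge from a frozen vision encoder to an LLM.

    A small set of learnable query tokens cross-attends over chart-patch
    features and is projected into the LLM's embedding space. Dimensions
    are illustrative assumptions, not the paper's configuration.
    """

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096,
                 num_queries: int = 32, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, vision_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(vision_dim, num_heads,
                                                batch_first=True)
        self.proj = nn.Linear(vision_dim, llm_dim)  # into the LLM token space

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, vision_dim) from a frozen encoder
        q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        fused, _ = self.cross_attn(q, patch_feats, patch_feats)
        return self.proj(fused)  # (batch, num_queries, llm_dim) soft prompts

# Usage: prepend the returned tokens to the LLM's input embeddings and train
# the adapter (and optionally the LLM) end-to-end on chart summaries.
adapter = ChartAdapterSketch()
tokens = adapter(torch.randn(2, 196, 1024))   # -> torch.Size([2, 32, 4096])
```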
- Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback [37.275533538711436]
We propose a hierarchical pipeline and a new dataset for chart generation.
Our dataset, Text2Chart31, includes 31 unique plot types drawn from the Matplotlib library.
We introduce a reinforcement learning-based instruction tuning technique for chart generation tasks without requiring human feedback.
arXiv Detail & Related papers (2024-10-05T07:25:56Z)
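The Text2Chart31 entry above mentions instruction tuning with "automatic feedback" in place of human feedback. One plausible form of such feedback is programmatic scoring of the generated plotting code; the reward below is a hypothetical sketch, not the paper's actual reward design.

```python
# Hypothetical automatic-feedback reward for chart-generation RL fine-tuning.
# Text2Chart31's actual reward may differ; this only illustrates the idea of
# scoring generated matplotlib code without human labels.

import io
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

def chart_code_reward(code: str, expected_plot_call: str) -> float:
    """Score generated plotting code: does it run, and does it use the
    requested plot type (e.g. 'bar', 'scatter', 'pie')?"""
    reward = 0.0
    try:
        exec(code, {"plt": plt})          # executability check (sketch only;
        reward += 0.5                     # never exec untrusted code in prod)
        buf = io.BytesIO()
        plt.savefig(buf, format="png")    # the figure actually renders
        reward += 0.2
    except Exception:
        return 0.0
    finally:
        plt.close("all")
    if f"plt.{expected_plot_call}(" in code or f".{expected_plot_call}(" in code:
        reward += 0.3                     # crude plot-type match heuristic
    return reward

print(chart_code_reward("plt.bar(['a', 'b'], [1, 2])", "bar"))  # -> 1.0
```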
- ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs [20.690529354141116]
We leverage advancements in the field of chart analysis to generate tactile charts in an end-to-end manner.
Our three key contributions are as follows: (1) the ChartFormer model trained to convert chart images into tactile-accessible SVGs, (2) training this model on the Chart2Tactile dataset, and (3) evaluating the effectiveness of our SVGs through a pilot user study with a refreshable two-dimensional tactile display.
arXiv Detail & Related papers (2024-05-29T14:24:42Z)
- SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing [53.00272278754867]
SEED-Data-Edit is a hybrid dataset for instruction-guided image editing.
It combines three sources: high-quality editing data produced by an automated pipeline, real-world scenario data collected from the internet, and high-precision multi-turn editing data annotated by humans.
arXiv Detail & Related papers (2024-05-07T04:55:47Z)
- ChartReformer: Natural Language-Driven Chart Image Editing [0.1712670816823812]
We propose ChartReformer, a natural language-driven chart image editing solution that directly edits charts from input images according to the given instruction prompts.
To generalize ChartReformer, we define and standardize various types of chart editing, covering style, layout, format, and data-centric edits.
arXiv Detail & Related papers (2024-03-01T00:59:50Z)
- ChartLlama: A Multimodal LLM for Chart Understanding and Generation [70.1393163657813]
We first create a high-quality instruction-tuning dataset leveraging GPT-4.
Next, we introduce ChartLlama, a multimodal large language model trained on this dataset.
arXiv Detail & Related papers (2023-11-27T15:20:23Z)
- StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding [54.45681512355684]
Current chart-related tasks focus either on chart perception, which extracts information from visual charts, or on chart reasoning over the extracted data.
We introduce StructChart, a novel framework that leverages Structured Triplet Representations (STR) to achieve a unified and label-efficient approach.
arXiv Detail & Related papers (2023-09-20T12:51:13Z)
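The StructChart entry above names Structured Triplet Representations (STR) without detailing the schema. As a purely illustrative reading, chart content might linearize into subject-relation-value triplets like those below; the exact schema is an assumption, not the paper's specification.

```python
# Hypothetical example of representing a chart's content as structured
# triplets, in the spirit of StructChart's STR; the paper's actual schema
# may differ.

from typing import NamedTuple

class Triplet(NamedTuple):
    subject: str    # data series or chart element
    relation: str   # attribute or axis binding
    value: str      # the associated value

# A two-series bar chart, "Revenue by quarter", might flatten to:
str_example = [
    Triplet("chart", "type", "bar"),
    Triplet("chart", "title", "Revenue by quarter"),
    Triplet("Product A", "Q1", "120"),
    Triplet("Product A", "Q2", "135"),
    Triplet("Product B", "Q1", "90"),
    Triplet("Product B", "Q2", "110"),
]

# A single linearized form like this could serve both perception (image -> STR)
# and reasoning (STR -> answer), which is one way a unified, label-efficient
# interface between the two tasks might work.
```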
- ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks.
Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks.
Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z)
- Graph Edit Distance Reward: Learning to Edit Scene Graph [69.39048809061714]
We propose a new method for editing a scene graph according to user instructions, a direction that has not been explored before.
Specifically, to learn to edit scene graphs in line with the semantics given by the text, we propose a Graph Edit Distance Reward.
In the context of text-editing image retrieval, we validate the effectiveness of our method on the CSS and CRIR datasets.
arXiv Detail & Related papers (2020-08-15T04:52:16Z)
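The Graph Edit Distance Reward entry above turns a graph-similarity measure into a learning signal. A minimal sketch of that idea, using networkx's built-in graph edit distance, is shown below; the normalization into a reward is an assumption about the exact formulation, not the paper's.

```python
# Sketch: using graph edit distance between the edited scene graph and the
# target scene graph as a reward signal. The normalization below is
# illustrative; the paper's exact reward shaping may differ.

import networkx as nx

def ged_reward(predicted: nx.DiGraph, target: nx.DiGraph) -> float:
    """Higher is better; 1.0 means the graphs match exactly."""
    match = lambda a, b: a.get("label") == b.get("label")
    ged = nx.graph_edit_distance(predicted, target,
                                 node_match=match, edge_match=match)
    worst = (predicted.number_of_nodes() + predicted.number_of_edges()
             + target.number_of_nodes() + target.number_of_edges())
    return 1.0 - ged / max(worst, 1)

# "blue cube left-of red ball" vs. an edit that changed the ball's color:
g1, g2 = nx.DiGraph(), nx.DiGraph()
g1.add_node("cube", label="blue cube"); g1.add_node("ball", label="red ball")
g1.add_edge("cube", "ball", label="left of")
g2.add_node("cube", label="blue cube"); g2.add_node("ball", label="green ball")
g2.add_edge("cube", "ball", label="left of")
print(ged_reward(g1, g2))  # one node substitution away from a perfect match
```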
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.