DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning
- URL: http://arxiv.org/abs/2310.12128v2
- Date: Mon, 15 Jul 2024 16:32:39 GMT
- Title: DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning
- Authors: Abhay Zala, Han Lin, Jaemin Cho, Mohit Bansal
- Abstract summary: Text-to-image (T2I) generation has seen significant growth over the past few years.
Despite this, there has been little work on generating diagrams with T2I models.
We present DiagrammerGPT, a novel two-stage text-to-diagram generation framework.
We show that our framework produces more accurate diagrams, outperforming existing T2I models.
- Score: 62.51232333352754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image (T2I) generation has seen significant growth over the past few years. Despite this, there has been little work on generating diagrams with T2I models. A diagram is a symbolic/schematic representation that explains information using structurally rich and spatially complex visualizations (e.g., a dense combination of related objects, text labels, directional arrows/lines, etc.). Existing state-of-the-art T2I models often fail at diagram generation because they lack fine-grained object layout control when many objects are densely connected via complex relations such as arrows/lines, and also often fail to render comprehensible text labels. To address this gap, we present DiagrammerGPT, a novel two-stage text-to-diagram generation framework leveraging the layout guidance capabilities of LLMs to generate more accurate diagrams. In the first stage, we use LLMs to generate and iteratively refine 'diagram plans' (in a planner-auditor feedback loop). In the second stage, we use a diagram generator, DiagramGLIGEN, and a text label rendering module to generate diagrams (with clear text labels) following the diagram plans. To benchmark the text-to-diagram generation task, we introduce AI2D-Caption, a densely annotated diagram dataset built on top of the AI2D dataset. We show that our DiagrammerGPT framework produces more accurate diagrams, outperforming existing T2I models. We also provide comprehensive analysis, including open-domain diagram generation, multi-platform vector graphic diagram generation, human-in-the-loop editing, and multimodal planner/auditor LLMs.
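The first-stage planner-auditor loop lends itself to a short sketch. Below is a minimal illustration of that loop, assuming a generic `llm_complete` chat-completion wrapper and an invented JSON plan schema (entities with bounding boxes, text labels, and arrow relations); these are not the paper's actual prompts or plan format.

```python
# A minimal sketch of the stage-1 planner-auditor loop described in the
# abstract. `llm_complete` is a placeholder for any chat-completion API;
# the prompt wording and plan schema are illustrative assumptions.
import json

def llm_complete(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return its text output."""
    raise NotImplementedError

def generate_diagram_plan(caption: str, max_rounds: int = 3) -> dict:
    # Planner drafts a layout: entities with bounding boxes, text labels,
    # and arrow/line relations between entities.
    plan = json.loads(llm_complete(
        f"Draft a diagram plan as JSON with keys 'entities' "
        f"(name, bbox [x0,y0,x1,y1]), 'labels', and 'relations' "
        f"(arrows between entity names) for: {caption}"
    ))
    for _ in range(max_rounds):
        # Auditor checks the draft against the caption and reports issues.
        feedback = llm_complete(
            f"Caption: {caption}\nPlan: {json.dumps(plan)}\n"
            "List layout errors (overlaps, missing entities, wrong "
            "arrow directions), or reply OK."
        )
        if feedback.strip() == "OK":
            break
        # Planner revises the plan using the auditor's feedback.
        plan = json.loads(llm_complete(
            f"Revise this plan to fix the issues.\nPlan: {json.dumps(plan)}\n"
            f"Issues: {feedback}\nReturn JSON only."
        ))
    return plan
```

In the second stage, the refined plan would condition DiagramGLIGEN's layout-guided generation, with text labels rendered by a separate module.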
Related papers
- Plan-over-Graph: Towards Parallelable LLM Agent Schedule [53.834646147919436]
Large Language Models (LLMs) have demonstrated exceptional abilities in reasoning for task planning.
This paper introduces a novel paradigm, plan-over-graph, in which the model first decomposes a real-life textual task into executable subtasks and constructs an abstract task graph.
The model then understands this task graph as input and generates a plan for parallel execution.
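The parallel-execution step can be sketched concretely. Below, a hand-written dependency dictionary stands in for the LLM-produced task graph, and subtasks are grouped into Kahn-style parallel waves; this leveling scheme is an illustrative assumption, not the paper's algorithm.

```python
# A minimal sketch of executing an abstract task graph in parallel waves.
from collections import defaultdict

def parallel_schedule(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; tasks in a wave have no unmet dependencies."""
    indegree = {t: len(d) for t, d in deps.items()}
    children = defaultdict(list)
    for task, parents in deps.items():
        for p in parents:
            children[p].append(task)
    waves, ready = [], [t for t, n in indegree.items() if n == 0]
    while ready:
        waves.append(ready)
        nxt = []
        for t in ready:
            for c in children[t]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    nxt.append(c)
        ready = nxt
    return waves

# Example: independent subtasks land in the same wave.
deps = {"boil water": set(), "grind beans": set(),
        "brew coffee": {"boil water", "grind beans"},
        "toast bread": set(), "serve": {"brew coffee", "toast bread"}}
print(parallel_schedule(deps))
# [['boil water', 'grind beans', 'toast bread'], ['brew coffee'], ['serve']]
```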
arXiv Detail & Related papers (2025-02-20T13:47:51Z)
- GraphiT: Efficient Node Classification on Text-Attributed Graphs with Prompt Optimized LLMs [0.0]
GraphiT (Graphs in Text) is a framework for encoding graphs into a textual format.
We show how GraphiT leads to measurably better results without prompt tweaking.
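As a rough illustration of the idea, the sketch below serializes a small text-attributed graph into a node-classification prompt. The edge-list-with-attributes format is an assumption for illustration; GraphiT's actual encoding may differ.

```python
# A minimal sketch of encoding a text-attributed graph as a prompt string.
def encode_graph(nodes: dict[str, str], edges: list[tuple[str, str]],
                 target: str) -> str:
    lines = ["You are classifying a node in a citation graph."]
    for nid, text in nodes.items():
        lines.append(f"Node {nid}: {text}")
    lines.append("Edges: " + ", ".join(f"{u}->{v}" for u, v in edges))
    lines.append(f"Which category does node {target} belong to?")
    return "\n".join(lines)

prompt = encode_graph(
    {"0": "Attention is all you need.", "1": "BERT pre-training."},
    [("1", "0")],
    target="1",
)
print(prompt)
```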
arXiv Detail & Related papers (2025-02-14T19:38:41Z)
- A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) [5.37125692728042]
The Schema-Guided Reason-while-Retrieve (RwR) framework supports reasoning and planning on scene graphs.
We show that our framework surpasses existing LLM-based approaches in numerical Q&A and planning tasks.
arXiv Detail & Related papers (2025-02-05T18:50:38Z)
- ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation [90.82566869965011]
ChartCoder is the first dedicated chart-to-code MLLM.
We introduce Chart2Code-160k, the first large-scale and diverse dataset for chart-to-code generation.
Experiments demonstrate that ChartCoder, with only 7B parameters, surpasses existing open-source MLLMs on chart-to-code benchmarks.
arXiv Detail & Related papers (2025-01-11T17:52:22Z)
- TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning [83.58521787193293]
We present TinyChart, an efficient MLLM for chart understanding with only 3B parameters.
TinyChart overcomes two key challenges in efficient chart understanding: (1) it reduces the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, and (2) it shortens the lengthy vision feature sequences produced by the vision transformer on high-resolution images through a Vision Token Merging module.
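A toy sketch of the token-merging idea (not TinyChart's actual module): repeatedly average the most similar pair of adjacent vision tokens to shorten the sequence.

```python
# Illustrates the length-reduction principle behind visual token merging.
import numpy as np

def merge_tokens(tokens: np.ndarray, r: int) -> np.ndarray:
    """Merge r pairs of adjacent tokens with the highest cosine similarity."""
    for _ in range(r):
        normed = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
        sims = (normed[:-1] * normed[1:]).sum(axis=1)  # adjacent cosine sims
        i = int(sims.argmax())
        merged = (tokens[i] + tokens[i + 1]) / 2       # average the pair
        tokens = np.concatenate([tokens[:i], merged[None], tokens[i + 2:]])
    return tokens

seq = np.random.randn(16, 64).astype(np.float32)  # 16 tokens, dim 64
print(merge_tokens(seq, r=4).shape)  # (12, 64): sequence shortened by 4
```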
arXiv Detail & Related papers (2024-04-25T14:23:24Z)
- Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [60.71360240206726]
Large language models (LLMs) suffer from hallucinations, especially on knowledge-intensive tasks.
Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora.
We propose a framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.
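The iterative reason-then-interact pattern can be sketched as follows; the `llm` placeholder and the NEIGHBORS/ANSWER action format are illustrative assumptions rather than the paper's actual interface.

```python
# A minimal sketch of a Graph-CoT-style loop: the LLM alternates between
# a reasoning step and a graph-interaction step until it can answer.
def llm(prompt: str) -> str:
    """Placeholder for a chat-completion call."""
    raise NotImplementedError

def graph_cot(question: str, graph: dict[str, list[str]],
              max_steps: int = 5) -> str:
    context = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(
            context + "\nThink, then output either "
            "'NEIGHBORS(<node>)' to inspect the graph or 'ANSWER: <text>'."
        )
        if step.startswith("ANSWER:"):
            return step.removeprefix("ANSWER:").strip()
        if step.startswith("NEIGHBORS(") and step.endswith(")"):
            node = step[len("NEIGHBORS("):-1]
            # Ground the next reasoning step in retrieved graph structure.
            context += f"\nNeighbors of {node}: {graph.get(node, [])}"
    return "No answer within step budget."
```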
arXiv Detail & Related papers (2024-04-10T15:41:53Z)
- LLaGA: Large Language and Graph Assistant [73.71990472543027]
Large Language and Graph Assistant (LLaGA) is an innovative model designed to handle the complexities of graph-structured data.
LLaGA excels in versatility, generalizability and interpretability, allowing it to perform consistently well across different datasets and tasks.
Our experiments show that LLaGA delivers outstanding performance across four datasets and three tasks using one single model.
arXiv Detail & Related papers (2024-02-13T02:03:26Z)
- GraphGPT: Graph Instruction Tuning for Large Language Models [27.036935149004726]
Graph Neural Networks (GNNs) have evolved to understand graph structures.
To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation.
GraphGPT tackles the remaining generalization challenge via graph instruction tuning, advancing graph model performance in zero-shot learning settings.
arXiv Detail & Related papers (2023-10-19T06:17:46Z)
- INFINITY: A Simple Yet Effective Unsupervised Framework for Graph-Text Mutual Conversion [43.70416280548082]
Graph-to-text (G2T) generation and text-to-graph (T2G) triple extraction are essential tasks for constructing and applying knowledge graphs.
Unsupervised approaches are natural candidates for learning the two tasks jointly because they avoid the need for parallel graph-text data.
We propose INFINITY, a simple yet effective unsupervised approach that does not require external annotation tools or additional parallel information.
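One common recipe for this kind of unsupervised joint learning is a back-translation-style cycle, in which each direction generates pseudo pairs that supervise the other. The sketch below shows that cycle at a high level; the model objects and method names are placeholders, not INFINITY's actual API.

```python
# A high-level sketch of a back-translation-style cycle for unsupervised
# G2T/T2G learning: no parallel graph-text pairs are required. `g2t` and
# `t2g` are assumed to expose generate() and train_step() (placeholders).
def train_cycle(g2t, t2g, texts, graphs, epochs: int = 1):
    for _ in range(epochs):
        for text in texts:
            pseudo_graph = t2g.generate(text)   # text -> pseudo graph
            g2t.train_step(pseudo_graph, text)  # learn graph -> text
        for graph in graphs:
            pseudo_text = g2t.generate(graph)   # graph -> pseudo text
            t2g.train_step(pseudo_text, graph)  # learn text -> graph
```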
arXiv Detail & Related papers (2022-09-22T03:12:43Z)
- JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs [44.06715423776722]
We propose a graph-text joint representation learning model called JointGT.
During encoding, we devise a structure-aware semantic aggregation module which is plugged into each Transformer layer.
We show that JointGT obtains new state-of-the-art performance on various KG-to-text datasets.
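A minimal sketch of what such a structure-aware aggregation step could look like, assuming mean-pooled entity vectors and adjacency-masked attention; dimensions and details are illustrative, not JointGT's implementation.

```python
# Pool token states into entity vectors, then let entities attend only
# to their knowledge-graph neighbors.
import torch
import torch.nn.functional as F

def structure_aware_aggregate(hidden, ent_mask, adj):
    """hidden: (T, d) token states; ent_mask: (E, T) 1 where a token
    belongs to an entity; adj: (E, E) adjacency with self-loops."""
    # Mean-pool each entity's tokens into one entity vector.
    ents = (ent_mask @ hidden) / ent_mask.sum(dim=1, keepdim=True).clamp(min=1)
    # Scaled dot-product attention over entities, masked by graph structure.
    scores = (ents @ ents.T) / ents.shape[-1] ** 0.5
    scores = scores.masked_fill(adj == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ ents  # (E, d) structure-aware vectors

hidden = torch.randn(10, 8)          # 10 tokens, hidden size 8
ent_mask = torch.zeros(3, 10)        # 3 entities spanning the tokens
ent_mask[0, :3] = 1
ent_mask[1, 3:6] = 1
ent_mask[2, 6:] = 1
adj = torch.eye(3)                   # self-loops
adj[0, 1] = adj[1, 0] = 1            # edge between entities 0 and 1
print(structure_aware_aggregate(hidden, ent_mask, adj).shape)  # (3, 8)
```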
arXiv Detail & Related papers (2021-06-19T14:10:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.