Related papers: DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning

DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning

URL: http://arxiv.org/abs/2310.12128v2
Date: Mon, 15 Jul 2024 16:32:39 GMT
Title: DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning
Authors: Abhay Zala, Han Lin, Jaemin Cho, Mohit Bansal,
Abstract summary: Text-to-image (T2I) generation has seen significant growth over the past few years. Despite this, there has been little work on generating diagrams with T2I models. We present DiagrammerGPT, a novel two-stage text-to-diagram generation framework. We show that our framework produces more accurate diagrams, outperforming existing T2I models.
Score: 62.51232333352754
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image (T2I) generation has seen significant growth over the past few years. Despite this, there has been little work on generating diagrams with T2I models. A diagram is a symbolic/schematic representation that explains information using structurally rich and spatially complex visualizations (e.g., a dense combination of related objects, text labels, directional arrows/lines, etc.). Existing state-of-the-art T2I models often fail at diagram generation because they lack fine-grained object layout control when many objects are densely connected via complex relations such as arrows/lines, and also often fail to render comprehensible text labels. To address this gap, we present DiagrammerGPT, a novel two-stage text-to-diagram generation framework leveraging the layout guidance capabilities of LLMs to generate more accurate diagrams. In the first stage, we use LLMs to generate and iteratively refine 'diagram plans' (in a planner-auditor feedback loop). In the second stage, we use a diagram generator, DiagramGLIGEN, and a text label rendering module to generate diagrams (with clear text labels) following the diagram plans. To benchmark the text-to-diagram generation task, we introduce AI2D-Caption, a densely annotated diagram dataset built on top of the AI2D dataset. We show that our DiagrammerGPT framework produces more accurate diagrams, outperforming existing T2I models. We also provide comprehensive analysis, including open-domain diagram generation, multi-platform vector graphic diagram generation, human-in-the-loop editing, and multimodal planner/auditor LLMs.

Related papers

Enhancing Chart-to-Code Generation in Multimodal Large Language Models via Iterative Dual Preference Learning [16.22363384653305]
We introduce Chart2Code, a novel iterative dual preference learning framework for chart-to-code generation. We find that Chart2Code consistently improves out-of-distribution chart-to-code generation quality. Our framework paves the way for future advancements in chart comprehension.
arXiv Detail & Related papers (2025-04-03T07:51:20Z)
LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models [54.82915844507371]
Text-Attributed Graphs (TAGs) are ubiquitous in real-world scenarios. Despite large efforts to integrate Large Language Models (LLMs) and Graph Neural Networks (GNNs) for TAGs, existing approaches suffer from decoupled architectures. We propose PromptGFM, a versatile GFM for TAGs grounded in graph vocabulary learning.
arXiv Detail & Related papers (2025-03-05T09:45:22Z)
Plan-over-Graph: Towards Parallelable LLM Agent Schedule [53.834646147919436]
Large Language Models (LLMs) have demonstrated exceptional abilities in reasoning for task planning. This paper introduces a novel paradigm, plan-over-graph, in which the model first decomposes a real-life textual task into executable subtasks and constructs an abstract task graph. The model then understands this task graph as input and generates a plan for parallel execution.
arXiv Detail & Related papers (2025-02-20T13:47:51Z)
A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) [5.37125692728042]
SceneGuided RetrieveRwR is a framework for reasoning and planning with graphs. We show that our framework surpasses existing LLM-based approaches in numerical Q&A and planning tasks.
arXiv Detail & Related papers (2025-02-05T18:50:38Z)
ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation [90.82566869965011]
textbfChartCoder is the first dedicated chart-to-code MLLM. We introduce textbfChart2Code-160k, the first large-scale and diverse dataset for chart-to-code generation. Experiments demonstrate that ChartCoder, with only 7B parameters, surpasses existing open-source MLLMs on chart-to-code benchmarks.
arXiv Detail & Related papers (2025-01-11T17:52:22Z)
InstructG2I: Synthesizing Images from Multimodal Attributed Graphs [50.852150521561676]
We propose a graph context-conditioned diffusion model called InstructG2I. InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling. A Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process.
arXiv Detail & Related papers (2024-10-09T17:56:15Z)
Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback [37.275533538711436]
We propose a hierarchical pipeline and a new dataset for chart generation. Our dataset, Text2Chart31, includes 31 unique plot types referring to the Matplotlib library. We introduce a reinforcement learning-based instruction tuning technique for chart generation tasks without requiring human feedback.
arXiv Detail & Related papers (2024-10-05T07:25:56Z)
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning [83.58521787193293]
We present TinyChart, an efficient MLLM for chart understanding with only 3B parameters. TinyChart overcomes two key challenges in efficient chart understanding: (1) reduce the burden of learning numerical computations through a Program-of-Thoughts (PoT) learning strategy, and (2) reduce lengthy vision feature sequences produced by the vision transformer for high-resolution images through a Vision Token Merging module.
arXiv Detail & Related papers (2024-04-25T14:23:24Z)
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs [60.71360240206726]
Large language models (LLMs) suffer from hallucinations, especially on knowledge-intensive tasks. Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora. We propose a framework called Graph Chain-of-thought (Graph-CoT) to augment LLMs with graphs by encouraging LLMs to reason on the graph iteratively.
arXiv Detail & Related papers (2024-04-10T15:41:53Z)
LLaGA: Large Language and Graph Assistant [73.71990472543027]
Large Language and Graph Assistant (LLaGA) is an innovative model to handle the complexities of graph-structured data. LLaGA excels in versatility, generalizability and interpretability, allowing it to perform consistently well across different datasets and tasks. Our experiments show that LLaGA delivers outstanding performance across four datasets and three tasks using one single model.
arXiv Detail & Related papers (2024-02-13T02:03:26Z)
GraphGPT: Graph Instruction Tuning for Large Language Models [27.036935149004726]
Graph Neural Networks (GNNs) have evolved to understand graph structures. To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation. Our research tackles this by advancing graph model generalization in zero-shot learning environments.
arXiv Detail & Related papers (2023-10-19T06:17:46Z)
ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
We present ChartReader, a unified framework that seamlessly integrates chart derendering and comprehension tasks. Our approach includes a transformer-based chart component detection module and an extended pre-trained vision-language model for chart-to-X tasks. Our proposed framework can significantly reduce the manual effort involved in chart analysis, providing a step towards a universal chart understanding model.
arXiv Detail & Related papers (2023-04-05T00:25:27Z)
INFINITY: A Simple Yet Effective Unsupervised Framework for Graph-Text Mutual Conversion [43.70416280548082]
Graph-to-text (G2T) generation and text-to-graph (T2G) triple extraction are essential tasks for constructing and applying knowledge graphs. Existing unsupervised approaches turn out to be suitable candidates for jointly learning the two tasks due to their avoidance of using graph-text parallel data. We propose INFINITY, a simple yet effective unsupervised approach that does not require external annotation tools or additional parallel information.
arXiv Detail & Related papers (2022-09-22T03:12:43Z)
JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs [44.06715423776722]
We propose a graph-text joint representation learning model called JointGT. During encoding, we devise a structure-aware semantic aggregation module which is plugged into each Transformer layer. We show that JointGT obtains new state-of-the-art performance on various KG-to-text datasets.
arXiv Detail & Related papers (2021-06-19T14:10:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.