AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with
TikZ
- URL: http://arxiv.org/abs/2310.00367v2
- Date: Tue, 23 Jan 2024 15:20:33 GMT
- Title: AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with
TikZ
- Authors: Jonas Belouadi, Anne Lauscher, Steffen Eger
- Abstract summary: We introduce DaTikZ, the first large-scale TikZ dataset consisting of 120k TikZ drawings aligned with captions.
We fine-tune LLaMA on DaTikZ, as well as our new model CLiMA, which augments LLaMA with multimodal CLIP embeddings.
In both human and automatic evaluation, CLiMA and LLaMA outperform commercial GPT-4 and Claude 2 in terms of similarity to human-created figures.
- Score: 38.2820447703639
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating bitmap graphics from text has gained considerable attention, yet
for scientific figures, vector graphics are often preferred. Given that vector
graphics are typically encoded using low-level graphics primitives, generating
them directly is difficult. To address this, we propose the use of TikZ, a
well-known abstract graphics language that can be compiled to vector graphics,
as an intermediate representation of scientific figures. TikZ offers
human-oriented, high-level commands, thereby facilitating conditional language
modeling with any large language model. To this end, we introduce DaTikZ, the
first large-scale TikZ dataset consisting of 120k TikZ drawings aligned with
captions. We fine-tune LLaMA on DaTikZ, as well as our new model CLiMA, which
augments LLaMA with multimodal CLIP embeddings. In both human and automatic
evaluation, CLiMA and LLaMA outperform commercial GPT-4 and Claude 2 in terms
of similarity to human-created figures, with CLiMA additionally improving
text-image alignment. Our detailed analysis shows that all models generalize
well and are not susceptible to memorization. GPT-4 and Claude 2, however, tend
to generate more simplistic figures compared to both humans and our models. We
make our framework, AutomaTikZ, along with model weights and datasets, publicly
available.
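The pipeline the abstract describes reduces to two steps that are easy to sketch: condition a fine-tuned language model on a caption to produce TikZ code, then compile that code to a vector graphic. The following is a minimal illustration under stated assumptions; the checkpoint path is a placeholder rather than the released AutomaTikZ weights, and the prompt format is assumed.

```python
"""Minimal sketch of caption-conditioned TikZ generation and compilation.
The checkpoint path below is hypothetical, not the released weights."""
import pathlib
import subprocess
import tempfile

from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "path/to/tikz-finetuned-llama"  # placeholder fine-tuned model

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT)

def caption_to_tikz(caption: str, max_new_tokens: int = 1024) -> str:
    # Condition the language model on the figure caption; a model fine-tuned
    # on caption/TikZ pairs continues with a complete TikZ picture.
    inputs = tokenizer(caption, return_tensors="pt")
    output = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.9
    )
    return tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )

def compile_to_pdf(tikz_code: str, out_dir: str) -> pathlib.Path:
    # TikZ is an abstract graphics language; a standard LaTeX toolchain
    # compiles it down to a vector graphic (here, a PDF).
    source = (
        "\\documentclass[tikz]{standalone}\n\\begin{document}\n"
        + tikz_code + "\n\\end{document}\n"
    )
    tex_file = pathlib.Path(out_dir) / "figure.tex"
    tex_file.write_text(source)
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", tex_file.name],
        cwd=out_dir, check=True,
    )
    return tex_file.with_suffix(".pdf")

tikz = caption_to_tikz("A right triangle with legs labeled a and b and hypotenuse c.")
with tempfile.TemporaryDirectory() as tmp:
    print(compile_to_pdf(tikz, tmp))
```

CLiMA additionally injects projected CLIP caption embeddings into the model's input sequence; reproducing that faithfully would require modifying the model's embedding layer rather than only the prompt.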
Related papers
- TikZero: Zero-Shot Text-Guided Graphics Program Synthesis [56.35987342339608]
We present TikZero, which decouples graphics program generation from text understanding by using image representations as an intermediary bridge.
It enables independent training on graphics programs and captioned images and allows for zero-shot text-guided graphics program synthesis.
We show that our method substantially outperforms baselines that can only operate with caption-aligned graphics programs.
arXiv Detail & Related papers (2025-03-14T15:29:58Z)
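The bridge idea in this summary can be made concrete: a caption adapter trained only on captioned images maps text into the image-embedding space, and a program generator trained only on (image, program) pairs consumes that embedding. The sketch below is an illustrative toy, not the paper's architecture; all module shapes and names are assumptions.

```python
"""Toy sketch of bridging text and graphics programs through a shared
image-embedding space; shapes and module names are assumptions."""
import torch
import torch.nn as nn

EMBED_DIM = 512   # assumed size of the shared image-embedding space
VOCAB = 1000      # toy token vocabulary for graphics programs

class CaptionAdapter(nn.Module):
    # Trained on captioned images alone: regress the caption embedding
    # onto the paired image's embedding (e.g. with an MSE or cosine loss).
    def __init__(self, text_dim: int = 768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(text_dim, EMBED_DIM), nn.GELU(), nn.Linear(EMBED_DIM, EMBED_DIM)
        )
    def forward(self, text_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(text_emb)

class ProgramDecoder(nn.Module):
    # Trained on (image embedding, program) pairs alone: autoregressively
    # decode program tokens from the conditioning embedding.
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMBED_DIM)
        self.rnn = nn.GRU(EMBED_DIM, EMBED_DIM, batch_first=True)
        self.head = nn.Linear(EMBED_DIM, VOCAB)
    def forward(self, cond: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        h0 = cond.unsqueeze(0)                     # condition via initial hidden state
        out, _ = self.rnn(self.embed(tokens), h0)
        return self.head(out)

# Zero-shot path: caption embedding -> adapter -> pseudo image embedding
# -> program decoder, with neither model ever seeing caption/program pairs.
adapter, decoder = CaptionAdapter(), ProgramDecoder()
caption_emb = torch.randn(1, 768)                  # e.g. from a frozen text encoder
logits = decoder(adapter(caption_emb), torch.zeros(1, 1, dtype=torch.long))
print(logits.shape)  # (1, 1, VOCAB)
```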
- DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ [32.12690388609568]
DeTikZify is a novel language model that automatically synthesizes scientific figures as semantics-preserving TikZ graphics programs.
We create three new datasets: DaTikZv2, SketchFig, and MetaFig.
We train DeTikZify on MetaFig and DaTikZv2, along with synthetic sketches generated by a model learned from SketchFig.
arXiv Detail & Related papers (2024-05-24T07:48:35Z)
- Large Language Models on Graphs: A Comprehensive Survey [77.16803297418201]
We provide a systematic review of scenarios and techniques related to large language models on graphs.
We first summarize potential scenarios of adopting LLMs on graphs into three categories, namely pure graphs, text-attributed graphs, and text-paired graphs.
We discuss the real-world applications of such methods and summarize open-source codes and benchmark datasets.
arXiv Detail & Related papers (2023-12-05T14:14:27Z)
- Which Modality should I use -- Text, Motif, or Image?: Understanding Graphs with Large Language Models [14.251972223585765]
This paper introduces a new approach that encodes a graph in diverse modalities, such as text, image, and motif, and uses prompting to approximate a graph's global connectivity.
The study also presents GraphTMI, a novel benchmark for evaluating Large Language Models (LLMs) in graph structure analysis.
arXiv Detail & Related papers (2023-11-16T12:45:41Z)
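As a toy illustration of the modality choices this entry names, the sketch below encodes the same graph as plain text and as motif statistics before handing it to a prompt; the prompt format is an assumption, and networkx supplies the example graph.

```python
"""Sketch of encoding one graph in two of the compared modalities (text and
motif); the prompt format is an illustrative assumption."""
import networkx as nx

g = nx.karate_club_graph()

def text_encoding(graph: nx.Graph) -> str:
    # Plain-text modality: serialize the edge list.
    edges = ", ".join(f"({u},{v})" for u, v in graph.edges())
    return f"Graph with {graph.number_of_nodes()} nodes and edges: {edges}"

def motif_encoding(graph: nx.Graph) -> str:
    # Motif modality: summarize global structure with small-pattern statistics.
    return (f"triangles={sum(nx.triangles(graph).values()) // 3}, "
            f"avg_clustering={nx.average_clustering(graph):.3f}, "
            f"density={nx.density(graph):.3f}")

prompt = (
    "Is this graph connected?\n"
    f"TEXT: {text_encoding(g)}\n"
    f"MOTIF: {motif_encoding(g)}"
)
print(prompt)  # fed to an LLM in place of raw adjacency data
```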
- DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning [62.51232333352754]
Text-to-image (T2I) generation has seen significant growth over the past few years.
Despite this, there has been little work on generating diagrams with T2I models.
We present DiagrammerGPT, a novel two-stage text-to-diagram generation framework.
We show that our framework produces more accurate diagrams, outperforming existing T2I models.
arXiv Detail & Related papers (2023-10-18T17:37:10Z)
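A two-stage text-to-diagram pipeline of the kind described here can be sketched as an LLM-produced layout plan followed by a deterministic renderer. The plan schema below is a hypothetical stand-in (the stage-1 LLM call is stubbed out), not DiagrammerGPT's actual format.

```python
"""Two-stage text-to-diagram sketch: plan with an LLM, then render the plan.
The plan schema and planner prompt are assumptions for illustration."""
import matplotlib.patches as patches
import matplotlib.pyplot as plt

def plan_diagram(prompt: str) -> dict:
    # Stage 1 would send this prompt to an LLM and parse a JSON plan of
    # entities with layout boxes; hard-coded here for a runnable example.
    return {
        "entities": [
            {"name": "sun", "box": [0.1, 0.6, 0.25, 0.25]},
            {"name": "earth", "box": [0.6, 0.2, 0.2, 0.2]},
        ],
        "relations": [["sun", "earth"]],
    }

def render(plan: dict, path: str = "diagram.png") -> None:
    # Stage 2: deterministic rendering of the planned layout.
    fig, ax = plt.subplots()
    centers = {}
    for e in plan["entities"]:
        x, y, w, h = e["box"]
        ax.add_patch(patches.Rectangle((x, y), w, h, fill=False))
        ax.text(x + w / 2, y + h / 2, e["name"], ha="center", va="center")
        centers[e["name"]] = (x + w / 2, y + h / 2)
    for a, b in plan["relations"]:
        (x1, y1), (x2, y2) = centers[a], centers[b]
        ax.annotate("", xy=(x2, y2), xytext=(x1, y1),
                    arrowprops={"arrowstyle": "->"})
    ax.set_xlim(0, 1); ax.set_ylim(0, 1); ax.axis("off")
    fig.savefig(path)

render(plan_diagram("the sun lights the earth"))
```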
- Generating Faithful Text From a Knowledge Graph with Noisy Reference Text [26.6775578332187]
We develop a KG-to-text generation model that can generate faithful natural-language text from a given graph.
Our framework incorporates two core ideas: first, we use contrastive learning to sharpen the model's ability to differentiate between faithful and hallucinated information in the text.
Second, we let the decoder control the level of hallucination in the generated text by employing a controllable text generation technique.
arXiv Detail & Related papers (2023-08-12T07:12:45Z)
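The contrastive idea in this entry, rewarding faithful text over hallucinated text when both are scored against the graph, admits a compact sketch. The cosine-similarity scoring and InfoNCE-style objective below are illustrative assumptions, not the paper's exact loss.

```python
"""Sketch of a contrastive faithfulness objective: the faithful reference
must outscore a perturbed (hallucinated) one against the graph embedding."""
import torch
import torch.nn.functional as F

def contrastive_faithfulness_loss(graph_emb, faithful_emb, hallucinated_emb, tau=0.1):
    # Score each text against the graph by cosine similarity, then require
    # the faithful text to win a softmax over the two candidates.
    pos = F.cosine_similarity(graph_emb, faithful_emb) / tau
    neg = F.cosine_similarity(graph_emb, hallucinated_emb) / tau
    logits = torch.stack([pos, neg], dim=1)
    labels = torch.zeros(graph_emb.size(0), dtype=torch.long)  # index 0 = faithful
    return F.cross_entropy(logits, labels)

g, f, h = (torch.randn(4, 256) for _ in range(3))  # toy batch of embeddings
print(contrastive_faithfulness_loss(g, f, h))
```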
- Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization [108.09419317477986]
Z-Code++ is a new pre-trained language model optimized for abstractive text summarization.
The model is first pre-trained using text corpora for language understanding, and then is continually pre-trained on summarization corpora for grounded text generation.
Our model is parameter-efficient in that it outperforms the 600x larger PaLM-540B on XSum, and the finetuned 200x larger GPT3-175B on SAMSum.
arXiv Detail & Related papers (2022-08-21T01:00:54Z)
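The second phase of the recipe, continued pretraining on summarization corpora, looks like an ordinary fine-tuning loop over document/summary pairs. In this sketch, "t5-small" is a stand-in for the actual Z-Code++ architecture and the corpus is a toy placeholder.

```python
"""Minimal sketch of continued pretraining on summarization data;
the model and corpus are stand-ins, not the Z-Code++ setup."""
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")  # already LM-pretrained
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy corpus; the real recipe streams large document/summary collections.
pairs = [("summarize: the cat sat on the mat and slept all day",
          "a cat slept on a mat")]

for doc, summary in pairs:  # phase 2: keep training the pretrained weights
    batch = tok(doc, return_tensors="pt")
    labels = tok(summary, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss
    loss.backward(); opt.step(); opt.zero_grad()
    print(float(loss))
```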
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [53.170767750244366]
Imagen is a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models.
arXiv Detail & Related papers (2022-05-23T17:42:53Z)
- Font Completion and Manipulation by Cycling Between Multi-Modality Representations [113.26243126754704]
We explore the generation of font glyphs as 2D graphic objects, using a graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image renderer.
Our model produces better results than both the image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z)
- R2D2: Relational Text Decoding with Transformers [18.137828323277347]
We propose a novel framework for modeling the interaction between graphical structures and the natural language text associated with their nodes and edges.
Our proposed method utilizes both the graphical structure and the sequential nature of the texts.
While the proposed model has wide applications, we demonstrate its capabilities on data-to-text generation tasks.
arXiv Detail & Related papers (2021-05-10T19:59:11Z)
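The entry's core claim, that the model exploits both graph structure and the sequential nature of text, can be sketched as a node encoder whose states a text decoder attends to. The one-round message passing and tiny decoder below are illustrative simplifications, not R2D2's architecture.

```python
"""Toy graph-conditioned text decoder: node states are refined by neighbor
aggregation, then a transformer decoder cross-attends to them."""
import torch
import torch.nn as nn

D = 64

class GraphToText(nn.Module):
    def __init__(self, n_node_types=100, vocab=1000):
        super().__init__()
        self.node_embed = nn.Embedding(n_node_types, D)
        self.msg = nn.Linear(D, D)  # one round of message passing
        self.decoder = nn.TransformerDecoderLayer(D, nhead=4, batch_first=True)
        self.tok_embed = nn.Embedding(vocab, D)
        self.head = nn.Linear(D, vocab)

    def forward(self, node_ids, adj, token_ids):
        # Graph side: node states refined by adjacency-weighted aggregation.
        h = self.node_embed(node_ids)              # (B, N, D)
        h = h + torch.relu(adj @ self.msg(h))      # (B, N, D)
        # Text side: decoder attends to node states while scoring tokens.
        return self.head(self.decoder(self.tok_embed(token_ids), h))

model = GraphToText()
nodes = torch.randint(0, 100, (2, 5))   # toy batch: 2 graphs, 5 nodes each
adj = torch.rand(2, 5, 5)               # dense adjacency weights
tokens = torch.randint(0, 1000, (2, 7)) # partial output text
print(model(nodes, adj, tokens).shape)  # (2, 7, 1000)
```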
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.