DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ
- URL: http://arxiv.org/abs/2405.15306v3
- Date: Wed, 06 Nov 2024 09:49:31 GMT
- Title: DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ
- Authors: Jonas Belouadi, Simone Paolo Ponzetto, Steffen Eger
- Abstract summary: DeTikZify is a novel language model that automatically synthesizes scientific figures as semantics-preserving TikZ graphics programs.
We create three new datasets: DaTikZv2, SketchFig, and MetaFig.
We train DeTikZify on MetaFig and DaTikZv2, along with synthetically generated sketches learned from SketchFig.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating high-quality scientific figures can be time-consuming and challenging, even though sketching ideas on paper is relatively easy. Furthermore, recreating existing figures that are not stored in formats preserving semantic information is equally complex. To tackle this problem, we introduce DeTikZify, a novel multimodal language model that automatically synthesizes scientific figures as semantics-preserving TikZ graphics programs based on sketches and existing figures. To achieve this, we create three new datasets: DaTikZv2, the largest TikZ dataset to date, containing over 360k human-created TikZ graphics; SketchFig, a dataset that pairs hand-drawn sketches with their corresponding scientific figures; and MetaFig, a collection of diverse scientific figures and associated metadata. We train DeTikZify on MetaFig and DaTikZv2, along with synthetically generated sketches learned from SketchFig. We also introduce an MCTS-based inference algorithm that enables DeTikZify to iteratively refine its outputs without the need for additional training. Through both automatic and human evaluation, we demonstrate that DeTikZify outperforms commercial Claude 3 and GPT-4V in synthesizing TikZ programs, with the MCTS algorithm effectively boosting its performance. We make our code, models, and datasets publicly available.
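As a minimal illustration of what a "semantics-preserving TikZ graphics program" looks like (this example is not taken from the paper or its datasets; the node names and styles are invented), a simple two-box diagram can be expressed as compilable code rather than a bitmap, so its structure remains editable:

```latex
% Minimal standalone TikZ program; compiles with pdflatex.
% Unlike a rasterized figure, the nodes, edge, and labels
% remain accessible as semantic, editable source code.
\documentclass[tikz]{standalone}
\usetikzlibrary{positioning}
\begin{document}
\begin{tikzpicture}[node distance=2cm,
    box/.style={draw, rounded corners, minimum width=2cm}]
  \node[box] (input) {Sketch};
  \node[box] (output) [right=of input] {TikZ code};
  \draw[->] (input) -- node[above] {synthesis} (output);
\end{tikzpicture}
\end{document}
```

Recovering such a program from a pixel-based figure or a hand-drawn sketch is the task DeTikZify addresses.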
Related papers
- TikZero: Zero-Shot Text-Guided Graphics Program Synthesis [56.35987342339608]
We present TikZero, which decouples graphics program generation from text understanding by using image representations as an intermediary bridge.
It enables independent training on graphics programs and captioned images and allows for zero-shot text-guided graphics program synthesis.
We show that our method substantially outperforms baselines that can only operate with caption-aligned graphics programs.
arXiv Detail & Related papers (2025-03-14T15:29:58Z)
- TD3: Tucker Decomposition Based Dataset Distillation Method for Sequential Recommendation [50.23504065567638]
This paper introduces TD3, a novel Dataset Distillation method within a meta-learning framework.
TD3 distills a fully expressive synthetic sequence summary from original data.
An augmentation technique allows the learner to closely fit the synthetic summary, ensuring an accurate update of it in the outer loop.
arXiv Detail & Related papers (2025-02-05T03:13:25Z)
- Multi-Style Facial Sketch Synthesis through Masked Generative Modeling [17.313050611750413]
We propose a lightweight end-to-end synthesis model that efficiently converts images to corresponding multi-stylized sketches.
In this study, we overcome the issue of data insufficiency by incorporating semi-supervised learning into the training process.
Our method consistently outperforms previous algorithms across multiple benchmarks.
arXiv Detail & Related papers (2024-08-22T13:45:04Z) - Towards Effective and Efficient Continual Pre-training of Large Language Models [163.34610964970258]
Continual pre-training (CPT) has been an important approach for adapting language models to specific domains or tasks.
This paper presents a technical report for continually pre-training Llama-3 (8B)
It significantly enhances the Chinese language ability and scientific reasoning ability of the backbone model.
arXiv Detail & Related papers (2024-07-26T13:55:21Z) - SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation [6.39528707908268]
There continues to be a lack of large-scale paired datasets for scene sketches.
We propose a self-supervised method for scene sketch generation that does not rely on any existing scene sketch.
We contribute a large-scale dataset centered around scene sketches, comprising highly semantically consistent "text-sketch-image" triplets.
arXiv Detail & Related papers (2024-05-29T06:43:49Z) - SketchGPT: Autoregressive Modeling for Sketch Generation and Recognition [4.6519578789100215]
SketchGPT is a flexible framework that employs a sequence-to-sequence autoregressive model for sketch generation, and completion.
By mapping complex sketches into simplified sequences of abstract primitives, our approach significantly streamlines the input for autoregressive modeling.
arXiv Detail & Related papers (2024-05-06T01:24:14Z) - MathWriting: A Dataset For Handwritten Mathematical Expression Recognition [0.9012198585960439]
MathWriting is the largest online handwritten mathematical expression dataset to date.
One MathWriting sample consists of a formula written on a touch screen and a corresponding expression.
This dataset can also be used in its rendered form for offline HME recognition.
arXiv Detail & Related papers (2024-04-16T16:10:23Z) - Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation [55.73399465968594]
This paper proposes a novel generation paradigm Sketch3D to generate realistic 3D assets with shape aligned with the input sketch and color matching the textual description.
Three strategies are designed to optimize 3D Gaussians, i.e., structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss and sketch similarity optimization with a CLIP-based geometric similarity loss.
arXiv Detail & Related papers (2024-04-02T11:03:24Z) - Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes [118.406721663244]
We introduce a novel part-level modelling and alignment framework that facilitates abstraction modelling and cross-modal correspondence.
Our approach seamlessly extends to sketch modelling by establishing correspondence between CLIPasso edgemaps and projected 3D part regions.
arXiv Detail & Related papers (2023-12-07T05:04:33Z) - AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with
TikZ [38.2820447703639]
We introduce DaTikZ, the first large-scale TikZ dataset consisting of 120k TikZ drawings aligned with captions.
We fine-tune LLaMA on DaTikZ, as well as our new model CLiMA, which augments LLaMA with multimodal CLIP embeddings.
In both human and automatic evaluation, CLiMA and LLaMA outperform commercial GPT-4 and Claude 2 in terms of similarity to human-created figures.
arXiv Detail & Related papers (2023-09-30T13:15:49Z) - SENS: Part-Aware Sketch-based Implicit Neural Shape Modeling [124.3266213819203]
We present SENS, a novel method for generating and editing 3D models from hand-drawn sketches.
SENS analyzes the sketch and encodes its parts into ViT patch encodings.
SENS supports refinement via part reconstruction, allowing for nuanced adjustments and artifact removal.
arXiv Detail & Related papers (2023-06-09T17:50:53Z) - DiffSketching: Sketch Control Image Synthesis with Diffusion Models [10.172753521953386]
Deep learning models for sketch-to-image synthesis need to overcome the distorted input sketch without visual details.
Our model matches sketches through the cross domain constraints, and uses a classifier to guide the image synthesis more accurately.
Our model beats GAN-based methods in terms of generation quality and human evaluation, and does not rely on massive sketch-image datasets.
arXiv Detail & Related papers (2023-05-30T07:59:23Z) - FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in
Context [112.07988211268612]
We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO.
Our dataset comprises 10,000 freehand scene vector sketches with per-point space-time information, drawn by 100 non-expert individuals.
We study for the first time the problem of fine-grained image retrieval from freehand scene sketches and sketch captions.
arXiv Detail & Related papers (2022-03-04T03:00:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.