Font Completion and Manipulation by Cycling Between Multi-Modality Representations
- URL: http://arxiv.org/abs/2108.12965v1
- Date: Mon, 30 Aug 2021 02:43:29 GMT
- Title: Font Completion and Manipulation by Cycling Between Multi-Modality Representations
- Authors: Ye Yuan, Wuyang Chen, Zhaowen Wang, Matthew Fisher, Zhifei Zhang, Zhangyang Wang, Hailin Jin
- Abstract summary: We innovate to explore the generation of font glyphs as 2D graphic objects with the graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image model with a graph constructor between an image encoder and an image renderer.
Our model generates better results than both an image-to-image baseline and previous state-of-the-art methods for glyph completion.
- Score: 113.26243126754704
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generating font glyphs of consistent style from one or a few reference
glyphs, i.e., font completion, is an important task in typographic design. As the
problem is more well-defined than general image style transfer tasks, it has
received interest from both the vision and machine learning communities.
Existing approaches address this problem as a direct image-to-image translation
task. In this work, we innovate to explore the generation of font glyphs as 2D
graphic objects with the graph as an intermediate representation, so that more
intrinsic graphic properties of font styles can be captured. Specifically, we
formulate a cross-modality cycled image-to-image model structure with a graph
constructor between an image encoder and an image renderer. The novel graph
constructor maps a glyph's latent code to its graph representation that matches
expert knowledge, which is trained to help the translation task. Our model
generates better results than both an image-to-image baseline and previous
state-of-the-art methods for glyph completion. Furthermore, the graph
representation output by our model also provides an intuitive interface for
users to do local editing and manipulation. Our proposed cross-modality cycled
representation learning has the potential to be applied to other domains with
prior knowledge from different data modalities. Our code is available at
https://github.com/VITA-Group/Font_Completion_Graph.
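As a rough illustration of the cycled pipeline described above (image encoder -> graph constructor -> image renderer), here is a minimal PyTorch sketch. All module names, layer sizes, and the per-node feature layout are assumptions made for this sketch, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class FontCompletionCycle(nn.Module):
    """Minimal sketch of the cross-modality cycle:
    glyph image -> latent code -> graph -> rendered image.
    Shapes and names are hypothetical, not the paper's code."""

    def __init__(self, latent_dim=256, num_nodes=32, node_dim=4):
        super().__init__()
        # Image encoder: glyph image -> latent style/content code.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        # Graph constructor: latent code -> per-node features
        # (e.g. 2D control points plus attributes for each node).
        self.graph_constructor = nn.Linear(latent_dim, num_nodes * node_dim)
        # Image renderer: graph features -> reconstructed glyph image.
        self.renderer = nn.Sequential(
            nn.Linear(num_nodes * node_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        self.num_nodes, self.node_dim = num_nodes, node_dim

    def forward(self, glyph):
        z = self.encoder(glyph)                     # image -> latent
        graph = self.graph_constructor(z)           # latent -> graph
        graph = graph.view(-1, self.num_nodes, self.node_dim)
        image = self.renderer(graph.flatten(1))     # graph -> image
        return image, graph

# The cycle is trained by comparing the rendered image to the target glyph,
# while the intermediate graph can be supervised with expert-annotated graphs.
model = FontCompletionCycle()
recon, graph = model(torch.randn(2, 1, 32, 32))    # batch of 32x32 glyphs
```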
Related papers
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs [50.852150521561676]
We propose a graph context-conditioned diffusion model called InstructG2I.
InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling.
A Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process.
arXiv Detail & Related papers (2024-10-09T17:56:15Z)
- JoyType: A Robust Design for Multilingual Visual Text Creation [14.441897362967344]
We introduce a novel approach for multilingual visual text creation, named JoyType.
JoyType is designed to maintain the font style of text during the image generation process.
Our evaluations, based on both visual and accuracy metrics, demonstrate that JoyType significantly outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-09-26T04:23:17Z)
- DualVector: Unsupervised Vector Font Synthesis with Dual-Part Representation [43.64428946288288]
Current font synthesis methods fail to represent the shape concisely or require vector supervision during training.
We propose a novel dual-part representation for vector glyphs, where each glyph is modeled as a collection of closed "positive" and "negative" path pairs (see the data-structure sketch after this list).
Our method, named Dual-of-Font-art, outperforms state-of-the-art methods for practical use.
arXiv Detail & Related papers (2023-05-17T08:18:06Z)
- GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation [18.396131717250793]
We introduce GlyphDraw, a general learning framework aiming to endow image generation models with the capacity to generate images coherently embedded with text for any specific language.
Our method not only produces accurate language characters as in prompts, but also seamlessly blends the generated text into the background.
arXiv Detail & Related papers (2023-03-31T08:06:33Z)
- DGFont++: Robust Deformable Generative Networks for Unsupervised Font Generation [19.473023811252116]
We propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++).
To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently.
Experiments demonstrate that our model is able to generate character images of higher quality than state-of-the-art methods.
arXiv Detail & Related papers (2022-12-30T14:35:10Z)
- Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learns, from image-sentence pairs, to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
- Learning Implicit Glyph Shape Representation [6.413829791927052]
We present a novel implicit glyph shape representation, which models glyphs as shape primitives enclosed by quadratic curves and naturally enables generating glyph images at arbitrarily high resolutions.
Based on the proposed representation, we design a simple yet effective disentangled network for the challenging one-shot font style transfer problem.
arXiv Detail & Related papers (2021-06-16T06:42:55Z)
- Structural Information Preserving for Graph-to-Text Generation [59.00642847499138]
The task of graph-to-text generation aims at producing sentences that preserve the meaning of input graphs.
We propose to tackle this problem by leveraging richer training signals that can guide our model for preserving input information.
Experiments on two benchmarks for graph-to-text generation show the effectiveness of our approach over a state-of-the-art baseline.
arXiv Detail & Related papers (2021-02-12T20:09:01Z)
- Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448]
We propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them.
Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs.
arXiv Detail & Related papers (2020-01-07T23:35:52Z)
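To make the dual-part representation from the DualVector entry above concrete, the following is a hypothetical Python sketch of the idea: a glyph as a set of path pairs, where a "positive" path adds ink and its "negative" partner cuts a hole (e.g. the counter of an "o"). All type and field names are assumptions for illustration, not the paper's code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class ClosedPath:
    """A closed loop of control points outlining one region."""
    control_points: List[Point]

@dataclass
class DualPart:
    positive: ClosedPath   # region filled with ink
    negative: ClosedPath   # region subtracted from the positive one

@dataclass
class Glyph:
    parts: List[DualPart]  # glyph = union of (positive minus negative) regions

# A crude "o": an outer diamond of ink minus an inner hole.
outer = ClosedPath([(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0)])
inner = ClosedPath([(0.0, 0.5), (0.5, 0.0), (0.0, -0.5), (-0.5, 0.0)])
o_glyph = Glyph(parts=[DualPart(positive=outer, negative=inner)])
```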
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.