Font Completion and Manipulation by Cycling Between Multi-Modality Representations
- URL: http://arxiv.org/abs/2108.12965v1
- Date: Mon, 30 Aug 2021 02:43:29 GMT
- Title: Font Completion and Manipulation by Cycling Between Multi-Modality Representations
- Authors: Ye Yuan, Wuyang Chen, Zhaowen Wang, Matthew Fisher, Zhifei Zhang, Zhangyang Wang, Hailin Jin
- Abstract summary: We innovate to explore the generation of font glyphs as 2D graphic objects with the graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image model with a graph constructor between an image encoder and an image renderer.
Our model generates better results than both an image-to-image baseline and previous state-of-the-art methods for glyph completion.
- Score: 113.26243126754704
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generating font glyphs of consistent style from one or a few reference
glyphs, i.e., font completion, is an important task in typographic design. As the
problem is more well-defined than general image style transfer tasks, it has
received interest from both the vision and machine learning communities.
Existing approaches address this problem as a direct image-to-image translation
task. In this work, we innovate to explore the generation of font glyphs as 2D
graphic objects with the graph as an intermediate representation, so that more
intrinsic graphic properties of font styles can be captured. Specifically, we
formulate a cross-modality cycled image-to-image model structure with a graph
constructor between an image encoder and an image renderer. The novel graph
constructor maps a glyph's latent code to its graph representation that matches
expert knowledge, which is trained to help the translation task. Our model
generates better results than both an image-to-image baseline and previous
state-of-the-art methods for glyph completion. Furthermore, the graph
representation output by our model also provides an intuitive interface for
users to do local editing and manipulation. Our proposed cross-modality cycled
representation learning has the potential to be applied to other domains with
prior knowledge from different data modalities. Our code is available at
https://github.com/VITA-Group/Font_Completion_Graph.
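As a rough illustration of the cycled pipeline described above (image encoder -> graph constructor -> image renderer), here is a minimal PyTorch sketch. All module names, layer sizes, and the per-node feature layout are assumptions made for this sketch, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class FontCompletionCycle(nn.Module):
    """Minimal sketch of the cross-modality cycle:
    glyph image -> latent code -> graph -> rendered image.
    Shapes and names are hypothetical, not the paper's code."""

    def __init__(self, latent_dim=256, num_nodes=32, node_dim=4):
        super().__init__()
        # Image encoder: glyph image -> latent style/content code.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )
        # Graph constructor: latent code -> per-node features
        # (e.g. 2D control points plus attributes for each node).
        self.graph_constructor = nn.Linear(latent_dim, num_nodes * node_dim)
        # Image renderer: graph features -> reconstructed glyph image.
        self.renderer = nn.Sequential(
            nn.Linear(num_nodes * node_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        self.num_nodes, self.node_dim = num_nodes, node_dim

    def forward(self, glyph):
        z = self.encoder(glyph)                     # image -> latent
        graph = self.graph_constructor(z)           # latent -> graph
        graph = graph.view(-1, self.num_nodes, self.node_dim)
        image = self.renderer(graph.flatten(1))     # graph -> image
        return image, graph

# The cycle is trained by comparing the rendered image to the target glyph,
# while the intermediate graph can be supervised with expert-annotated graphs.
model = FontCompletionCycle()
recon, graph = model(torch.randn(2, 1, 32, 32))    # batch of 32x32 glyphs
```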
Related papers
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs [50.852150521561676]
We propose a graph context-conditioned diffusion model called InstructG2I.
InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling.
A Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process.
arXiv Detail & Related papers (2024-10-09T17:56:15Z)
- JoyType: A Robust Design for Multilingual Visual Text Creation [14.441897362967344]
We introduce a novel approach for multilingual visual text creation, named JoyType.
JoyType is designed to maintain the font style of text during the image generation process.
Our evaluations, based on both visual and accuracy metrics, demonstrate that JoyType significantly outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-09-26T04:23:17Z)
- DualVector: Unsupervised Vector Font Synthesis with Dual-Part Representation [43.64428946288288]
Current font synthesis methods fail to represent the shape concisely or require vector supervision during training.
We propose a novel dual-part representation for vector glyphs, where each glyph is modeled as a collection of closed "positive" and "negative" path pairs (see the data-structure sketch after this list).
Our method, named Dual-of-Font-art, outperforms state-of-the-art methods for practical use.
arXiv Detail & Related papers (2023-05-17T08:18:06Z)
- GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation [18.396131717250793]
We introduce GlyphDraw, a general learning framework aiming to endow image generation models with the capacity to generate images coherently embedded with text for any specific language.
Our method not only produces accurate language characters as in prompts, but also seamlessly blends the generated text into the background.
arXiv Detail & Related papers (2023-03-31T08:06:33Z)
- DGFont++: Robust Deformable Generative Networks for Unsupervised Font Generation [19.473023811252116]
We propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++).
To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently.
Experiments demonstrate that our model is able to generate character images of higher quality than state-of-the-art methods.
arXiv Detail & Related papers (2022-12-30T14:35:10Z)
- Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learns, from image-sentence pairs, to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
- Learning Implicit Glyph Shape Representation [6.413829791927052]
We present a novel implicit glyph shape representation, which models glyphs as shape primitives enclosed by quadratic curves and naturally enables generating glyph images at arbitrarily high resolutions.
Based on the proposed representation, we design a simple yet effective disentangled network for the challenging one-shot font style transfer problem.
arXiv Detail & Related papers (2021-06-16T06:42:55Z)
- Structural Information Preserving for Graph-to-Text Generation [59.00642847499138]
The task of graph-to-text generation aims at producing sentences that preserve the meaning of input graphs.
We propose to tackle this problem by leveraging richer training signals that can guide our model for preserving input information.
Experiments on two benchmarks for graph-to-text generation show the effectiveness of our approach over a state-of-the-art baseline.
arXiv Detail & Related papers (2021-02-12T20:09:01Z)
- Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448]
We propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them.
Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs.
arXiv Detail & Related papers (2020-01-07T23:35:52Z)
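To make the dual-part representation from the DualVector entry above concrete, the following is a hypothetical Python sketch of the idea: a glyph as a set of path pairs, where a "positive" path adds ink and its "negative" partner cuts a hole (e.g. the counter of an "o"). All type and field names are assumptions for illustration, not the paper's code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class ClosedPath:
    """A closed loop of control points outlining one region."""
    control_points: List[Point]

@dataclass
class DualPart:
    positive: ClosedPath   # region filled with ink
    negative: ClosedPath   # region subtracted from the positive one

@dataclass
class Glyph:
    parts: List[DualPart]  # glyph = union of (positive minus negative) regions

# A crude "o": an outer diamond of ink minus an inner hole.
outer = ClosedPath([(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0)])
inner = ClosedPath([(0.0, 0.5), (0.5, 0.0), (0.0, -0.5), (-0.5, 0.0)])
o_glyph = Glyph(parts=[DualPart(positive=outer, negative=inner)])
```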
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.