Let Me Choose: From Verbal Context to Font Selection
- URL: http://arxiv.org/abs/2005.01151v1
- Date: Sun, 3 May 2020 17:36:17 GMT
- Title: Let Me Choose: From Verbal Context to Font Selection
- Authors: Amirreza Shirani, Franck Dernoncourt, Jose Echevarria, Paul Asente,
Nedim Lipka and Thamar Solorio
- Abstract summary: We aim to learn associations between visual attributes of fonts and the verbal context of the texts they are typically applied to.
We introduce a new dataset, containing examples of different topics in social media posts and ads, labeled through crowd-sourcing.
- Score: 50.293897197235296
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we aim to learn associations between visual attributes of
fonts and the verbal context of the texts they are typically applied to.
Compared to related work leveraging the surrounding visual context, we choose
to focus only on the input text as this can enable new applications for which
the text is the only visual element in the document. We introduce a new
dataset, containing examples of different topics in social media posts and ads,
labeled through crowd-sourcing. Due to the subjective nature of the task,
multiple fonts might be perceived as acceptable for an input text, which makes
this problem challenging. To this end, we investigate different end-to-end
models to learn label distributions on crowd-sourced data and capture
inter-subjectivity across all annotations.
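
As a rough illustration of what "learning label distributions on crowd-sourced data" can look like in practice, the sketch below trains a text encoder to predict a probability distribution over candidate fonts and matches it to the empirical distribution of annotator votes with a KL-divergence loss. The architecture, the names (FontDistributionModel, soft_label_loss), and all sizes are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of learning label distributions over fonts from
# crowd-sourced annotations; model, names, and sizes are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_FONTS = 10   # assumed number of candidate fonts
VOCAB = 5000     # assumed vocabulary size

class FontDistributionModel(nn.Module):
    def __init__(self, vocab_size=VOCAB, emb_dim=128, hidden=256, num_fonts=NUM_FONTS):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_fonts)

    def forward(self, token_ids):
        x = self.embed(token_ids)                    # (batch, seq, emb)
        _, (h, _) = self.encoder(x)                  # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=-1)          # (batch, 2 * hidden)
        return F.log_softmax(self.head(h), dim=-1)   # log-probabilities over fonts

def soft_label_loss(log_probs, annotation_counts):
    """KL divergence between the predicted font distribution and the
    empirical distribution of crowd-sourced annotations."""
    target = annotation_counts / annotation_counts.sum(dim=-1, keepdim=True)
    return F.kl_div(log_probs, target, reduction="batchmean")

# Toy usage: 2 texts of 12 tokens each, with per-font annotation counts.
model = FontDistributionModel()
tokens = torch.randint(1, VOCAB, (2, 12))
counts = torch.tensor([[3., 1., 0., 2., 0., 0., 4., 0., 0., 0.],
                       [0., 5., 1., 0., 0., 2., 0., 0., 2., 0.]])
loss = soft_label_loss(model(tokens), counts)
loss.backward()
```

Training against the full annotation distribution rather than a single majority-vote label is one common way to capture the inter-subjectivity the abstract refers to, since several fonts can be acceptable for the same text.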
Related papers
- Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions.
We propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
- Focus! Relevant and Sufficient Context Selection for News Image Captioning [69.36678144800936]
News Image Captioning requires describing an image by leveraging additional context from a news article.
We propose to use the pre-trained vision and language retrieval model CLIP to localize the visually grounded entities in the news article.
Our experiments demonstrate that by simply selecting a better context from the article, we can significantly improve the performance of existing models.
arXiv Detail & Related papers (2022-12-01T20:00:27Z)
- Contrastive Graph Multimodal Model for Text Classification in Videos [9.218562155255233]
We are the first to address this new task of video text classification by fusing multimodal information.
We tailor a specific module called CorrelationNet to reinforce feature representation by explicitly extracting layout information.
We construct a new well-defined industrial dataset from the news domain, called TI-News, which is dedicated to building and evaluating video text recognition and classification applications.
arXiv Detail & Related papers (2022-06-06T04:06:21Z)
- TextStyleBrush: Transfer of Text Aesthetics from a Single Example [16.29689649632619]
We present a novel approach for disentangling the content of a text image from all aspects of its appearance.
We learn this disentanglement in a self-supervised manner.
We show results in different text domains which were previously handled by specialized methods.
arXiv Detail & Related papers (2021-06-15T19:28:49Z)
- Learning Object Detection from Captions via Textual Scene Attributes [70.90708863394902]
We argue that captions contain much richer information about the image, including attributes of objects and their relations.
We present a method that uses the attributes in this "textual scene graph" to train object detectors.
We empirically demonstrate that the resulting model achieves state-of-the-art results on several challenging object detection datasets.
arXiv Detail & Related papers (2020-09-30T10:59:20Z)
- Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval [8.317191999275536]
This paper focuses on leveraging multi-modal content in the form of visual and textual cues to tackle the task of fine-grained image classification and retrieval.
We employ a Graph Convolutional Network to perform multi-modal reasoning and obtain relationship-enhanced features by learning a common semantic space between salient objects and text found in an image.
arXiv Detail & Related papers (2020-09-21T12:31:42Z)
- Adaptive Text Recognition through Visual Matching [86.40870804449737]
We introduce a new model that exploits the repetitive nature of characters in languages.
By doing this, we turn text recognition into a shape matching problem.
We show that it can handle challenges that traditional architectures are not able to solve without expensive retraining.
arXiv Detail & Related papers (2020-09-14T17:48:53Z)
- One-shot Text Field Labeling using Attention and Belief Propagation for Structure Information Extraction [28.687815600404264]
We propose a novel deep end-to-end trainable approach for one-shot text field labeling.
We collected and annotated a real-world one-shot field labeling dataset.
arXiv Detail & Related papers (2020-09-09T08:11:34Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset, in the same writing style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)