Predicting Visual Importance Across Graphic Design Types
- URL: http://arxiv.org/abs/2008.02912v1
- Date: Fri, 7 Aug 2020 00:12:18 GMT
- Title: Predicting Visual Importance Across Graphic Design Types
- Authors: Camilo Fosco, Vincent Casser, Amish Kumar Bedi, Peter O'Donovan, Aaron Hertzmann, Zoya Bylinskii
- Abstract summary: This paper introduces a Unified Model of Saliency and Importance (UMSI).
UMSI learns to predict visual importance in input graphic designs, and saliency in natural images.
We also introduce Imp1k, a new dataset of designs annotated with importance information.
- Score: 22.171824732227872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a Unified Model of Saliency and Importance (UMSI),
which learns to predict visual importance in input graphic designs, and
saliency in natural images, along with a new dataset and applications. Previous
methods for predicting saliency or visual importance are trained individually
on specialized datasets, making them limited in application and leading to poor
generalization on novel image classes, while requiring a user to know which
model to apply to which input. UMSI is a deep learning-based model
simultaneously trained on images from different design classes, including
posters, infographics, mobile UIs, as well as natural images, and includes an
automatic classification module to classify the input. This allows the model to
work more effectively without requiring a user to label the input. We also
introduce Imp1k, a new dataset of designs annotated with importance
information. We demonstrate two new design interfaces that use importance
prediction, including a tool for adjusting the relative importance of design
elements, and a tool for reflowing designs to new aspect ratios while
preserving visual importance. The model, code, and importance dataset are
available at https://predimportance.mit.edu .
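A tool that adjusts the relative importance of design elements first needs some per-element importance score. The sketch below is a minimal illustration, not the UMSI tools' actual code. It assumes that the predicted importance map is a 2-D grid of floats indexed as `imp_map[y][x]` and that each element is a hypothetical axis-aligned box `(x0, y0, x1, y1)` with exclusive upper bounds. Averaging the map inside each box is our own illustrative choice.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical element box: (x0, y0, x1, y1), upper bounds exclusive.
Box = Tuple[int, int, int, int]


def element_importance(imp_map: Sequence[Sequence[float]],
                       boxes: Dict[str, Box]) -> Dict[str, float]:
    """Mean predicted importance inside each element's bounding box."""
    h = len(imp_map)
    w = len(imp_map[0]) if h else 0
    scores: Dict[str, float] = {}
    for name, (x0, y0, x1, y1) in boxes.items():
        # Clip the box to the map so partly off-canvas elements do not raise.
        x0, x1 = max(0, x0), min(w, x1)
        y0, y1 = max(0, y0), min(h, y1)
        if x0 >= x1 or y0 >= y1:
            scores[name] = 0.0  # element lies entirely outside the map
            continue
        total = sum(imp_map[y][x] for y in range(y0, y1) for x in range(x0, x1))
        scores[name] = total / ((x1 - x0) * (y1 - y0))
    return scores


def rank_elements(scores: Dict[str, float]) -> List[str]:
    """Element names ordered from most to least important (ties keep insertion order)."""
    return sorted(scores, key=scores.get, reverse=True)
```

For example, a 4×3 map with a bright top-left title region ranks `title` above a dimmer `footer` and `logo`. After editing a design, one could rerun the model, recompute these scores, and check whether the edited element moved in the ranking.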
Related papers
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
Date: 2024-06-19T08:07:14Z
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference dataset for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
Date: 2024-05-30T05:53:49Z
- U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation [18.841473623776153]
State-of-the-art personalization models tend to overfit the whole subject and cannot disentangle visual characteristics in pixel space.
A novel decoupled self-augmentation strategy is proposed to generate target-related and non-target samples to learn user-specified visual attributes.
Experiments on various kinds of visual attributes with SOTA personalization methods show the ability of the proposed method to mimic target visual appearance in novel contexts.
Date: 2024-03-29T15:20:34Z
- Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use [14.2527771630478]
We propose a new framework that alleviates manual effort by replacing human labeling with natural language interactions.
Our framework eliminates the need for crowd-sourced annotations.
Our trained models outperform traditional Agile Modeling as well as state-of-the-art zero-shot classification models.
Date: 2024-03-05T03:34:11Z
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding [4.914575630736291]
We introduce ScreenAI, a vision-language model that specializes in UI and infographics understanding.
The model is trained on a mixture of datasets, at the heart of which is a novel screen annotation task in which the model has to identify the type and location of UI elements.
We use these text annotations to describe screens to Large Language Models and automatically generate question-answering (QA), UI navigation, and summarization training datasets at scale.
Date: 2024-02-07T06:42:33Z
- Sequential Modeling Enables Scalable Learning for Large Vision Models [120.91839619284431]
We introduce a novel sequential modeling approach which enables learning a Large Vision Model (LVM) without making use of any linguistic data.
We define a common format, "visual sentences", in which we can represent raw images and videos as well as annotated data sources.
Date: 2023-12-01T18:59:57Z
- Rethinking Visual Prompt Learning as Masked Visual Token Modeling [106.71983630652323]
We propose Visual Prompt learning as masked visual Token Modeling (VPTM) to transform the downstream visual classification into the pre-trained masked visual token prediction.
VPTM is the first visual prompt method on the generative pre-trained visual model, which achieves consistency between pre-training and downstream visual classification by task reformulation.
Date: 2023-03-09T02:43:10Z
- Robustar: Interactive Toolbox Supporting Precise Data Annotation for Robust Vision Learning [53.900911121695536]
We introduce the initial release of our software Robustar.
It aims to improve the robustness of vision classification machine learning models through a data-driven perspective.
Date: 2022-07-18T21:12:28Z
- Graph Few-shot Class-incremental Learning [25.94168397283495]
The ability to incrementally learn new classes is vital to all real-world artificial intelligence systems.
In this paper, we investigate the challenging yet practical Graph Few-shot Class-incremental Learning (Graph FCL) problem.
We put forward a Graph Pseudo Incremental Learning paradigm by sampling tasks recurrently from the base classes.
We present a task-sensitive regularizer calculated from task-level attention and node class prototypes to mitigate overfitting onto either novel or base classes.
Date: 2021-12-23T19:46:07Z
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
Date: 2021-10-09T09:02:45Z
- Reducing Overlearning through Disentangled Representations by Suppressing Unknown Tasks [8.517620051440005]
Existing deep learning approaches for learning visual features tend to overlearn and extract more information than what is required for the task at hand.
From a privacy preservation perspective, the input visual information is not protected from the model.
We propose a model-agnostic solution for reducing model overlearning by suppressing all the unknown tasks.
Date: 2020-05-20T17:31:44Z
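The Open-World Feature Extrapolation entry above describes a graph neural network that extrapolates embeddings for new features by message passing over a feature-data graph. The sketch below shows one feature → data → feature round under simplifying assumptions. Features are binary, so a data point links to each feature it contains. A fixed mean aggregator stands in for the trained GNN. The function names are hypothetical, and this is not the paper's model.

```python
from typing import Dict, List, Set

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def extrapolate_feature_embeddings(rows: List[Set[str]],
                                   known: Dict[str, Vector]) -> Dict[str, Vector]:
    """One round of mean-aggregation message passing on a feature-data graph.

    rows[i] is the set of (binary) features present in data point i.
    known maps already-embedded features to their vectors.
    Returns an embedding for every feature reachable through some data point
    that has at least one known feature, including previously unseen features.
    """
    # Step 1: feature -> data. Each data point averages its known features;
    # data points with no known feature receive no embedding.
    data_emb: Dict[int, Vector] = {}
    for i, feats in enumerate(rows):
        seen = [known[f] for f in feats if f in known]
        if seen:
            data_emb[i] = mean(seen)

    # Step 2: data -> feature. Each feature averages the data points containing it.
    all_feats: Set[str] = set().union(*rows) if rows else set()
    out: Dict[str, Vector] = {}
    for f in all_feats:
        neigh = [data_emb[i] for i, feats in enumerate(rows)
                 if f in feats and i in data_emb]
        if neigh:
            out[f] = mean(neigh)
    return out
```

With `rows = [{"a", "new"}, {"b", "new"}, {"a"}]` and known embeddings `a = [1.0, 0.0]`, `b = [0.0, 1.0]`, the unseen feature `new` receives `[0.5, 0.5]`, the average of the two data points it co-occurs in. In the framework described above, these extrapolated embeddings would then feed the lower backbone model.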