MOGAN: Morphologic-structure-aware Generative Learning from a Single Image
- URL: http://arxiv.org/abs/2103.02997v1
- Date: Thu, 4 Mar 2021 12:45:23 GMT
- Title: MOGAN: Morphologic-structure-aware Generative Learning from a Single Image
- Authors: Jinshu Chen, Qihui Xu, Qi Kang and MengChu Zhou
- Abstract summary: Recently proposed generative models can be trained on only a single image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network, named MOGAN, that produces random samples with diverse appearances.
Our approach focuses on internal features, maintaining rational structures while varying appearance.
- Score: 59.59698650663925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In most interactive image generation tasks, given regions of interest (ROI) specified by users, the generated results are expected to show adequate diversity in appearance while maintaining the correct and reasonable structures of the original images. Such tasks become more challenging when only limited data are available. Recently proposed generative models can be trained on only one image, but they attend to the sample's monolithic features while ignoring the semantic information of the different objects it contains. As a result, for ROI-based generation tasks they may produce inappropriate samples with excessive randomness and without preserving the related objects' correct structures. To address this issue, this work introduces a MOrphologic-structure-aware Generative Adversarial Network, named MOGAN, that produces random samples with diverse appearances and reliable structures based on only one image. To train on the ROI, we augment the original image and introduce a novel module that transforms the augmented data into knowledge covering both structure and appearance, thus enhancing the model's comprehension of the sample. To learn the remaining areas outside the ROI, we employ binary masks to keep their generation isolated from the ROI. Finally, we arrange these learning processes in parallel, hierarchical branches. Compared with other single-image GAN schemes, our approach focuses on internal features, maintaining rational structures while varying appearance. Experiments confirm that our model outperforms its competitive peers on ROI-based image generation tasks.
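The snippet below is a minimal, illustrative PyTorch sketch of the mask-based isolation and two-branch composition described in the abstract: an ROI branch and a background branch each generate from the single (possibly augmented) image, and a binary ROI mask composites the two outputs. All module names, shapes, and the toy architecture are assumptions for illustration only, not the authors' actual MOGAN implementation (which also involves hierarchical multi-scale branches and the augmentation-to-knowledge module, not shown here).

```python
# Minimal illustrative sketch, not the authors' MOGAN implementation: an ROI
# branch and a background branch each generate from the single (possibly
# augmented) training image, and a binary ROI mask composites the two outputs
# so that the background generation stays isolated from the ROI.
import torch
import torch.nn as nn


class TinyGenerator(nn.Module):
    """A small stand-in generator for one branch (hypothetical architecture)."""

    def __init__(self, channels: int = 3, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, hidden, 3, padding=1),  # image channels + 1 noise channel
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, image: torch.Tensor, noise: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([image, noise], dim=1))


def compose_roi_and_background(image, roi_mask, roi_branch, bg_branch):
    """Generate the ROI and the remaining area separately, then blend via the mask.

    image:    (N, 3, H, W) single training image (possibly augmented)
    roi_mask: (N, 1, H, W) binary mask, 1 inside the user-given ROI
    """
    noise = torch.randn(image.size(0), 1, image.size(2), image.size(3))
    roi_out = roi_branch(image * roi_mask, noise)          # diverse appearance inside the ROI
    bg_out = bg_branch(image * (1.0 - roi_mask), noise)    # background generation isolated from ROI
    return roi_mask * roi_out + (1.0 - roi_mask) * bg_out  # mask-based composition


if __name__ == "__main__":
    img = torch.rand(1, 3, 64, 64) * 2 - 1   # toy image in [-1, 1]
    mask = torch.zeros(1, 1, 64, 64)
    mask[..., 16:48, 16:48] = 1.0            # toy rectangular ROI
    sample = compose_roi_and_background(img, mask, TinyGenerator(), TinyGenerator())
    print(sample.shape)                      # torch.Size([1, 3, 64, 64])
```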
Related papers
- RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model [22.56227565913003]
We propose a comprehensive model for buildings in remote sensing images, termed RSBuilding, developed from the perspective of a foundation model.
RSBuilding is designed to enhance cross-scene generalization and task understanding.
Our model was trained on a dataset comprising up to 245,000 images and validated on multiple building extraction and change detection datasets.
arXiv Detail & Related papers (2024-03-12T11:51:59Z) - Instruct-Imagen: Image Generation with Multi-modal Instruction [90.04481955523514]
instruct-imagen is a model that tackles heterogeneous image generation tasks and generalizes across unseen tasks.
We introduce *multi-modal instruction* for image generation, a task representation articulating a range of generation intents with precision.
Human evaluation on various image generation datasets reveals that instruct-imagen matches or surpasses prior task-specific models in-domain.
arXiv Detail & Related papers (2024-01-03T19:31:58Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Zero-shot Composed Text-Image Retrieval [72.43790281036584]
We consider the problem of composed image retrieval (CIR).
It aims to train a model that can fuse multi-modal information, e.g., text and images, to accurately retrieve images that match the query, extending the user's expressive ability.
arXiv Detail & Related papers (2023-06-12T17:56:01Z) - UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC).
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z) - Intriguing Property and Counterfactual Explanation of GAN for Remote Sensing Image Generation [25.96740500337747]
Generative adversarial networks (GANs) have achieved remarkable progress in the natural image field.
The GAN model is more sensitive to the size of the training data for RS image generation than for natural image generation.
We propose two innovative adjustment schemes, namely Uniformity Regularization (UR) and Entropy Regularization (ER), to increase the information learned by the GAN model.
arXiv Detail & Related papers (2023-03-09T13:22:50Z) - Meta Internal Learning [88.68276505511922]
Internal learning for single-image generation is a framework in which a generator is trained to produce novel images based on a single image.
We propose a meta-learning approach that enables training over a collection of images in order to model the internal statistics of the sample image more effectively.
Our results show that the models obtained are as suitable as single-image GANs for many common image applications.
arXiv Detail & Related papers (2021-10-06T16:27:38Z) - Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z) - Improving Learning Effectiveness For Object Detection and Classification in Cluttered Backgrounds [6.729108277517129]
This paper develops a framework that autonomously generates a training dataset with heterogeneous, cluttered backgrounds.
The goal is to improve learning effectiveness in such complex and heterogeneous environments.
The performance of the proposed framework is investigated through empirical tests and compared with that of the model trained with the COCO dataset.
arXiv Detail & Related papers (2020-02-27T22:28:48Z) - Concurrently Extrapolating and Interpolating Networks for Continuous Model Generation [34.72650269503811]
We propose a simple yet effective model generation strategy to form a sequence of models that only requires a set of specific-effect label images.
We show that the proposed method is capable of producing a series of continuous models and achieves better performance than several state-of-the-art methods for image smoothing (see the illustrative parameter-blending sketch after this list).
arXiv Detail & Related papers (2020-01-12T04:44:44Z)
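As referenced in the last entry above, the sketch below shows one generic way a continuum of models can be realized: linearly blending the parameters of two same-architecture networks trained for two endpoint effect strengths. The helper name and toy setup are assumptions for illustration; the cited paper's actual extrapolating/interpolating scheme is more involved and is not reproduced here.

```python
# Generic, purely illustrative sketch of continuous model generation by linear
# parameter blending between two same-architecture networks (e.g. trained for a
# weak and a strong image-smoothing effect). Not the cited paper's method.
import copy
import torch
import torch.nn as nn


def blend_models(model_a: nn.Module, model_b: nn.Module, alpha: float) -> nn.Module:
    """Return a model whose weights are (1 - alpha) * A + alpha * B.

    alpha in [0, 1] interpolates between the two models; values outside
    that range extrapolate. Both models must share the same architecture.
    """
    blended = copy.deepcopy(model_a)
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    blended.load_state_dict(
        {k: (1.0 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}
    )
    return blended


if __name__ == "__main__":
    weak = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.ReLU())  # stand-in weak-effect model
    strong = copy.deepcopy(weak)                                    # stand-in strong-effect model
    x = torch.rand(1, 3, 32, 32)
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        model = blend_models(weak, strong, alpha)  # one model per effect strength
        print(alpha, model(x).shape)
```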