Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation
- URL: http://arxiv.org/abs/2308.07316v1
- Date: Mon, 14 Aug 2023 17:59:31 GMT
- Title: Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation
- Authors: Alexander Martin and Haitian Zheng and Jie An and Jiebo Luo
- Abstract summary: We use text-guided latent diffusion models for zero-shot image-to-image translation (I2I) across large domain gaps.
Being able to perform translations across large domain gaps has a wide variety of real-world applications in criminology, astronomy, environmental conservation, and paleontology.
- Score: 97.40572668025273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we use text-guided latent diffusion models for
zero-shot image-to-image translation (I2I) across large domain gaps (longI2I),
where large amounts of new visual features and new geometry need to be
generated to enter the target domain. Being able to perform translations
across large domain gaps has a wide variety of real-world applications in
criminology, astronomy, environmental conservation, and paleontology. We
introduce a new task, Skull2Animal, for translating between skulls and living
animals. On this task, we find that unguided Generative Adversarial Networks
(GANs) are not capable of translating across large domain gaps. Instead of
these traditional I2I methods, we explore the use of guided diffusion and
image editing models and provide a new benchmark model, Revive-2I, capable of
performing zero-shot I2I via text-prompted latent diffusion. We find that
guidance is necessary for longI2I because, to bridge the large domain gap,
prior knowledge about the target domain is needed. In addition, we find that
prompting provides the best and most scalable information about the target
domain: classifier-guided diffusion models require retraining for specific use
cases and, because of the wide variety of images they are trained on, impose
weaker constraints on the target domain. With a strong understanding of the
target domain from natural language, we produce promising results in
translating across large domain gaps and bringing skeletons back to life.
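As an illustration, zero-shot text-prompted I2I of this kind can be sketched with off-the-shelf latent diffusion tooling. The snippet below is a minimal sketch in the spirit of Revive-2I using Hugging Face diffusers; the checkpoint, prompt, and strength values are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: zero-shot text-prompted image-to-image translation
# with a latent diffusion model (in the spirit of Revive-2I).
# Assumed: a generic Stable Diffusion checkpoint and an illustrative
# prompt/strength -- not the paper's exact setup.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
    torch_dtype=torch.float16,
).to("cuda")

skull = Image.open("skull.png").convert("RGB").resize((512, 512))

# `strength` controls how much of the source latent is re-noised:
# higher values let the model generate the new geometry and texture
# needed to cross the large domain gap (longI2I).
out = pipe(
    prompt="a photo of a living wolf",  # natural-language target domain
    image=skull,
    strength=0.75,
    guidance_scale=7.5,
).images[0]
out.save("wolf.png")
```

Here the text prompt supplies the target-domain prior that, per the abstract, unguided GANs lack.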
Related papers
- S2ST: Image-to-Image Translation in the Seed Space of Latent Diffusion [23.142097481682306]
We introduce S2ST, a novel framework designed to accomplish global I2IT in complex images.
S2ST operates within the seed space of a Latent Diffusion Model, thereby leveraging the powerful image priors learned by the latter.
We show that S2ST surpasses state-of-the-art GAN-based I2IT methods, as well as diffusion-based approaches, for complex automotive scenes; a conceptual sketch of the seed-space idea follows this entry.
arXiv Detail & Related papers (2023-11-30T18:59:49Z)
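For intuition, seed-space translation can be viewed as optimizing the initial diffusion latent (the "seed") under a target-domain objective while the diffusion model stays frozen. The sketch below is a conceptual rendering of that idea, not S2ST's actual algorithm; all function names and the loss are assumptions.

```python
# Conceptual sketch of seed-space I2IT (in the spirit of S2ST):
# optimize the initial latent so the decoded image moves toward the
# target domain. `denoise` and `decode` are assumed frozen,
# differentiable callables; the loss is illustrative.
import torch

def translate_in_seed_space(denoise, decode, source_seed, target_loss,
                            steps=200, lr=0.05):
    # denoise: frozen sampler mapping seed latent -> clean latent
    # decode:  frozen VAE decoder mapping latent -> image tensor
    seed = source_seed.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([seed], lr=lr)
    for _ in range(steps):
        image = decode(denoise(seed))
        loss = target_loss(image)  # e.g. CLIP distance to a target prompt
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return decode(denoise(seed))
```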
- Domain-Scalable Unpaired Image Translation via Latent Space Anchoring [88.7642967393508]
Unpaired image-to-image translation (UNIT) aims to map images between two visual domains without paired training data.
We propose a new domain-scalable UNIT method, termed as latent space anchoring.
Our method anchors images of different domains to the same latent space of frozen GANs by learning lightweight encoder and regressor models.
In the inference phase, the learned encoders and decoders of different domains can be arbitrarily combined to translate images between any two domains without fine-tuning.
arXiv Detail & Related papers (2023-06-26T17:50:02Z)
- Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation [108.33885637197614]
Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or unseen target domains.
We propose HRDA, a multi-resolution framework for UDA and DG that combines small high-resolution crops, which preserve fine segmentation details, with large low-resolution crops, which capture long-range context dependencies, fused by a learned scale attention; a simplified sketch of this fusion follows this entry.
arXiv Detail & Related papers (2023-04-26T15:18:45Z)
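The multi-resolution fusion can be pictured as a learned per-pixel blend of a detail branch and a context branch. Below is a heavily simplified sketch of such a scale-attention fusion; it omits HRDA's crop sampling and training strategy, and the module layout is an assumption.

```python
# Simplified scale-attention fusion (in the spirit of HRDA): blend
# high-resolution detail logits with upsampled low-resolution context
# logits via a learned per-pixel attention map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleAttentionFusion(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # 1x1 conv over both branches predicts the blending weight.
        self.attn = nn.Conv2d(2 * num_classes, 1, kernel_size=1)

    def forward(self, hr_logits, lr_logits):
        # Upsample the low-resolution context branch to the detail size.
        lr_up = F.interpolate(lr_logits, size=hr_logits.shape[-2:],
                              mode="bilinear", align_corners=False)
        a = torch.sigmoid(self.attn(torch.cat([hr_logits, lr_up], dim=1)))
        return a * hr_logits + (1 - a) * lr_up
```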
- Using Language to Extend to Unseen Domains [81.37175826824625]
It is expensive to collect training data for every possible domain that a vision model may encounter when deployed.
We consider how simply verbalizing the training domain as well as domains we want to extend to but do not have data for can improve robustness.
Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain; a rough sketch of this idea follows this entry.
arXiv Detail & Related papers (2022-10-18T01:14:02Z)
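As a rough illustration of the LADS idea, the sketch below shifts CLIP image embeddings along a text-derived direction from a verbalized training domain toward an unseen domain. The simple additive shift stands in for the transformation LADS actually learns, and the model name and prompts are assumptions.

```python
# Rough illustration of language-guided embedding transformation
# (in the spirit of LADS), using CLIP via Hugging Face transformers.
# The additive shift below is a stand-in for LADS's learned transform.
import torch
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def text_embed(text: str) -> torch.Tensor:
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Verbalized domains: no images of the unseen domain are needed.
src = text_embed("a photo")    # training domain
tgt = text_embed("a sketch")   # unseen target domain
direction = tgt - src          # text-derived domain shift

image = Image.open("dog.png").convert("RGB")
pixels = processor(images=image, return_tensors="pt")
img = model.get_image_features(**pixels)
img = img / img.norm(dim=-1, keepdim=True)

# Shift the image embedding toward the unseen domain; LADS instead
# *learns* this transformation with consistency losses.
shifted = img + direction
shifted = shifted / shifted.norm(dim=-1, keepdim=True)
```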
- ME-D2N: Multi-Expert Domain Decompositional Network for Cross-Domain Few-Shot Learning [95.78635058475439]
Cross-Domain Few-Shot Learning aims at addressing the Few-Shot Learning problem across different domains.
This paper contributes a novel Multi-Expert Domain Decompositional Network (ME-D2N).
We present a novel domain decomposition module that learns to decompose the student model into two domain-related sub-parts.
arXiv Detail & Related papers (2022-10-11T09:24:47Z)
- Few-Shot Object Detection in Unseen Domains [4.36080478413575]
Few-shot object detection (FSOD) has thrived in recent years to learn novel object classes with limited data.
We propose various data augmentation techniques applied to the few shots of novel classes to account for domain-specific variation; a minimal sketch follows this entry.
Our experiments on the T-LESS dataset show that the proposed approach succeeds in alleviating the domain gap considerably.
arXiv Detail & Related papers (2022-04-11T13:16:41Z)
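A minimal sketch of the augmentation idea: expand each of the few novel-class shots into many photometrically and geometrically varied views. The specific transforms are assumptions, and bounding-box handling is omitted for brevity.

```python
# Illustrative sketch of augmenting the few shots of a novel class.
# The transform set is an assumption; the paper's exact augmentations
# may differ, and detection-box adjustment is omitted here.
import torchvision.transforms as T
from PIL import Image

augment = T.Compose([
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),
    T.RandomHorizontalFlip(),
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    T.GaussianBlur(kernel_size=5),
])

shot = Image.open("novel_class_shot.png").convert("RGB")
# Expand each of the few shots into many domain-varied views.
views = [augment(shot) for _ in range(16)]
```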
- Leveraging Local Domains for Image-to-Image Translation [11.03611991082568]
Image-to-image (i2i) networks struggle to capture local changes because such changes do not affect the global scene structure.
We leverage human knowledge about spatial domain characteristics, which we refer to as 'local domains'.
We train a patch-based GAN on a small amount of source data and hallucinate a new unseen domain, which subsequently eases transfer learning to the target.
arXiv Detail & Related papers (2021-09-09T17:59:52Z)
- Fine-Tuning StyleGAN2 For Cartoon Face Generation [0.0]
We propose a novel image-to-image translation method that generates images of the target domain by fine-tuning a pretrained StyleGAN2 model.
The StyleGAN2 model is suitable for unsupervised I2I translation on unbalanced datasets.
arXiv Detail & Related papers (2021-06-22T14:00:10Z)
- Crossing-Domain Generative Adversarial Networks for Unsupervised Multi-Domain Image-to-Image Translation [12.692904507625036]
We propose a general framework for unsupervised image-to-image translation across multiple domains.
Our proposed framework consists of a pair of encoders and a pair of GANs that learn high-level features across different domains, from which diverse and realistic samples are generated.
arXiv Detail & Related papers (2020-08-27T01:54:07Z)
- Domain Adaptation for Semantic Parsing [68.81787666086554]
We propose a novel semantic parser for domain adaptation, where we have much fewer annotated data in the target domain than in the source domain.
Our semantic parser benefits from a two-stage coarse-to-fine framework and can thus provide different, accurate treatments for the two stages.
Experiments on a benchmark dataset show that our method consistently outperforms several popular domain adaptation strategies.
arXiv Detail & Related papers (2020-06-23T14:47:41Z)