Image Translation via Fine-grained Knowledge Transfer
- URL: http://arxiv.org/abs/2012.11193v1
- Date: Mon, 21 Dec 2020 09:18:48 GMT
- Title: Image Translation via Fine-grained Knowledge Transfer
- Authors: Xuanhong Chen, Ziang Liu, Ting Qiu, Bingbing Ni, Naiyuan Liu, Xiwei
Hu, Yuhan Li
- Abstract summary: We propose an interpretable knowledge-based image-translation framework, which realizes image translation through knowledge retrieval and transfer.
In detail, the framework constructs a plug-and-play, model-agnostic, general-purpose knowledge library that remembers task-specific styles, tones, texture patterns, etc.
- Score: 36.898373109689814
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Prevailing image-translation frameworks mostly process images in an end-to-end
fashion, which has achieved convincing results. Nonetheless, these methods lack
interpretability and do not scale across different image-translation tasks (e.g., style
transfer, HDR, etc.). In this paper, we propose an interpretable knowledge-based
image-translation framework, which realizes image translation through knowledge retrieval
and transfer. In detail, the framework constructs a plug-and-play, model-agnostic,
general-purpose knowledge library that remembers task-specific styles, tones, texture
patterns, etc. Furthermore, we present a fast approximate nearest neighbor (ANN) search
approach, Bandpass Hierarchical K-Means (BHKM), to cope with the difficulty of searching
the enormous knowledge library. Extensive experiments demonstrate the effectiveness and
feasibility of our framework on different image-translation tasks. In particular,
backtracking experiments verify the interpretability of our method. Our code will soon be
available at https://github.com/AceSix/Knowledge_Transfer.
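The abstract only names the retrieval component, so the following is a minimal, hypothetical sketch of the underlying idea: an approximate nearest-neighbor index built by hierarchical k-means over a library of stored style/texture feature vectors. It is not the authors' BHKM implementation (the bandpass decomposition is omitted entirely), and all class and parameter names are assumptions made for illustration.

```python
# Illustrative sketch of hierarchical k-means retrieval over a "knowledge library".
# NOT the authors' BHKM code; names and structure are assumptions based on the abstract.
import numpy as np
from sklearn.cluster import KMeans


class HierarchicalKMeansIndex:
    """Coarse-to-fine k-means index over a library of feature vectors."""

    def __init__(self, n_coarse=64, n_fine=16, seed=0):
        self.n_coarse = n_coarse   # clusters at the top level
        self.n_fine = n_fine       # clusters inside each coarse cell
        self.seed = seed

    def fit(self, library):
        # library: (N, D) array of stored style/tone/texture feature vectors.
        self.library = np.asarray(library, dtype=np.float32)
        self.coarse = KMeans(n_clusters=self.n_coarse, random_state=self.seed).fit(self.library)
        self.cells = []
        for c in range(self.n_coarse):
            ids = np.where(self.coarse.labels_ == c)[0]
            fine = None
            if len(ids) > self.n_fine:
                fine = KMeans(n_clusters=self.n_fine, random_state=self.seed).fit(self.library[ids])
            self.cells.append((ids, fine))
        return self

    def query(self, x, k=5):
        # Return indices of the k approximate nearest library entries to query vector x.
        x = np.asarray(x, dtype=np.float32)[None, :]
        c = int(self.coarse.predict(x)[0])        # descend to the nearest coarse cell
        ids, fine = self.cells[c]
        if fine is not None:                       # optionally narrow to a fine sub-cell
            ids = ids[fine.labels_ == int(fine.predict(x)[0])]
        d = np.linalg.norm(self.library[ids] - x, axis=1)
        return ids[np.argsort(d)[:k]]              # exact search only inside the small cell
```

The point of the hierarchy is that a query descends to one coarse cell (and possibly one fine sub-cell) before any distances to library entries are computed, so lookup cost stays roughly constant even when the library is enormous.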
Related papers
- AnyTrans: Translate AnyText in the Image with Large Scale Models [88.5887934499388]
This paper introduces AnyTrans, an all-encompassing framework for the task of Translate AnyText in the Image (TATI).
Our framework incorporates contextual cues from both textual and visual elements during translation.
We have meticulously compiled a test dataset called MTIT6, which consists of multilingual text image translation data from six language pairs.
arXiv Detail & Related papers (2024-06-17T11:37:48Z)
- Retrieval-Augmented Transformer for Image Captioning [51.79146669195357]
We develop an image captioning approach with a kNN memory, with which knowledge can be retrieved from an external corpus to aid the generation process.
Our architecture combines a knowledge retriever based on visual similarities, a differentiable encoder, and a kNN-augmented attention layer to predict tokens.
Experimental results, conducted on the COCO dataset, demonstrate that employing an explicit external memory can aid the generation process and increase caption quality.
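As a rough illustration of the retrieval idea summarized above, the sketch below picks the k memory entries most similar to an image-level query and lets a standard attention step attend over them alongside the usual context. It is a generic kNN-augmented attention toy under those assumptions, not the cited paper's architecture; all names are placeholders.

```python
# Generic kNN-augmented attention sketch (illustrative, not the cited architecture).
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def knn_retrieve(query_vec, memory_keys, memory_values, k=4):
    """Pick the k memory entries most similar (cosine) to an image-level query vector."""
    sims = memory_keys @ query_vec / (
        np.linalg.norm(memory_keys, axis=1) * np.linalg.norm(query_vec) + 1e-8)
    top = np.argsort(-sims)[:k]
    return memory_keys[top], memory_values[top]


def knn_augmented_attention(q, keys, values, mem_k, mem_v):
    """Scaled dot-product attention over the usual context keys plus retrieved memory."""
    K = np.concatenate([keys, mem_k], axis=0)
    V = np.concatenate([values, mem_v], axis=0)
    w = softmax(q @ K.T / np.sqrt(K.shape[-1]))
    return w @ V
```

Concatenating the retrieved entries with the ordinary keys and values is the simplest way to let the decoder copy information from the external memory when it helps and ignore it otherwise.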
arXiv Detail & Related papers (2022-07-26T19:35:49Z)
- Unsupervised Image-to-Image Translation with Generative Prior [103.54337984566877]
Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data.
We present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm.
arXiv Detail & Related papers (2022-04-07T17:59:23Z)
- Semi-Supervised Image-to-Image Translation using Latent Space Mapping [37.232496213047845]
We introduce a general framework for semi-supervised image translation.
Our main idea is to learn the translation over the latent feature space instead of the image space.
Thanks to the low-dimensional feature space, it is easier to find the desired mapping function.
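To make the latent-space idea concrete, here is a small hypothetical sketch: with frozen placeholder encoders standing in for pretrained networks, a linear map between latent codes is fit from a handful of paired examples. The encoders, dimensions, and data below are assumptions for illustration, not the cited paper's actual model.

```python
# Toy sketch of "translate in latent space": fit a simple mapping between the latent
# codes of a few paired examples instead of learning a pixel-to-pixel model.
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "pretrained" encoders: fixed random projections standing in for real networks.
W_a = rng.standard_normal((256, 32))
W_b = rng.standard_normal((256, 32))

def encode_a(x):
    return x @ W_a  # domain-A features -> latent code

def encode_b(y):
    return y @ W_b  # domain-B features -> latent code

# The semi-supervised part: only a handful of paired examples are available.
n_pairs = 20
xs = rng.standard_normal((n_pairs, 256))
ys = rng.standard_normal((n_pairs, 256))
za, zb = encode_a(xs), encode_b(ys)

# Fit a linear latent-to-latent mapping M so that za @ M approximates zb.
M, *_ = np.linalg.lstsq(za, zb, rcond=None)

# Translating a new A-domain input: encode, map in latent space; a B-domain decoder
# (not shown here) would render the predicted latent back to an image.
zb_pred = encode_a(rng.standard_normal((1, 256))) @ M
```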
arXiv Detail & Related papers (2022-03-29T05:14:26Z)
- Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer [61.34424171458634]
We study whether integrating visual knowledge into a language model can fill the gap.
Our experiments show that visual knowledge transfer can improve performance in both low-resource and fully supervised settings.
arXiv Detail & Related papers (2022-03-14T22:02:40Z)
- A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering [47.1063091195119]
We call for a paradigm shift for the OK-VQA task, in which the image is transformed into plain text.
A Transform-Retrieve-Generate framework (TRiG) is proposed, which can be plug-and-played with alternative image-to-text models.
Experimental results show that our TRiG framework outperforms all state-of-the-art supervised methods by an absolute margin of at least 11.1%.
arXiv Detail & Related papers (2022-01-14T04:12:46Z)
- The Curious Layperson: Fine-Grained Image Recognition without Expert Labels [90.88501867321573]
We consider a new problem: fine-grained image recognition without expert annotations.
We learn a model to describe the visual appearance of objects using non-expert image descriptions.
We then train a fine-grained textual similarity model that matches image descriptions with documents on a sentence-level basis.
arXiv Detail & Related papers (2021-11-05T17:58:37Z)
- Few-Shot Unsupervised Image-to-Image Translation on complex scenes [0.0]
In this work, we assess how a method initially developed for single-object translation performs on more diverse and content-rich images.
We present a way to extend a dataset based on object detection. Moreover, we propose a way to adapt the FUNIT framework to leverage the power of object detection.
arXiv Detail & Related papers (2021-06-07T16:33:19Z)