Text to Image Generation: Leaving no Language Behind
- URL: http://arxiv.org/abs/2208.09333v1
- Date: Fri, 19 Aug 2022 13:24:56 GMT
- Title: Text to Image Generation: Leaving no Language Behind
- Authors: Pedro Reviriego and Elena Merino-Gómez
- Abstract summary: We study how the performance of three popular text-to-image generators depends on the language.
The results show that there is a significant performance degradation when using languages other than English.
Ensuring consistent performance across languages is fundamental so that this new technology can be used by non-native English speakers.
- Score: 6.243995448840211
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the latest applications of Artificial Intelligence (AI) is to generate
images from natural language descriptions. These generators are now becoming
available and achieve impressive results that have been used, for example, on
the front covers of magazines. As the input to the generators is in the form of a
natural language text, a question that arises immediately is how these models
behave when the input is written in different languages. In this paper we
perform an initial exploration of how the performance of three popular
text-to-image generators depends on the language. The results show that there
is a significant performance degradation when using languages other than
English, especially for languages that are not widely used. This observation
leads us to discuss different alternatives on how text-to-image generators can
be improved so that performance is consistent across different languages. This
is fundamental to ensure that this new technology can be used by non-native
English speakers and to preserve linguistic diversity.
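The comparison the abstract describes can be sketched as a simple per-language evaluation loop: the same prompts are posed in several languages, an image is generated for each, and a quality score is averaged per language. The `generate_image` and `score_image` callables below are hypothetical stand-ins, not APIs from the paper; any real generator and scorer (e.g. a CLIP-based similarity) could be plugged in.

```python
# Minimal sketch of a cross-language evaluation loop (hypothetical stand-ins,
# not the paper's actual code): generate an image for every prompt in every
# language and average a quality score per language.

def evaluate_by_language(prompts_by_lang, generate_image, score_image):
    """Return the mean image quality score for each language."""
    results = {}
    for lang, prompts in prompts_by_lang.items():
        scores = [score_image(generate_image(p), p) for p in prompts]
        results[lang] = sum(scores) / len(scores)
    return results

# Usage with dummy callables standing in for a real generator and scorer.
prompts = {"en": ["a cat on a sofa"], "es": ["un gato en un sofá"]}
dummy_generate = lambda prompt: f"<image for: {prompt}>"
dummy_score = lambda image, prompt: float(len(prompt) > 0)
print(evaluate_by_language(prompts, dummy_generate, dummy_score))
```

Comparing the resulting per-language averages against the English baseline is the kind of measurement from which the degradation reported above would be read off.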
Related papers
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Indonesian Text-to-Image Synthesis with Sentence-BERT and FastGAN [0.0]
We use Sentence-BERT as the text encoder and FastGAN as the image generator.
We translate the CUB dataset into Bahasa Indonesia using Google Translate and manually by human translators.
FastGAN uses multiple skip-excitation modules and an auto-encoder to generate images at a resolution of 512x512x3, twice as large as the current state-of-the-art model.
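The dataset-translation step above can be sketched as a small batch loop. The `translate` callable here is a hypothetical stand-in for a machine-translation service such as Google Translate, not a real client library call.

```python
# Hedged sketch of translating a caption dataset into Indonesian; `translate`
# is a hypothetical stand-in for a machine-translation service.

def translate_captions(captions, translate, target_lang="id"):
    """Translate every caption into the target language ('id' = Indonesian)."""
    return [translate(text, target_lang) for text in captions]

# Usage with a dummy translator that just tags the target language.
dummy_translate = lambda text, lang: f"[{lang}] {text}"
print(translate_captions(["a small yellow bird"], dummy_translate))
```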
arXiv Detail & Related papers (2023-03-25T16:54:22Z)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality, CLIP image representations and scaling of language models, do not consistently improve multimodal self-rationalization of tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
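The token-free idea can be illustrated with a toy byte-level encoder (an illustration of the principle only, not the Charformer model itself): mapping text to UTF-8 byte IDs needs no fixed vocabulary, so any language or script is covered without out-of-vocabulary tokens.

```python
# Toy illustration of token-free input: text in any script maps to UTF-8 byte
# IDs (0-255), so no static vocabulary, and no out-of-vocabulary handling,
# is needed. This only sketches the input representation, not the model.

def to_byte_ids(text: str) -> list[int]:
    """Encode text as a sequence of UTF-8 byte IDs."""
    return list(text.encode("utf-8"))

# The same function covers Latin, accented, and non-Latin scripts alike;
# characters outside ASCII simply expand to multiple bytes.
print(to_byte_ids("día"))  # 'í' expands to two bytes: [100, 195, 173, 97]
```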
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Generalising Multilingual Concept-to-Text NLG with Language Agnostic Delexicalisation [0.40611352512781856]
Concept-to-text Natural Language Generation is the task of expressing an input meaning representation in natural language.
We propose Language Agnostic Delexicalisation, a novel delexicalisation method that uses multilingual pretrained embeddings.
Our experiments across five datasets and five languages show that multilingual models outperform monolingual models in concept-to-text.
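Delexicalisation can be sketched as a two-step pipeline: entity values in the meaning representation are swapped for placeholders before generation and restored afterwards. Note that the paper's method matches entity mentions across languages via multilingual pretrained embeddings; this minimal version uses exact string matching only, to show the pipeline shape.

```python
# Toy sketch of a delexicalise/relexicalise pipeline. The paper's Language
# Agnostic Delexicalisation matches mentions with multilingual embeddings;
# this minimal version uses exact string matching only.

def delexicalise(text: str, entities: dict[str, str]) -> str:
    """Replace each entity value with its slot placeholder."""
    for slot, value in entities.items():
        text = text.replace(value, f"[{slot}]")
    return text

def relexicalise(template: str, entities: dict[str, str]) -> str:
    """Restore entity values into the generated template."""
    for slot, value in entities.items():
        template = template.replace(f"[{slot}]", value)
    return template

ents = {"NAME": "The Eagle", "FOOD": "French"}
delex = delexicalise("The Eagle serves French food.", ents)
print(delex)  # [NAME] serves [FOOD] food.
print(relexicalise(delex, ents))  # The Eagle serves French food.
```

Because the generator only ever sees placeholders, the same template can be filled with entity values from any language, which is what makes the approach language-agnostic.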
arXiv Detail & Related papers (2021-05-07T17:48:53Z)
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose an autonomous, bidirectional and iterative ABINet for scene text recognition.
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
- Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision [110.66085917826648]
We develop a technique that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images.
"vokenization" is trained on relatively small image captioning datasets and we then apply it to generate vokens for large language corpora.
Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks.
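The token-to-image mapping can be illustrated with a toy nearest-neighbour assignment (hypothetical vectors, not the paper's model): each token embedding is mapped to the index of its most similar image embedding, so language-only text receives one "visual token" per word.

```python
# Toy sketch of voken assignment: each token embedding gets the index of the
# most cosine-similar image embedding. Vectors here are illustrative only.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def assign_vokens(token_vecs, image_vecs):
    """Map each token vector to the index of its nearest image vector."""
    return [max(range(len(image_vecs)), key=lambda i: cosine(t, image_vecs[i]))
            for t in token_vecs]

tokens = [[1.0, 0.0], [0.0, 1.0]]   # two token embeddings
images = [[0.9, 0.1], [0.2, 0.8]]   # two image embeddings
print(assign_vokens(tokens, images))  # [0, 1]
```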
arXiv Detail & Related papers (2020-07-19T19:13:20Z)
- Mono vs Multilingual Transformer-based Models: a Comparison across Several Language Tasks [1.2691047660244335]
BERT (Bidirectional Encoder Representations from Transformers) and ALBERT (A Lite BERT) are methods for pre-training language models.
We make our trained BERT and ALBERT models for Portuguese available.
arXiv Detail & Related papers (2020-07-19T19:13:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.