Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
- URL: http://arxiv.org/abs/2403.11092v1
- Date: Sun, 17 Mar 2024 05:05:11 GMT
- Title: Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts
- Authors: Michael Saxon, Yiran Luo, Sharon Levy, Chitta Baral, Yezhou Yang, William Yang Wang,
- Abstract summary: Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set.
One such benchmark, "Conceptual Coverage Across Languages" (CoCo-CroLa), assesses the tangible noun inventory of T2I models by prompting them to generate pictures from a concept list translated to seven languages.
We find that this benchmark contains translation errors of varying severity in Spanish, Japanese, and Chinese.
- Score: 107.32683485639654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set. One such benchmark, "Conceptual Coverage Across Languages" (CoCo-CroLa), assesses the tangible noun inventory of T2I models by prompting them to generate pictures from a concept list translated to seven languages and comparing the output image populations. Unfortunately, we find that this benchmark contains translation errors of varying severity in Spanish, Japanese, and Chinese. We provide corrections for these errors and analyze how impactful they are on the utility and validity of CoCo-CroLa as a benchmark. We reassess multiple baseline T2I models with the revisions, compare the outputs elicited under the new translations to those conditioned on the old, and show that a correction's impactfulness on the image-domain benchmark results can be predicted in the text domain with similarity scores. Our findings will guide the future development of T2I multilinguality metrics by providing analytical tools for practical translation decisions.
Related papers
- VLEU: a Method for Automatic Evaluation for Generalizability of Text-to-Image Models [18.259733507395634]
We introduce a new metric called Visual Language Evaluation Understudy (VLEU)
VLEU quantifies a model's generalizability by computing the Kullback-Leibler divergence between the marginal distribution of the visual text and the conditional distribution of the images generated by the model.
Our experiments demonstrate the effectiveness of VLEU in evaluating the generalization capability of various T2I models.
arXiv Detail & Related papers (2024-09-23T04:50:36Z) - Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation [81.45400849638347]
In-image machine translation (IIMT) aims to translate an image containing texts in source language into an image containing translations in target language.
In this paper, we propose an end-to-end IIMT model consisting of four modules.
Our model achieves competitive performance compared to cascaded models with only 70.9% of parameters, and significantly outperforms the pixel-level end-to-end IIMT model.
arXiv Detail & Related papers (2024-07-03T08:15:39Z) - Translation Errors Significantly Impact Low-Resource Languages in
Cross-Lingual Learning [26.49647954587193]
In this work, we find that translation inconsistencies do exist and they disproportionally impact low-resource languages in XNLI.
To identify such inconsistencies, we propose measuring the gap in performance between zero-shot evaluations on the human-translated and machine-translated target text.
We also corroborate that translation errors exist for two target languages, namely Hindi and Urdu, by doing a manual reannotation of human-translated test instances.
arXiv Detail & Related papers (2024-02-03T08:22:51Z) - Multilingual Conceptual Coverage in Text-to-Image Models [98.80343331645626]
"Conceptual Coverage Across Languages" (CoCo-CroLa) is a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns.
For each model we can assess "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series of tangible nouns in the source language to the population of images generated for each noun under translation in the target language.
arXiv Detail & Related papers (2023-06-02T17:59:09Z) - RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models [36.19590638188108]
We create new variants of texts and images in the MS-COCO test set and re-evaluate the state-of-the-art (SOTA) models with the new data.
Specifically, we alter the meaning of text by replacing a word, and generate visually altered images that maintain some visual context.
Our evaluations on the proposed benchmark reveal substantial performance degradation in many SOTA models.
arXiv Detail & Related papers (2023-04-21T03:45:59Z) - Photorealistic Text-to-Image Diffusion Models with Deep Language
Understanding [53.170767750244366]
Imagen is a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models.
arXiv Detail & Related papers (2022-05-23T17:42:53Z) - Visually Grounded Reasoning across Languages and Cultures [27.31020761908739]
We develop a new protocol to construct an ImageNet-style hierarchy representative of more languages and cultures.
We focus on a typologically diverse set of languages, namely, Indonesian, Mandarin Chinese, Swahili, Tamil, and Turkish.
We create a multilingual dataset for Multicultural Reasoning over Vision and Language (MaRVL) by eliciting statements from native speaker annotators about pairs of images.
arXiv Detail & Related papers (2021-09-28T16:51:38Z) - ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality
Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and an endangered language Cherokee.
It supports both statistical and neural translation models as well as provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z) - Unsupervised Image-to-Image Translation via Pre-trained StyleGAN2
Network [73.5062435623908]
We propose a new I2I translation method that generates a new model in the target domain via a series of model transformations.
By feeding the latent vector into the generated model, we can perform I2I translation between the source domain and target domain.
arXiv Detail & Related papers (2020-10-12T13:51:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.