Multilingual Conceptual Coverage in Text-to-Image Models
- URL: http://arxiv.org/abs/2306.01735v1
- Date: Fri, 2 Jun 2023 17:59:09 GMT
- Title: Multilingual Conceptual Coverage in Text-to-Image Models
- Authors: Michael Saxon, William Yang Wang
- Abstract summary: "Conceptual Coverage Across Languages" (CoCo-CroLa) is a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns.
For each model we can assess "conceptual coverage" of a given target language relative to a source language by comparing the population of images generated for a series of tangible nouns in the source language to the population of images generated for each noun under translation in the target language.
- Score: 98.80343331645626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique
for benchmarking the degree to which any generative text-to-image system
provides multilingual parity to its training language in terms of tangible
nouns. For each model we can assess "conceptual coverage" of a given target
language relative to a source language by comparing the population of images
generated for a series of tangible nouns in the source language to the
population of images generated for each noun under translation in the target
language. This technique allows us to estimate how well-suited a model is to a
target language as well as identify model-specific weaknesses, spurious
correlations, and biases without a priori assumptions. We demonstrate how it
can be used to benchmark T2I models in terms of multilinguality and how,
despite its simplicity, it serves as a good proxy for cross-lingual generalization.
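As an illustration of the population comparison described above, here is a minimal Python sketch. The `generate` and `embed` callables are hypothetical stand-ins for the text-to-image model under test and a CLIP-style image encoder, and the mean pairwise cosine score and threshold are illustrative choices, not necessarily the paper's exact metric.

```python
import numpy as np

def coverage_score(generate, embed, noun_src, noun_tgt, n=16):
    """Compare the image population for a source-language noun against the
    population for its translation in the target language.

    generate(prompt, n) -> n images from the T2I model under test (assumed).
    embed(images)       -> unit-norm feature vectors, e.g. from a CLIP-style
                           image encoder (assumed)."""
    src_feats = embed(generate(noun_src, n))   # shape (n, d)
    tgt_feats = embed(generate(noun_tgt, n))   # shape (n, d)
    # Mean pairwise cosine similarity between the two populations: high values
    # suggest the target-language prompt evokes the same concept.
    return float(np.mean(src_feats @ tgt_feats.T))

def conceptual_coverage(generate, embed, noun_pairs, threshold=0.5):
    """Fraction of tangible nouns whose target-language generations stay close
    to the source-language generations (an illustrative aggregate score)."""
    scores = [coverage_score(generate, embed, s, t) for s, t in noun_pairs]
    return sum(s >= threshold for s in scores) / len(scores)
```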
Related papers
- Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations [53.89380284760555]
We introduce Babel-ImageNet, a massively multilingual benchmark that offers partial translations of ImageNet labels to 100 languages.
We evaluate 11 public multilingual CLIP models on our benchmark, demonstrating a significant gap between English ImageNet performance and that of high-resource languages.
We show that the performance of multilingual CLIP can be drastically improved for low-resource languages with parameter-efficient language-specific training.
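A minimal sketch of this kind of evaluation, assuming image and label embeddings have already been produced by a multilingual CLIP model; the prompt template and `encode_text` helper in the comment are assumptions, not the benchmark's actual interface.

```python
import numpy as np

def zero_shot_accuracy(image_feats, label_feats, targets):
    """Zero-shot classification in the style of Babel-ImageNet: each image is
    matched to the nearest translated class-label embedding.

    image_feats: (N, d) unit-norm image embeddings from a multilingual CLIP.
    label_feats: (C, d) unit-norm text embeddings of class labels translated
                 into the evaluation language.
    targets:     (N,) gold class indices."""
    preds = np.argmax(image_feats @ label_feats.T, axis=1)
    return float(np.mean(preds == targets))

# Per-language evaluation then reduces to re-encoding only the label set:
# for lang, labels in translations.items():
#     label_feats = encode_text([f"a photo of a {l}" for l in labels])  # hypothetical encoder
#     print(lang, zero_shot_accuracy(image_feats, label_feats, targets))
```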
arXiv Detail & Related papers (2023-06-14T17:53:06Z)
- Visually Grounded Reasoning across Languages and Cultures [27.31020761908739]
We develop a new protocol to construct an ImageNet-style hierarchy representative of more languages and cultures.
We focus on a typologically diverse set of languages, namely, Indonesian, Mandarin Chinese, Swahili, Tamil, and Turkish.
We create a multilingual dataset for Multicultural Reasoning over Vision and Language (MaRVL) by eliciting statements from native speaker annotators about pairs of images.
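A sketch of what one MaRVL-style item and its evaluation might look like; the field names and `model_predict` interface are illustrative guesses, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class MarvlExample:
    """One MaRVL-style item: a native-speaker statement about a pair of
    images, labeled true or false."""
    image_left: str    # path or URL of the first image
    image_right: str   # path or URL of the second image
    statement: str     # e.g. a Swahili sentence about both images
    label: bool        # does the statement hold for this image pair?

def accuracy(model_predict, examples):
    """model_predict(left, right, statement) -> bool, a hypothetical
    grounded-reasoning model under evaluation."""
    hits = sum(model_predict(e.image_left, e.image_right, e.statement) == e.label
               for e in examples)
    return hits / len(examples)
```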
arXiv Detail & Related papers (2021-09-28T16:51:38Z)
- Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
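A sketch of how such a prompt could be assembled, with labeled English demonstrations followed by a non-English query; the template is an illustrative guess, not the paper's exact format.

```python
def build_few_shot_prompt(english_examples, query_text):
    """Assemble an in-context classification prompt: a few labeled English
    demonstrations as context, then one non-English test input."""
    lines = [f"Input: {text}\nLabel: {label}\n" for text, label in english_examples]
    lines.append(f"Input: {query_text}\nLabel:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    [("The movie was wonderful.", "positive"),
     ("I hated every minute of it.", "negative")],
    "La película fue maravillosa.",   # Spanish test sample
)
# The prompt is then fed to a pre-trained LM (GPT- or T5-style) and the
# predicted label is read off the model's continuation.
```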
arXiv Detail & Related papers (2021-09-16T03:08:22Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
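A minimal sketch of the grouping step, assuming one representation vector per language (random stand-ins here) and k-means as the clustering method; the paper's actual clustering procedure and number of groups may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical input: one representation vector per target language, e.g. the
# mean of a multilingual encoder's sentence embeddings over that language's
# corpus. Random vectors stand in for real embeddings in this sketch.
lang_names = ["de", "nl", "sv", "hi", "ur", "bn"]
lang_reprs = np.random.rand(len(lang_names), 768)

# Cluster languages into groups; each cluster is a "representation sprachbund".
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(lang_reprs)
for name, g in zip(lang_names, groups):
    print(f"{name} -> representation sprachbund {g}")
```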
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Contrastive Language-Image Pre-training for the Italian Language [4.804798944613199]
We present the first CLIP model for the Italian Language (CLIP-Italian) trained on more than 1.4 million image-text pairs.
Results show that CLIP-Italian outperforms the multilingual CLIP model on the tasks of image retrieval and zero-shot classification.
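For context, a compact sketch of the symmetric contrastive (InfoNCE) objective used in CLIP-style pre-training on image-text pairs; the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss used in CLIP-style pre-training: matched
    image-text pairs (row i with row i) are pulled together, while all other
    pairings in the batch are pushed apart."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature   # (B, B) similarities
    targets = torch.arange(logits.size(0))                # matched pair indices
    loss_i2t = F.cross_entropy(logits, targets)           # image -> text
    loss_t2i = F.cross_entropy(logits.t(), targets)       # text -> image
    return (loss_i2t + loss_t2i) / 2
```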
arXiv Detail & Related papers (2021-08-19T13:53:47Z)
- Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition [80.446770909975]
Linguistic knowledge is of great benefit to scene text recognition.
How to effectively model linguistic rules in end-to-end deep networks remains a research challenge.
We propose ABINet, an autonomous, bidirectional, and iterative network for scene text recognition.
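A rough sketch of the autonomous, iterative idea: a (bidirectional) language model repeatedly re-reads and refines the vision model's prediction, with gradients blocked so the language model stays autonomous. All three modules are hypothetical stand-ins, not ABINet's actual components.

```python
import torch

def abinet_style_decode(vision_model, language_model, fusion, image, n_iters=3):
    """Iterative refinement in the spirit described above: the vision model
    proposes per-position character probabilities, the language model re-reads
    the current prediction, and a fusion module combines the two."""
    probs = vision_model(image)                 # (T, vocab) initial reading
    for _ in range(n_iters):
        # "Autonomous": block gradients so the LM is trained on text alone.
        lm_probs = language_model(probs.detach())
        probs = fusion(probs, lm_probs)         # fused, refined prediction
    return probs.argmax(dim=-1)                 # decoded character indices
```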
arXiv Detail & Related papers (2021-03-11T06:47:45Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
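A generic sketch of a representation-alignment regularizer in this spirit. Note the paper aligns representations without any external resource, so its pairing mechanism and loss weighting may differ from the paired inputs and weights assumed here.

```python
import torch
import torch.nn.functional as F

def alignment_regularizer(src_word_reprs, tgt_word_reprs,
                          src_sent_repr, tgt_sent_repr,
                          word_weight=1.0, sent_weight=1.0):
    """Illustrative alignment penalty: pull word-level and sentence-level
    representations of corresponding utterances in two languages toward each
    other. Weights and the MSE form are assumptions for this sketch."""
    word_loss = F.mse_loss(src_word_reprs, tgt_word_reprs)   # (T, d) vs (T, d)
    sent_loss = F.mse_loss(src_sent_repr, tgt_sent_repr)     # (d,) vs (d,)
    return word_weight * word_loss + sent_weight * sent_loss
```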
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
- Mono vs Multilingual Transformer-based Models: a Comparison across Several Language Tasks [1.2691047660244335]
BERT (Bidirectional Encoder Representations from Transformers) and ALBERT (A Lite BERT) are methods for pre-training language models.
We make our trained BERT and ALBERT models for Portuguese publicly available.
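A minimal usage sketch with the HuggingFace transformers library; the checkpoint identifier is a placeholder, not the authors' published name.

```python
from transformers import AutoModel, AutoTokenizer

# Placeholder checkpoint name; substitute the identifier the authors publish.
CHECKPOINT = "your-org/bert-base-portuguese"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT)

inputs = tokenizer("Bom dia, mundo!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # (1, seq_len, hidden_size)
```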
arXiv Detail & Related papers (2020-07-19T19:13:20Z)