Zero-shot Cross-Linguistic Learning of Event Semantics
- URL: http://arxiv.org/abs/2207.02356v1
- Date: Tue, 5 Jul 2022 23:18:36 GMT
- Title: Zero-shot Cross-Linguistic Learning of Event Semantics
- Authors: Malihe Alikhani, Thomas Kober, Bashar Alhafni, Yue Chen, Mert Inan,
Elizabeth Nielsen, Shahab Raji, Mark Steedman, Matthew Stone
- Abstract summary: We look at captions of images across Arabic, Chinese, Farsi, German, Russian, and Turkish.
We show that lexical aspects can be predicted for a given language despite not having observed any annotated data for this language at all.
- Score: 27.997873309702225
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Typologically diverse languages offer systems of lexical and grammatical
aspect that allow speakers to focus on facets of event structure in ways that
comport with the specific communicative setting and discourse constraints they
face. In this paper, we look specifically at captions of images across Arabic,
Chinese, Farsi, German, Russian, and Turkish and describe a computational model
for predicting lexical aspects. Despite the heterogeneity of these languages,
and the salient invocation of distinctive linguistic resources across their
caption corpora, speakers of these languages show surprising similarities in
the ways they frame image content. We leverage this observation for zero-shot
cross-lingual learning and show that lexical aspects can be predicted for a
given language despite not having observed any annotated data for this language
at all.
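The zero-shot setup can be sketched as a toy experiment: a classifier trained on embedded captions from five annotated languages predicts lexical aspect for a sixth, held-out language. Everything below is illustrative; random vectors stand in for a real multilingual caption encoder, and the binary telic/atelic labels and the `fake_embed` helper are hypothetical, not the paper's actual features or API.

```python
import numpy as np

rng = np.random.default_rng(0)
LANGS = ["ar", "zh", "fa", "de", "ru", "tr"]
DIM = 16

# Shared "aspect axis": the paper's key observation is that speakers of
# typologically different languages frame image content similarly, so here
# a toy binary aspect label lives along one language-neutral direction.
aspect_axis = rng.normal(size=DIM)
aspect_axis /= np.linalg.norm(aspect_axis)
# Language-specific offsets model the heterogeneity between caption corpora.
lang_offsets = {lang: 0.5 * rng.normal(size=DIM) for lang in LANGS}

def fake_embed(lang, telic, n):
    """Hypothetical stand-in for a multilingual caption encoder."""
    base = lang_offsets[lang] + (1.5 if telic else -1.5) * aspect_axis
    return base + rng.normal(scale=0.5, size=(n, DIM))

held_out = "tr"  # zero-shot target: no annotated data from this language
X_parts, y_parts = [], []
for lang in LANGS:
    if lang == held_out:
        continue
    for telic in (True, False):
        X_parts.append(fake_embed(lang, telic, 50))
        y_parts += [telic] * 50
X_train, y_train = np.vstack(X_parts), np.array(y_parts)

# Nearest-centroid classifier; test embeddings are mean-centered so that the
# unseen language's corpus-level shift cancels out (no labels required).
c_telic = X_train[y_train].mean(axis=0)
c_atelic = X_train[~y_train].mean(axis=0)

X_test = np.vstack([fake_embed(held_out, True, 50),
                    fake_embed(held_out, False, 50)])
y_test = np.array([True] * 50 + [False] * 50)
d = X_test - X_test.mean(axis=0)
pred = d @ (c_telic - c_atelic) > 0
accuracy = float((pred == y_test).mean())
print(f"zero-shot accuracy on held-out '{held_out}': {accuracy:.2f}")
```

With a real encoder the same recipe would apply: pool caption embeddings from the annotated languages, fit any linear classifier, and evaluate on the language whose annotations were withheld.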
Related papers
- Analyzing The Language of Visual Tokens [48.62180485759458]
We take a natural-language-centric approach to analyzing discrete visual languages.
We show that higher token innovation drives greater entropy and lower compression, with tokens predominantly representing object parts.
We also show that visual languages lack cohesive grammatical structures, leading to higher perplexity and weaker hierarchical organization compared to natural languages.
arXiv Detail & Related papers (2024-11-07T18:59:28Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- A Large-Scale Multilingual Study of Visual Constraints on Linguistic Selection of Descriptions [35.82822305925811]
We present a large, multilingual study into how vision constrains linguistic choice, covering four languages and five linguistic properties, such as verb transitivity or use of numerals.
We propose a novel method that leverages existing corpora of images with captions written by native speakers, and apply it to nine corpora, comprising 600k images and 3M captions.
arXiv Detail & Related papers (2023-02-09T17:57:58Z)
- The Geometry of Multilingual Language Model Representations [25.880639246639323]
We assess how multilingual language models maintain a shared multilingual representation space while still encoding language-sensitive information in each language.
The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies.
We visualize representations projected onto language-sensitive and language-neutral axes, identifying language family and part-of-speech clusters, along with spirals, toruses, and curves representing token position information.
arXiv Detail & Related papers (2022-05-22T23:58:24Z)
- Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z)
- A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Rediscovering the Slavic Continuum in Representations Emerging from Neural Models of Spoken Language Identification [16.369477141866405]
We present a neural model for Slavic language identification in speech signals.
We analyze its emergent representations to investigate whether they reflect objective measures of language relatedness.
arXiv Detail & Related papers (2020-10-22T18:18:19Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
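Several of the related papers above quantify cross-lingual alignment via bitext retrieval: for each source sentence, retrieve the nearest target-language sentence by cosine similarity and check whether it is the gold translation. A minimal sketch, with random vectors standing in for a real multilingual encoder (the `retrieval_accuracy` helper is illustrative, not any paper's API):

```python
import numpy as np

def retrieval_accuracy(src, tgt):
    """Fraction of source sentences whose nearest target sentence (by
    cosine similarity) is the gold translation at the same row index."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = src @ tgt.T  # pairwise cosine similarities
    return float((sims.argmax(axis=1) == np.arange(len(src))).mean())

rng = np.random.default_rng(1)
n, dim = 200, 64
src = rng.normal(size=(n, dim))
tgt_aligned = src + 0.1 * rng.normal(size=(n, dim))  # well-aligned encoder
tgt_random = rng.normal(size=(n, dim))               # unaligned baseline

print(retrieval_accuracy(src, tgt_aligned))  # near 1.0
print(retrieval_accuracy(src, tgt_random))   # near chance (1/n)
```

A well-aligned representation space scores near 1.0 on this measure, while an unaligned space scores near chance, which is what makes retrieval accuracy a convenient task-based proxy for alignment.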
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.