Are Structural Concepts Universal in Transformer Language Models?
Towards Interpretable Cross-Lingual Generalization
- URL: http://arxiv.org/abs/2310.12794v2
- Date: Fri, 22 Dec 2023 15:00:26 GMT
- Title: Are Structural Concepts Universal in Transformer Language Models?
Towards Interpretable Cross-Lingual Generalization
- Authors: Ningyu Xu, Qi Zhang, Jingting Ye, Menghan Zhang, Xuanjing Huang
- Abstract summary: We investigate the potential for explicitly aligning conceptual correspondence between languages to enhance cross-lingual generalization.
Using the syntactic aspect of language as a testbed, our analyses of 43 languages reveal a high degree of alignability.
We propose a meta-learning-based method to learn to align conceptual spaces of different languages.
- Score: 27.368684663279463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have exhibited considerable cross-lingual
generalization abilities, whereby they implicitly transfer knowledge across
languages. However, the transfer is not equally successful for all languages,
especially for low-resource ones, which poses an ongoing challenge. It is
unclear whether we have reached the limits of implicit cross-lingual
generalization and if explicit knowledge transfer is viable. In this paper, we
investigate the potential for explicitly aligning conceptual correspondence
between languages to enhance cross-lingual generalization. Using the syntactic
aspect of language as a testbed, our analyses of 43 languages reveal a high
degree of alignability among the spaces of structural concepts within each
language for both encoder-only and decoder-only LLMs. We then propose a
meta-learning-based method to learn to align conceptual spaces of different
languages, which facilitates zero-shot and few-shot generalization in concept
classification and also offers insights into the cross-lingual in-context
learning phenomenon. Experiments on syntactic analysis tasks show that our
approach achieves competitive results with state-of-the-art methods and narrows
the performance gap between languages, particularly benefiting those with
limited resources.
Related papers
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z) - Interpretability of Language Models via Task Spaces [14.543168558734001]
We present an alternative approach to interpret language models (LMs)
We focus on the quality of LM processing, with a focus on their language abilities.
We construct 'linguistic task spaces' that shed light on the connections LMs draw between language phenomena.
arXiv Detail & Related papers (2024-06-10T16:34:30Z) - On the cross-lingual transferability of multilingual prototypical models
across NLU tasks [2.44288434255221]
Supervised deep learning-based approaches have been applied to task-oriented dialog and have proven to be effective for limited domain and language applications.
In practice, these approaches suffer from the drawbacks of domain-driven design and under-resourced languages.
This article proposes to investigate the cross-lingual transferability of using synergistically few-shot learning with prototypical neural networks and multilingual Transformers-based models.
arXiv Detail & Related papers (2022-07-19T09:55:04Z) - Cross-lingual Lifelong Learning [53.06904052325966]
We present a principled Cross-lingual Continual Learning (CCL) evaluation paradigm.
We provide insights into what makes multilingual sequential learning particularly challenging.
The implications of this analysis include a recipe for how to measure and balance different cross-lingual continual learning desiderata.
arXiv Detail & Related papers (2022-05-23T09:25:43Z) - Multi-level Contrastive Learning for Cross-lingual Spoken Language
Understanding [90.87454350016121]
We develop novel code-switching schemes to generate hard negative examples for contrastive learning at all levels.
We develop a label-aware joint model to leverage label semantics for cross-lingual knowledge transfer.
arXiv Detail & Related papers (2022-05-07T13:44:28Z) - GL-CLeF: A Global-Local Contrastive Learning Framework for Cross-lingual
Spoken Language Understanding [74.39024160277809]
We present Global--Local Contrastive Learning Framework (GL-CLeF) to address this shortcoming.
Specifically, we employ contrastive learning, leveraging bilingual dictionaries to construct multilingual views of the same utterance.
GL-CLeF achieves the best performance and successfully pulls representations of similar sentences across languages closer.
arXiv Detail & Related papers (2022-04-18T13:56:58Z) - Cross-Lingual Language Model Meta-Pretraining [21.591492094502424]
We propose a cross-lingual language model meta-pretraining, which learns the two abilities in different training phases.
Our method improves both generalization and cross-lingual transfer, and produces better-aligned representations across different languages.
arXiv Detail & Related papers (2021-09-23T03:47:44Z) - AM2iCo: Evaluating Word Meaning in Context across Low-ResourceLanguages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z) - Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent
Neural Networks [3.9342247746757435]
It is now established that modern neural language models can be successfully trained on multiple languages simultaneously.
But what kind of knowledge is really shared among languages within these models?
In this paper we dissect different forms of cross-lingual transfer and look for its most determining factors.
We find that exposing our LMs to a related language does not always increase grammatical knowledge in the target language, and that optimal conditions for lexical-semantic transfer may not be optimal for syntactic transfer.
arXiv Detail & Related papers (2020-03-31T09:48:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.