Toward Cross-Lingual Definition Generation for Language Learners
- URL: http://arxiv.org/abs/2010.05533v1
- Date: Mon, 12 Oct 2020 08:45:28 GMT
- Title: Toward Cross-Lingual Definition Generation for Language Learners
- Authors: Cunliang Kong, Liner Yang, Tianzuo Zhang, Qinan Fan, Zhenghao Liu, Yun
Chen, Erhong Yang
- Abstract summary: We propose to generate definitions in English for words in various languages.
Models can be directly applied to other languages after being trained on the English dataset.
Experiments and manual analyses show that our models have a strong cross-lingual transfer ability.
- Score: 10.45755551957024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating dictionary definitions automatically can prove useful for language
learners. However, cross-lingual definition generation remains a challenging task.
In this work, we propose to generate definitions in English for
words in various languages. To achieve this, we present a simple yet effective
approach based on publicly available pretrained language models. In this
approach, models can be directly applied to other languages after being trained on
the English dataset. We demonstrate the effectiveness of this approach on
zero-shot definition generation. Experiments and manual analyses on newly
constructed datasets show that our models have a strong cross-lingual transfer
ability and can generate fluent English definitions for Chinese words. We
further measure the lexical complexity of generated and reference definitions.
The results show that the generated definitions are much simpler, which makes them
more suitable for language learners.
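A minimal sketch of the workflow described above, under stated assumptions: the abstract only says the approach builds on publicly available pretrained language models, so the choice of mT5 and the "define:" prompt format below are illustrative guesses rather than the authors' actual setup. The model is fine-tuned on English word-definition pairs and then applied unchanged to a Chinese headword (zero-shot cross-lingual transfer).

```python
# Sketch only (assumed model and prompt format, not the paper's exact setup):
# fine-tune a multilingual seq2seq LM on English (word, definition) pairs,
# then apply it unchanged to a Chinese word.
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# Illustrative English training pairs; a real run would use a full dictionary dataset.
train_pairs = [
    ("define: bicycle", "a vehicle with two wheels that you ride by pushing pedals"),
    ("define: whisper", "to speak very quietly so that only a nearby person can hear"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
model.train()
for epoch in range(3):
    for source, target in train_pairs:
        inputs = tokenizer(source, return_tensors="pt")
        labels = tokenizer(target, return_tensors="pt").input_ids
        loss = model(**inputs, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Zero-shot step: the same model, trained only on English, defines a Chinese word in English.
model.eval()
inputs = tokenizer("define: 自行车", return_tensors="pt")  # "bicycle" in Chinese
output_ids = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

How the headword and any example sentence are actually encoded, and which pretrained checkpoint is used, are left open by the abstract; this sketch only mirrors the overall train-on-English, apply-to-Chinese workflow.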
Related papers
- Generating Continuations in Multilingual Idiomatic Contexts [2.0849578298972835]
We test the ability of generative language models (LMs) to understand nuanced language containing non-compositional figurative text.
We conduct experiments using datasets in two distinct languages (English and Portuguese) under three different training settings.
Our results suggest that the models are only slightly better at generating continuations for literal contexts than idiomatic contexts, with exceedingly small margins.
arXiv Detail & Related papers (2023-10-31T05:40:33Z)
- Counterfactually Probing Language Identity in Multilingual Models [15.260518230218414]
We use AlterRep, a method of counterfactual probing, to explore the internal structure of multilingual models.
We find that, given a template in Language X, pushing towards Language Y systematically increases the probability of Language Y words.
arXiv Detail & Related papers (2023-10-29T01:21:36Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Assisting Language Learners: Automated Trans-Lingual Definition Generation via Contrastive Prompt Learning [25.851611353632926]
The standard definition generation task requires models to automatically produce monolingual definitions.
We propose a novel task of Trans-Lingual Definition Generation (TLDG), which aims to generate definitions in another language.
arXiv Detail & Related papers (2023-06-09T17:32:45Z)
- Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
arXiv Detail & Related papers (2021-09-16T03:08:22Z)
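As a rough illustration of the few-shot setup summarized in the entry above (assumed prompt format and stand-in model, not the paper's exact protocol): English labeled examples are placed in the context, and a generative LM is asked to complete the label for a non-English test sample. The sentiment-classification framing and the `gpt2` checkpoint are illustrative only; the paper evaluates GPT and T5 family models on multi-class classification.

```python
# Sketch of few-shot prompting with English examples and a non-English query.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

english_examples = [
    ("The food was wonderful and the staff were friendly.", "positive"),
    ("The room was dirty and the service was slow.", "negative"),
]
non_english_query = "La comida estaba deliciosa y el personal fue muy amable."  # Spanish

# Build the in-context prompt: labeled English examples, then the unlabeled query.
prompt = ""
for text, label in english_examples:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {non_english_query}\nSentiment:"

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs, max_new_tokens=2, do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens (the predicted label).
completion = tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:])
print(completion.strip())
```

In practice, few-shot classification of this kind is usually scored by comparing the probability the model assigns to each candidate label word rather than by free-form generation as in this simplified sketch.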
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this inductive bias from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Constrained Language Models Yield Few-Shot Semantic Parsers [73.50960967598654]
We explore the use of large pretrained language models as few-shot semantic parsers.
The goal in semantic parsing is to generate a structured meaning representation given a natural language input.
We use language models to paraphrase inputs into a controlled sublanguage resembling English that can be automatically mapped to a target meaning representation.
arXiv Detail & Related papers (2021-04-18T08:13:06Z)
- Revisiting Language Encoding in Learning Multilingual Representations [70.01772581545103]
We propose a new approach called Cross-lingual Language Projection (XLP) to replace language embedding.
XLP projects the word embeddings into a language-specific semantic space, and the projected embeddings are then fed into the Transformer model.
Experiments show that XLP significantly boosts model performance across a wide range of multilingual benchmark datasets.
arXiv Detail & Related papers (2021-02-16T18:47:10Z)
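The entry above gives only the high-level idea of XLP; the exact module is not specified there, so the snippet below is a guessed, simplified realisation for illustration: a per-language linear projection applied to word embeddings in place of an additive language embedding, before the representations enter a Transformer encoder. The vocabulary size, layer sizes, and use of `nn.TransformerEncoder` are all assumptions, not the paper's architecture.

```python
# Rough sketch of a language-specific projection applied before the encoder.
import torch
import torch.nn as nn

class LanguageProjection(nn.Module):
    def __init__(self, num_languages: int, hidden_size: int):
        super().__init__()
        self.word_embeddings = nn.Embedding(32000, hidden_size)
        # One projection matrix per language, instead of an additive language embedding.
        self.projections = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size) for _ in range(num_languages)
        )
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=8, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

    def forward(self, token_ids: torch.Tensor, lang_id: int) -> torch.Tensor:
        embeddings = self.word_embeddings(token_ids)
        projected = self.projections[lang_id](embeddings)  # language-specific space
        return self.encoder(projected)

model = LanguageProjection(num_languages=4, hidden_size=256)
tokens = torch.randint(0, 32000, (2, 10))  # a dummy batch of token ids
hidden = model(tokens, lang_id=1)
print(hidden.shape)  # torch.Size([2, 10, 256])
```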
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit the word order of the source language might fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)