Translate to Disambiguate: Zero-shot Multilingual Word Sense
Disambiguation with Pretrained Language Models
- URL: http://arxiv.org/abs/2304.13803v1
- Date: Wed, 26 Apr 2023 19:55:52 GMT
- Title: Translate to Disambiguate: Zero-shot Multilingual Word Sense
Disambiguation with Pretrained Language Models
- Authors: Haoqiang Kang and Terra Blevins and Luke Zettlemoyer
- Abstract summary: Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can be finetuned to perform well on diverse tasks.
We present a new study investigating how well PLMs capture cross-lingual word sense with Contextual Word-Level Translation (C-WLT).
We find that as the model size increases, PLMs encode more cross-lingual word sense knowledge and better use context to improve WLT performance.
- Score: 67.19567060894563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained Language Models (PLMs) learn rich cross-lingual knowledge and can
be finetuned to perform well on diverse tasks such as translation and
multilingual word sense disambiguation (WSD). However, they often struggle at
disambiguating word sense in a zero-shot setting. To better understand this
contrast, we present a new study investigating how well PLMs capture
cross-lingual word sense with Contextual Word-Level Translation (C-WLT), an
extension of word-level translation that prompts the model to translate a given
word in context. We find that as the model size increases, PLMs encode more
cross-lingual word sense knowledge and better use context to improve WLT
performance. Building on C-WLT, we introduce a zero-shot approach for WSD,
tested on 18 languages from the XL-WSD dataset. Our method outperforms fully
supervised baselines on recall for many evaluation languages without additional
training or finetuning. This study presents a first step towards understanding
how to best leverage the cross-lingual knowledge inside PLMs for robust
zero-shot reasoning in any language.
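As a rough illustration of the setup described in the abstract, the Python sketch below shows how a contextual word-level translation prompt could be built and how the predicted translation could be mapped back to a word sense. The prompt wording, the `generate` helper, and the toy sense inventory are assumptions made for this example, not the paper's exact templates or evaluation pipeline.

```python
# Sketch of Contextual Word-Level Translation (C-WLT) used for zero-shot WSD.
# `generate` stands in for any pretrained LM's text-completion call and is
# supplied by the caller; the prompt text and sense inventory are hypothetical.

def build_cwlt_prompt(sentence: str, word: str, tgt_lang: str) -> str:
    # The full sentence gives the model the context needed to pick the right
    # sense of `word` before translating it.
    return (
        f'Sentence: "{sentence}"\n'
        f'In this sentence, the {tgt_lang} translation of the word "{word}" is:'
    )

def disambiguate(sentence: str, word: str, tgt_lang: str,
                 sense_translations: dict, generate):
    """Predict a contextual translation, then return the sense whose known
    translations contain it (a simple matching scheme assumed here)."""
    prediction = generate(build_cwlt_prompt(sentence, word, tgt_lang)).strip().lower()
    for sense, translations in sense_translations.items():
        if prediction in translations:
            return sense
    return None  # abstain if the translation matches no listed sense

# Toy sense inventory for English "bank" with Spanish as the pivot language:
senses = {"bank.financial": {"banco"}, "bank.river": {"orilla", "ribera"}}
# disambiguate("She sat on the bank of the river.", "bank", "Spanish", senses, generate)
```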
Related papers
- Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages [55.157295899188476]
Neural machine translation systems learn to map sentences from different languages into a common representation space.
In this work, we test this hypothesis by performing zero-shot translation from unseen languages.
We demonstrate that this setup enables zero-shot translation from entirely unseen languages.
arXiv Detail & Related papers (2024-08-05T07:58:58Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models [12.700783525558721]
English-centric Large Language Models (LLMs) like GPT-3 and LLaMA display a remarkable ability to perform multilingual tasks.
This paper introduces the decomposed prompting approach to probe the linguistic structure understanding of these LLMs in sequence labeling tasks.
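A minimal sketch of what such decomposed prompting could look like for one sequence labeling task (part-of-speech tagging): each token gets its own small prompt, and the per-token answers are reassembled into a label sequence. The template and tag set here are illustrative assumptions, not the paper's prompts.

```python
# Hypothetical decomposed prompting for part-of-speech tagging: one prompt per
# token instead of a single prompt for the whole labeled sequence.

POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PRON", "DET", "ADP", "OTHER"]

def decomposed_pos_prompts(tokens: list) -> list:
    sentence = " ".join(tokens)
    return [
        f'Sentence: "{sentence}"\n'
        f'What is the part of speech of the word "{tok}"? '
        f'Answer with one of: {", ".join(POS_TAGS)}.'
        for tok in tokens
    ]

# Each prompt is sent to the LLM separately; the answers are re-assembled into
# the final label sequence.
print(decomposed_pos_prompts(["Dogs", "bark", "loudly"])[1])
```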
arXiv Detail & Related papers (2024-02-28T15:15:39Z)
- Chain-of-Dictionary Prompting Elicits Translation in Large Language Models [100.47154959254937]
Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT).
We present a novel method, CoD, which augments LLMs with prior knowledge in the form of chains of multilingual dictionaries for a subset of input words to elicit their translation abilities.
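A hedged sketch of the chain-of-dictionary idea as summarized above: dictionary chains for a few selected input words are prepended as hints to the translation prompt. The chain format, the word selection, and the function names are assumptions made for illustration, not CoD's exact prompt.

```python
# Hypothetical chain-of-dictionary prompt: multilingual dictionary chains for a
# subset of input words are prepended as hints before the translation request.

def dictionary_chain(word: str, entries: dict) -> str:
    # e.g. '"otter" means "loutre" in French means "nutria" in Spanish'
    parts = [f'"{word}"'] + [f'means "{t}" in {lang}' for lang, t in entries.items()]
    return " ".join(parts)

def cod_prompt(source: str, tgt_lang: str, chains: dict) -> str:
    hints = "\n".join(dictionary_chain(w, e) for w, e in chains.items())
    return (f"{hints}\n"
            f"Using the dictionary hints above, translate the sentence into "
            f"{tgt_lang}:\n{source}")

# Hint only the rare word, since CoD augments a subset of the input words.
print(cod_prompt("The otter swam across the river.", "Spanish",
                 {"otter": {"French": "loutre", "Spanish": "nutria"}}))
```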
arXiv Detail & Related papers (2023-05-11T05:19:47Z)
- Exposing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders [85.80950708769923]
We probe multilingual sentence encoders for the amount of cross-lingual lexical knowledge stored in their parameters, and compare them against the original multilingual LMs.
We also devise a novel method to expose this knowledge by additionally fine-tuning multilingual models.
We report substantial gains on standard benchmarks.
arXiv Detail & Related papers (2022-04-30T13:23:16Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
We also propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for the translated text in the target language.
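A minimal sketch of such a KL-divergence self-teaching loss, assuming a PyTorch setup: the model's predictions on the target-language text are pulled toward soft pseudo-labels generated for its translation. Shapes, the temperature, and the way pseudo-labels are obtained are assumptions for this example, not FILTER's exact training recipe.

```python
# Hypothetical KL-divergence self-teaching loss in PyTorch: the student's
# predictions on target-language text are matched to soft pseudo-labels
# produced for its translation (pseudo-labels are treated as fixed targets).

import torch
import torch.nn.functional as F

def self_teaching_kl_loss(student_logits: torch.Tensor,
                          pseudo_label_logits: torch.Tensor,
                          temperature: float = 1.0) -> torch.Tensor:
    """Batch-averaged KL(pseudo-labels || student), the usual distillation form."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_targets = F.softmax(pseudo_label_logits.detach() / temperature, dim=-1)
    return F.kl_div(log_p_student, soft_targets, reduction="batchmean")

# Toy usage: a batch of 4 examples over 3 classes.
loss = self_teaching_kl_loss(torch.randn(4, 3), torch.randn(4, 3))
```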
arXiv Detail & Related papers (2020-09-10T22:42:15Z)