Can Uniform Meaning Representation Help GPT-4 Translate from Indigenous Languages?
- URL: http://arxiv.org/abs/2502.08900v1
- Date: Thu, 13 Feb 2025 02:27:30 GMT
- Title: Can Uniform Meaning Representation Help GPT-4 Translate from Indigenous Languages?
- Authors: Shira Wein
- Abstract summary: We explore the technical utility of Uniform Meaning Representation (UMR) for low-resource languages by incorporating it into GPT-4 prompts.
We find that in the majority of our test cases, integrating UMR into the prompt results in a statistically significant increase in performance.
- Score: 4.02487511510606
- Abstract: While ChatGPT and GPT-based models are able to effectively perform many tasks without additional fine-tuning, they struggle with tasks related to extremely low-resource languages and indigenous languages. Uniform Meaning Representation (UMR), a semantic representation designed to capture the meaning of texts in many languages, is well poised to be leveraged in the development of low-resource language technologies. In this work, we explore the downstream technical utility of UMR for low-resource languages by incorporating it into GPT-4 prompts. Specifically, we examine the ability of GPT-4 to translate from three indigenous languages (Navajo, Arápaho, and Kukama), with and without demonstrations, as well as with and without UMR annotations. Ultimately, we find that in the majority of our test cases, integrating UMR into the prompt results in a statistically significant increase in performance, which is a promising indication of future applications of the UMR formalism.
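The prompting setup described in the abstract can be pictured with a minimal sketch in Python, assuming the OpenAI chat-completions API. The prompt wording, the `build_prompt` helper, the placeholder source sentence, and the UMR string below are all illustrative assumptions, not the paper's actual prompts or annotations.

```python
# Minimal sketch of a UMR-augmented translation prompt (hypothetical;
# the prompt text and UMR graph are placeholders, not the paper's data).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def build_prompt(
    source_sentence: str,
    umr: str | None = None,
    demos: list[tuple[str, str]] | None = None,
) -> str:
    """Assemble a translation prompt, optionally adding few-shot
    demonstrations and a UMR annotation of the source sentence."""
    parts = ["Translate the following sentence into English."]
    if demos:
        for src, tgt in demos:
            parts.append(f"Sentence: {src}\nEnglish: {tgt}")
    parts.append(f"Sentence: {source_sentence}")
    if umr:
        parts.append(f"UMR annotation of the sentence:\n{umr}")
    parts.append("English:")
    return "\n\n".join(parts)


prompt = build_prompt(
    source_sentence="<indigenous-language sentence>",
    umr="(s1 / say-01 :ARG0 (p / person))",  # placeholder UMR graph
    demos=None,  # the few-shot condition would pass demonstrations here
)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Under this reading, the four experimental conditions in the abstract (with and without demonstrations, with and without UMR annotations) correspond to toggling the `demos` and `umr` arguments.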
Related papers
- The importance of visual modelling languages in generative software engineering [0.0]
GPT-4 accepts both image and text inputs, rather than natural language alone.
To the best of our knowledge, no other work has investigated similar use cases involving Software Engineering tasks carried out via multimodal GPTs.
arXiv Detail & Related papers (2024-11-27T01:15:36Z)
- An Empirical Study on Information Extraction using Large Language Models [36.090082785047855]
Human-like large language models (LLMs) have proven to be very helpful for many natural language processing (NLP) related tasks.
We propose and analyze the effects of a series of simple prompt-based methods on GPT-4's information extraction ability.
arXiv Detail & Related papers (2024-08-31T07:10:16Z)
- Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study [14.300310437948443]
This paper explores a framework that leverages the benefits of a pre-trained language model along with knowledge distillation in a seq2seq architecture to facilitate translation for low-resource languages.
Our framework is evaluated on three low-resource Indic languages in four Indic-to-Indic directions, yielding significant BLEU-4 and chrF improvements over baselines.
arXiv Detail & Related papers (2024-07-09T04:19:52Z)
- GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks [70.98062518872999]
We validate GPT-4V's capabilities for evaluation purposes, addressing tasks ranging from foundational image-to-text and text-to-image synthesis to high-level image-to-image translations and multi-image-to-text alignment.
Notably, GPT-4V shows promising agreement with humans across various tasks and evaluation methods, demonstrating immense potential for multi-modal LLMs as evaluators.
arXiv Detail & Related papers (2023-11-02T16:11:09Z)
- Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models [6.145834902689888]
Large language models (LLMs) have demonstrated impressive performance on various downstream tasks without requiring fine-tuning.
Although other languages make up a smaller proportion of the training data than English, these models also exhibit remarkable capabilities in those languages.
In this study, we assess the performance of GPT-3.5 and GPT-4 models on seven distinct Arabic NLP tasks.
arXiv Detail & Related papers (2023-06-28T15:54:29Z)
- Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [75.33019401706188]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars.
We propose to assemble synthetic exemplars from a diverse set of high-resource languages to prompt the LLMs to translate from any language into English.
Our unsupervised prompting method performs on par with supervised few-shot learning in LLMs of different sizes for translations between English and 13 Indic and 21 African low-resource languages.
arXiv Detail & Related papers (2023-06-20T08:27:47Z)
- Visual Instruction Tuning [79.70923292053097]
We present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data.
By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant.
When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%.
arXiv Detail & Related papers (2023-04-17T17:59:25Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
- Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval [51.004601358498135]
Mr. TyDi is a benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages.
The goal of this resource is to spur research in dense retrieval techniques in non-English languages.
arXiv Detail & Related papers (2021-08-19T16:53:43Z)
- Building Low-Resource NER Models Using Non-Speaker Annotation [58.78968578460793]
Cross-lingual methods have had notable success in building NER models for low-resource languages.
We propose a complementary approach to building low-resource Named Entity Recognition (NER) models using "non-speaker" (NS) annotations.
We show that use of NS annotators produces results that are consistently on par or better than cross-lingual methods built on modern contextual representations.
arXiv Detail & Related papers (2020-06-17T03:24:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.