Cross-Dialect Sentence Transformation: A Comparative Analysis of
Language Models for Adapting Sentences to British English
- URL: http://arxiv.org/abs/2311.07583v1
- Date: Sun, 5 Nov 2023 12:56:28 GMT
- Title: Cross-Dialect Sentence Transformation: A Comparative Analysis of
Language Models for Adapting Sentences to British English
- Authors: Shruti Dutta, Shashwat Mookherjee
- Abstract summary: This study explores linguistic distinctions among American, Indian, and Irish English dialects.
It assesses various Language Models (LLMs) in their ability to generate British English translations from these dialects.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This study explores linguistic distinctions among American, Indian, and Irish
English dialects and assesses various Language Models (LLMs) in their ability
to generate British English translations from these dialects. Using cosine
similarity analysis, the study measures the linguistic proximity between
original British English translations and those produced by LLMs for each
dialect. The findings reveal that Indian and Irish English translations
maintain notably high similarity scores, suggesting strong linguistic alignment
with British English. In contrast, American English exhibits slightly lower
similarity, reflecting its distinct linguistic traits. Additionally, the choice
of LLM significantly impacts translation quality, with Llama-2-70b consistently
demonstrating superior performance. The study underscores the importance of
selecting the right model for dialect translation, emphasizing the role of
linguistic expertise and contextual understanding in achieving accurate
translations.
Related papers
- Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models [52.00446751692225]
We present a novel and simple yet effective method called textbfDictionary textbfInsertion textbfPrompting (textbfDIP)
When providing a non-English prompt, DIP looks up a word dictionary and inserts words' English counterparts into the prompt for LLMs.
It then enables better translation into English and better English model thinking steps which leads to obviously better results.
arXiv Detail & Related papers (2024-11-02T05:10:50Z) - Thank You, Stingray: Multilingual Large Language Models Can Not (Yet) Disambiguate Cross-Lingual Word Sense [30.62699081329474]
We introduce a novel benchmark for cross-lingual sense disambiguation, StingrayBench.
We collect false friends in four language pairs, namely Indonesian-Malay, Indonesian-Tagalog, Chinese-Japanese, and English-German.
In our analysis of various models, we observe they tend to be biased toward higher-resource languages.
arXiv Detail & Related papers (2024-10-28T22:09:43Z) - Holmes: A Benchmark to Assess the Linguistic Competence of Language Models [59.627729608055006]
We introduce Holmes, a new benchmark designed to assess language models (LMs) linguistic competence.
We use computation-based probing to examine LMs' internal representations regarding distinct linguistic phenomena.
As a result, we meet recent calls to disentangle LMs' linguistic competence from other cognitive abilities.
arXiv Detail & Related papers (2024-04-29T17:58:36Z) - Is Translation All You Need? A Study on Solving Multilingual Tasks with Large Language Models [79.46179534911019]
Large language models (LLMs) have demonstrated multilingual capabilities; yet, they are mostly English-centric due to imbalanced training corpora.
This work extends the evaluation from NLP tasks to real user queries.
For culture-related tasks that need deep language understanding, prompting in the native language tends to be more promising.
arXiv Detail & Related papers (2024-03-15T12:47:39Z) - Decomposed Prompting: Unveiling Multilingual Linguistic Structure
Knowledge in English-Centric Large Language Models [12.700783525558721]
English-centric Large Language Models (LLMs) like GPT-3 and LLaMA display a remarkable ability to perform multilingual tasks.
This paper introduces the decomposed prompting approach to probe the linguistic structure understanding of these LLMs in sequence labeling tasks.
arXiv Detail & Related papers (2024-02-28T15:15:39Z) - Could We Have Had Better Multilingual LLMs If English Was Not the Central Language? [4.655168524016426]
Large Language Models (LLMs) demonstrate strong machine translation capabilities on languages they are trained on.
Our study delves into Llama2's translation capabilities.
Our experiments show that the 7B Llama2 model yields above 10 BLEU when translating into all languages it has seen.
arXiv Detail & Related papers (2024-02-21T16:32:38Z) - Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is
It and How Does It Affect Transfer? [50.48082721476612]
Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability.
We investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages.
arXiv Detail & Related papers (2022-12-21T09:44:08Z) - Does Transliteration Help Multilingual Language Modeling? [0.0]
We empirically measure the effect of transliteration on Multilingual Language Models.
We focus on the Indic languages, which have the highest script diversity in the world.
We find that transliteration benefits the low-resource languages without negatively affecting the comparatively high-resource languages.
arXiv Detail & Related papers (2022-01-29T05:48:42Z) - How do lexical semantics affect translation? An empirical study [1.0152838128195467]
A distinguishing factor of natural language is that words are typically ordered according to the rules of the grammar of a given language.
We investigate how the word ordering of and lexical similarity between the source and target language affect translation performance.
arXiv Detail & Related papers (2021-12-31T23:28:28Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - Bridging Linguistic Typology and Multilingual Machine Translation with
Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.