Cultural Adaptation of Menus: A Fine-Grained Approach
- URL: http://arxiv.org/abs/2408.13534v1
- Date: Sat, 24 Aug 2024 09:25:18 GMT
- Title: Cultural Adaptation of Menus: A Fine-Grained Approach
- Authors: Zhonghe Zhang, Xiaoyu He, Vivek Iyer, Alexandra Birch,
- Abstract summary: Machine Translation of Culture-Specific Items (CSIs) poses significant challenges.
Recent work on CSI translation has shown some success using Large Language Models (LLMs) to adapt to different languages and cultures.
We introduce the ChineseMenuCSI dataset, the largest for Chinese-English menu corpora, annotated with CSI vs Non-CSI labels.
We develop a novel methodology for automatic CSI identification, which outperforms GPT-based prompts in most categories.
- Score: 58.08115795037042
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine Translation of Culture-Specific Items (CSIs) poses significant challenges. Recent work on CSI translation has shown some success using Large Language Models (LLMs) to adapt to different languages and cultures; however, a deeper analysis is needed to examine the benefits and pitfalls of each method. In this paper, we introduce the ChineseMenuCSI dataset, the largest for Chinese-English menu corpora, annotated with CSI vs Non-CSI labels and a fine-grained test set. We define three levels of CSI figurativeness for a more nuanced analysis and develop a novel methodology for automatic CSI identification, which outperforms GPT-based prompts in most categories. Importantly, we are the first to integrate human translation theories into LLM-driven translation processes, significantly improving translation accuracy, with COMET scores increasing by up to 7 points.
Related papers
- Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs [18.84670051328337]
XC-Translate is the first large-scale, manually-created benchmark for machine translation.
KG-MT is a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model.
arXiv Detail & Related papers (2024-10-17T21:56:22Z) - A Novel Paradigm Boosting Translation Capabilities of Large Language Models [11.537249547487045]
The paper proposes a novel paradigm consisting of three stages: Secondary Pre-training using Extensive Monolingual Data, Continual Pre-training with Interlinear Text Format Documents, and Leveraging Source-Language Consistent Instruction for Supervised Fine-Tuning.
Experimental results conducted using the Llama2 model, particularly on Chinese-Llama2, demonstrate the improved translation capabilities of LLMs.
arXiv Detail & Related papers (2024-03-18T02:53:49Z) - Analyzing and Adapting Large Language Models for Few-Shot Multilingual
NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z) - Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach [1.6982207802596105]
This study investigates three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT.
arXiv Detail & Related papers (2023-12-17T15:56:05Z) - Benchmarking Machine Translation with Cultural Awareness [50.183458829028226]
Translating culture-related content is vital for effective cross-cultural communication.
Many culture-specific items (CSIs) often lack viable translations across languages.
This difficulty hinders the analysis of cultural awareness of machine translation systems.
arXiv Detail & Related papers (2023-05-23T17:56:33Z) - Soft Prompt Decoding for Multilingual Dense Retrieval [30.766917713997355]
We show that applying state-of-the-art approaches developed for cross-lingual information retrieval to MLIR tasks leads to sub-optimal performance.
This is due to the heterogeneous and imbalanced nature of multilingual collections.
We present KD-SPD, a novel soft prompt decoding approach for MLIR that implicitly "translates" the representation of documents in different languages into the same embedding space.
arXiv Detail & Related papers (2023-05-15T21:17:17Z) - Understanding Translationese in Cross-Lingual Summarization [106.69566000567598]
Cross-lingual summarization (MS) aims at generating a concise summary in a different target language.
To collect large-scale CLS data, existing datasets typically involve translation in their creation.
In this paper, we first confirm that different approaches of constructing CLS datasets will lead to different degrees of translationese.
arXiv Detail & Related papers (2022-12-14T13:41:49Z) - On Cross-Lingual Retrieval with Multilingual Text Encoders [51.60862829942932]
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z) - Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual
Retrieval [51.60862829942932]
We present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved.
However, the peak performance is not met using the general-purpose multilingual text encoders off-the-shelf', but rather relying on their variants that have been further specialized for sentence understanding tasks.
arXiv Detail & Related papers (2021-01-21T00:15:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.