Hunayn: Elevating Translation Beyond the Literal
- URL: http://arxiv.org/abs/2310.13613v2
- Date: Wed, 25 Oct 2023 17:41:39 GMT
- Title: Hunayn: Elevating Translation Beyond the Literal
- Authors: Nasser Almousa, Nasser Alzamil, Abdullah Alshehri and Ahmad Sait
- Abstract summary: This project introduces an advanced English-to-Arabic translator surpassing conventional tools.
Our approach involves fine-tuning on a self-scraped, purely literary Arabic dataset.
Evaluations against Google Translate show consistent outperformance in qualitative assessments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This project introduces an advanced English-to-Arabic translator surpassing
conventional tools. Leveraging the Helsinki transformer (MarianMT), our
approach involves fine-tuning on a self-scraped, purely literary Arabic
dataset. Evaluations against Google Translate show consistent outperformance in
qualitative assessments. Notably, it excels in cultural sensitivity and context
accuracy. This research underscores the Helsinki transformer's superiority for
English-to-Arabic translation using a Fusha dataset.
Related papers
- Benchmarking LLMs for Translating Classical Chinese Poetry: Evaluating Adequacy, Fluency, and Elegance
We introduce a suitable benchmark (PoetMT) for translating classical Chinese poetry into English.
This task requires not only adequacy in translating culturally and historically significant content but also a strict adherence to linguistic fluency and poetic elegance.
We propose RAT, a Retrieval-Augmented machine Translation method that enhances the translation process by incorporating knowledge related to classical poetry.
arXiv Detail & Related papers (2024-08-19T12:34:31Z)
- Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution
We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
arXiv Detail & Related papers (2024-02-18T09:51:49Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Optimizing Machine Translation through Prompt Engineering: An Investigation into ChatGPT's Customizability
The study reveals that the inclusion of suitable prompts in large-scale language models like ChatGPT can yield flexible translations.
The research scrutinizes the changes in translation quality when prompts are used to generate translations that meet specific conditions.
arXiv Detail & Related papers (2023-08-02T19:11:04Z)
- Iterative Translation Refinement with Large Language Models
We propose iteratively prompting a large language model to self-correct a translation.
We also discuss the challenges in evaluation and relation to human performance and translationese.
arXiv Detail & Related papers (2023-06-06T16:51:03Z)
- HanoiT: Enhancing Context-aware Translation via Selective Context
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
Irrelevant or trivial words can introduce noise and distract the model from learning the relationship between the current sentence and the auxiliary context.
We propose a novel end-to-end encoder-decoder model with a layer-wise selection mechanism to sift and refine the long document context.
arXiv Detail & Related papers (2023-01-17T12:07:13Z)
- The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation
This paper presents the first relatively large-scale Amharic-English parallel sentence dataset.
We build bi-directional Amharic-English translation models by fine-tuning the existing Facebook M2M100 pre-trained model.
The results show that the normalization of Amharic homophone characters increases the performance of Amharic-English machine translation in both directions.
arXiv Detail & Related papers (2022-10-27T07:18:53Z)
- A Bayesian approach to translators' reliability assessment
We treat Translation Quality Assessment (TQA) as a complex process, approaching it from the perspective of the physics of complex systems.
We build two Bayesian models that parameterise the features involved in the TQA process, namely the difficulty of the translation and the characteristics of the translators involved in producing the translation and in assessing its quality.
We show that reviewers' reliability cannot be taken for granted even if they are expert translators.
arXiv Detail & Related papers (2022-03-14T14:29:45Z)
- Backtranslation Feedback Improves User Confidence in MT, Not Quality
In this paper, we describe an experiment on outbound translation from English to Czech and Estonian.
We identify three ways in which user confidence in the outbound translation, as well as its overall final quality, can be affected.
We show that backward translation feedback has a mixed effect on the whole process: it increases user confidence in the produced translation, but not the objective quality.
arXiv Detail & Related papers (2021-04-12T17:50:24Z)
- Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation
We propose the use of a multilingual transformer model, that we pre-train over English tweets and apply data-augmentation using automatic translation to adapt the model to non-English languages.
Our experiments in French, Spanish, German and Italian suggest that the proposed technique is an efficient way to improve the results of the transformers over small corpora of tweets in a non-English language.
arXiv Detail & Related papers (2020-10-07T15:44:55Z)
- Contextual Neural Machine Translation Improves Translation of Cataphoric Pronouns
We investigate the effect of future sentences as context by comparing the performance of a contextual NMT model trained with the future context to the one trained with the past context.
Our experiments and evaluation, using generic and pronoun-focused automatic metrics, show that the use of future context achieves significant improvements over the context-agnostic Transformer.
arXiv Detail & Related papers (2020-04-21T10:45:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.