An evaluation of Google Translate for Sanskrit to English translation
via sentiment and semantic analysis
- URL: http://arxiv.org/abs/2303.07201v1
- Date: Tue, 28 Feb 2023 04:24:55 GMT
- Title: An evaluation of Google Translate for Sanskrit to English translation
via sentiment and semantic analysis
- Authors: Akshat Shukla, Chaarvi Bansal, Sushrut Badhe, Mukul Ranjan, Rohitash
Chandra
- Abstract summary: In 2022, the Sanskrit language was added to the Google Translate engine.
In this study, we present a framework that evaluates Google Translate for Sanskrit using the Bhagavad Gita.
- Score: 0.31317409221921144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Google Translate has been prominent for language translation; however,
limited work has been done in evaluating the quality of its translations when
compared to human experts. Sanskrit is one of the oldest written languages in the
world. In 2022, the Sanskrit language was added to the Google Translate engine.
Sanskrit is known as the mother of languages such as Hindi and an ancient
source of the Indo-European group of languages. Sanskrit is the original
language of sacred Hindu texts such as the Bhagavad Gita. In this study, we
present a framework that evaluates Google Translate for Sanskrit using the
Bhagavad Gita. We first publish an English translation of the Sanskrit Bhagavad
Gita produced by Google Translate. Our framework then compares the Google Translate
version of the Bhagavad Gita with expert translations using sentiment and semantic
analysis via BERT-based language models. Our results indicate that, in terms of
sentiment and semantic analysis, there is a low level of similarity in selected
verses of the Google Translate version when compared to expert translations. In the
qualitative evaluation, we find that Google Translate is unsuitable for translating
certain Sanskrit words and phrases because of their poetic nature, contextual
significance, metaphor and imagery. The mistranslations are not surprising,
since the Bhagavad Gita is known as a text that is difficult not only to translate
but also to interpret, as it relies on contextual, philosophical and historical
information. Our framework lays the foundation for automatic evaluation of
Google Translate for other languages.
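The abstract describes the comparison framework only at a high level. The sketch below is an illustrative assumption, not the authors' code. It assumes every verse of both translations has already been encoded into a sentence-embedding vector by some BERT-based model (the encoder is not shown) and scores semantic agreement per verse with cosine similarity. As a stand-in for the sentiment comparison, it measures Jaccard overlap between the sentiment labels assigned to a verse. The function names, labels and toy vectors are all hypothetical.

```python
import math
from typing import Iterable, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (ignores magnitude)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def verse_similarities(
    machine: Sequence[Sequence[float]], expert: Sequence[Sequence[float]]
) -> List[float]:
    """Hypothetical helper: per-verse cosine similarity of two aligned translations."""
    if len(machine) != len(expert):
        raise ValueError("translations must be aligned verse by verse")
    return [cosine_similarity(m, e) for m, e in zip(machine, expert)]


def sentiment_agreement(machine_labels: Iterable[str], expert_labels: Iterable[str]) -> float:
    """Hypothetical helper: Jaccard overlap of the sentiment labels for one verse."""
    a, b = set(machine_labels), set(expert_labels)
    if not a and not b:
        return 1.0  # both translations carry no labels, so they trivially agree
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Toy 2-D "embeddings" for two aligned verses (illustrative numbers, not study data).
    machine = [[1.0, 0.0], [3.0, 4.0]]
    expert = [[2.0, 0.0], [0.0, 1.0]]
    print(verse_similarities(machine, expert))  # [1.0, 0.8]
    # Illustrative label sets; a real setup would take these from a sentiment classifier.
    print(sentiment_agreement({"optimistic", "thankful"}, {"optimistic", "anxious"}))
```

Cosine similarity compares only the direction of two embeddings, so verses of very different length can still score as close in meaning. Verses with low scores are natural candidates for the kind of qualitative review the abstract describes.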
Related papers
- Evaluation of Google Translate for Mandarin Chinese translation using sentiment and semantic analysis [1.3999481573773074]
Machine translation using large language models (LLMs) is having a significant global impact.
Mandarin Chinese is the official language used for communication by the government and media in China.
In this study, we provide an automated assessment of the translation quality of Google Translate against human expert translations using sentiment and semantic analysis.
(2024-09-08T04:03:55Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
(2023-10-10T23:47:25Z)
- Multilingual Tourist Assistance using ChatGPT: Comparing Capabilities in Hindi, Telugu, and Kannada [1.5762281194023464]
This research investigates the effectiveness of ChatGPT, an AI language model by OpenAI, in translating English into Hindi, Telugu, and Kannada languages.
To measure the translation quality, a test set of 50 questions from diverse fields such as general knowledge, food, and travel was used.
Human evaluators rated both the accuracy and fluency of translations, offering a comprehensive perspective on the language model's performance.
(2023-07-28T07:52:26Z)
- Sāmayik: A Benchmark and Dataset for English-Sanskrit Translation [30.315293326789828]
Sāmayik is a dataset of around 53,000 parallel English-Sanskrit sentences, written in contemporary prose.
Sāmayik is curated from a diverse range of domains, including language instruction material, textual teaching pedagogy, and online tutorials.
(2023-05-23T12:32:24Z)
- Is ChatGPT A Good Translator? Yes With GPT-4 As The Engine [97.8609714773255]
We evaluate ChatGPT for machine translation, including translation prompt, multilingual translation, and translation robustness.
ChatGPT performs competitively with commercial translation products but lags behind significantly on low-resource or distant languages.
With the launch of the GPT-4 engine, the translation performance of ChatGPT is significantly boosted.
(2023-01-20T08:51:36Z)
- Translating Hanja Historical Documents to Contemporary Korean and English [52.625998002213585]
Annals of Joseon Dynasty contain the daily records of the Kings of Joseon, the 500-year kingdom preceding the modern nation of Korea.
The Annals were originally written in an archaic Korean writing system, 'Hanja', and were translated into Korean from 1968 to 1993.
Since then, the records of only one king have been completed in a decade.
We propose H2KE, a neural machine translation model, that translates historical documents in Hanja to more easily understandable Korean and to English.
(2022-05-20T08:25:11Z)
- Semantic and sentiment analysis of selected Bhagavad Gita translations using BERT-based language framework [0.4125187280299248]
The Bhagavad Gita is an ancient Hindu philosophical text originally written in Sanskrit that features a conversation between Lord Krishna and Arjuna prior to the Mahabharata war.
In this paper, we compare selected translations (mostly from Sanskrit to English) of the Bhagavad Gita using semantic and sentiment analyses.
(2022-01-09T23:59:11Z)
- Challenge Dataset of Cognates and False Friend Pairs from Indian Languages [54.6340870873525]
Cognates are present in multiple variants of the same text across different languages.
In this paper, we describe the creation of two cognate datasets for twelve Indian languages.
(2021-12-17T14:23:43Z)
- Harnessing Cross-lingual Features to Improve Cognate Detection for Low-resource Languages [50.82410844837726]
We demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian languages.
We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages.
We observe an improvement of up to 18 percentage points in F-score for cognate detection.
(2021-12-16T11:17:58Z)
- Itihasa: A large-scale corpus for Sanskrit to English translation [9.566221218224637]
Itihasa is a large-scale translation dataset containing 93,000 pairs of Sanskrit shlokas and their English translations.
We first describe the motivation behind the curation of such a dataset and follow up with empirical analysis to bring out its nuances.
(2021-06-06T22:58:13Z)
- A Multilingual Neural Machine Translation Model for Biomedical Data [84.17747489525794]
We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain.
The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English.
It is trained with large amounts of generic and biomedical data, using domain tags.
(2020-08-06T21:26:43Z)