Semantic and sentiment analysis of selected Bhagavad Gita translations
using BERT-based language framework
- URL: http://arxiv.org/abs/2201.03115v1
- Date: Sun, 9 Jan 2022 23:59:11 GMT
- Title: Semantic and sentiment analysis of selected Bhagavad Gita translations
using BERT-based language framework
- Authors: Rohitash Chandra, Venkatesh Kulkarni
- Abstract summary: The Bhagavad Gita is an ancient Hindu philosophical text originally written in Sanskrit that features a conversation between Lord Krishna and Arjuna prior to the Mahabharata war.
In this paper, we compare selected translations (mostly from Sanskrit to English) of the Bhagavad Gita using semantic and sentiment analyses.
- Score: 0.4125187280299248
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is well known that translations of songs and poems not only break rhythm
and rhyming patterns, but also result in loss of semantic information. The
Bhagavad Gita is an ancient Hindu philosophical text originally written in
Sanskrit that features a conversation between Lord Krishna and Arjuna prior to
the Mahabharata war. The Bhagavad Gita is also one of the key sacred texts in
Hinduism and known as the forefront of the Vedic corpus of Hinduism. In the
last two centuries, there has been a lot of interest in Hindu philosophy by
western scholars and hence the Bhagavad Gita has been translated into a number of
languages. However, there is not much work that validates the quality of the
English translations. Recent progress of language models powered by deep
learning has enabled not only translations but better understanding of language
and texts with semantic and sentiment analysis. Our work is motivated by the
recent progress of language models powered by deep learning methods. In this
paper, we compare selected translations (mostly from Sanskrit to English) of
the Bhagavad Gita using semantic and sentiment analyses. We use a hand-labelled
sentiment dataset for tuning a state-of-the-art deep learning-based language model
known as bidirectional encoder representations from transformers
(BERT). We use novel sentence embedding models to provide semantic analysis for
selected chapters and verses across translations. Finally, we use the
aforementioned models for sentiment and semantic analyses and provide
visualisation of results. Our results show that although the style and
vocabulary in the respective Bhagavad Gita translations vary widely, the
sentiment analysis and semantic similarity show that the messages conveyed are
mostly similar across the translations.
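The semantic comparison described in the abstract can be illustrated with a minimal sketch that scores sentence embeddings of the same verse pairwise by cosine similarity. This is only an illustration under stated assumptions: the abstract does not name the embedding model or the similarity measure at this level of detail, so the 4-dimensional vectors, the translation names, and the helper functions `cosine_similarity` and `pairwise_similarity` are all hypothetical placeholders rather than the authors' pipeline.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pairwise_similarity(embeddings):
    """Map each unordered pair of translation names to its cosine similarity."""
    names = sorted(embeddings)
    return {
        (a, b): cosine_similarity(embeddings[a], embeddings[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical embeddings of one verse in three translations.
# Real sentence embeddings would come from a trained model and
# typically have hundreds of dimensions; these values are made up.
toy_verse_embeddings = {
    "translation_a": [0.9, 0.1, 0.3, 0.0],
    "translation_b": [0.8, 0.2, 0.4, 0.1],
    "translation_c": [0.1, 0.9, 0.0, 0.4],
}

scores = pairwise_similarity(toy_verse_embeddings)
for pair, score in scores.items():
    print(pair, round(score, 3))
```

With real embeddings, a high score for a verse across all translation pairs would correspond to the abstract's observation that the message is mostly preserved even when vocabulary differs.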
Related papers
- The First Swahili Language Scene Text Detection and Recognition Dataset [55.83178123785643]
There is a significant gap in low-resource languages, especially the Swahili Language.
Swahili is widely spoken in East African countries but is still an under-explored language in scene text recognition.
We propose a comprehensive dataset of Swahili scene text images and evaluate the dataset on different scene text detection and recognition models.
arXiv Detail & Related papers (2024-05-19T03:55:02Z)
- Understanding Cross-Lingual Alignment -- A Survey [52.572071017877704]
Cross-lingual alignment is the meaningful similarity of representations across languages in multilingual language models.
We survey the literature of techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from throughout the field.
arXiv Detail & Related papers (2024-04-09T11:39:53Z)
- Large language model for Bible sentiment analysis: Sermon on the Mount [1.8804426519412474]
We use sentiment analysis for studying selected chapters of the Bible.
These chapters are known as the Sermon on the Mount.
We detect different levels of humour, optimism, and empathy in the respective chapters that were used by Jesus to deliver his message.
arXiv Detail & Related papers (2024-01-01T07:35:29Z)
- An evaluation of Google Translate for Sanskrit to English translation via sentiment and semantic analysis [0.31317409221921144]
In 2022, the Sanskrit language was added to the Google Translate engine.
In this study, we present a framework that evaluates Google Translate for Sanskrit using the Bhagavad Gita.
arXiv Detail & Related papers (2023-02-28T04:24:55Z)
- CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts.
CLSE covers 74 different semantic types to support various applications from airline ticketing to video games.
We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z)
- Artificial intelligence for topic modelling in Hindu philosophy: mapping themes between the Upanishads and the Bhagavad Gita [0.4125187280299248]
We use advanced language models such as BERT to provide topic modelling of the key texts of the Upanishads and the Bhagavad Gita.
Our results show a very high similarity between the topics of these two texts with the mean cosine similarity of 73%.
Our best performing model gives a coherence score of 73% on the Bhagavad Gita and 69% on The Upanishads.
arXiv Detail & Related papers (2022-05-23T03:39:00Z)
- Translating Hanja Historical Documents to Contemporary Korean and English [52.625998002213585]
The Annals of the Joseon Dynasty contain the daily records of the Kings of Joseon, the 500-year kingdom preceding the modern nation of Korea.
The Annals were originally written in an archaic Korean writing system, Hanja, and were translated into Korean from 1968 to 1993.
Since then, the records of only one king have been completed in a decade.
We propose H2KE, a neural machine translation model, that translates historical documents in Hanja to more easily understandable Korean and to English.
arXiv Detail & Related papers (2022-05-20T08:25:11Z)
- Extract, Integrate, Compete: Towards Verification Style Reading Comprehension [66.2551168928688]
We present a new verification style reading comprehension dataset named VGaokao from Chinese Language tests of Gaokao.
To address the challenges in VGaokao, we propose a novel Extract-Integrate-Compete approach.
arXiv Detail & Related papers (2021-09-11T01:34:59Z)
- Itihasa: A large-scale corpus for Sanskrit to English translation [9.566221218224637]
Itihasa is a large-scale translation dataset containing 93,000 pairs of Sanskrit shlokas and their English translations.
We first describe the motivation behind the curation of such a dataset and follow up with empirical analysis to bring out its nuances.
arXiv Detail & Related papers (2021-06-06T22:58:13Z)
- It's not Greek to mBERT: Inducing Word-Level Translations from Multilingual BERT [54.84185432755821]
Multilingual BERT (mBERT) learns rich cross-lingual representations that allow for transfer across languages.
We study the word-level translation information embedded in mBERT and present two simple methods that expose remarkable translation capabilities with no fine-tuning.
arXiv Detail & Related papers (2020-10-16T09:49:32Z)
- Anubhuti -- An annotated dataset for emotional analysis of Bengali short stories [2.3424047967193826]
Anubhuti is the first and largest text corpus for analyzing emotions expressed by writers of Bengali short stories.
We explain the data collection methods, the manual annotation process and the resulting high inter-annotator agreement.
We have verified the performance of our dataset with baseline Machine Learning and a Deep Learning model for emotion classification.
arXiv Detail & Related papers (2020-10-06T22:33:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.