Artificial intelligence for topic modelling in Hindu philosophy: mapping themes between the Upanishads and the Bhagavad Gita
- URL: http://arxiv.org/abs/2205.11020v1
- Date: Mon, 23 May 2022 03:39:00 GMT
- Title: Artificial intelligence for topic modelling in Hindu philosophy: mapping themes between the Upanishads and the Bhagavad Gita
- Authors: Rohitash Chandra, Mukul Ranjan
- Abstract summary: We use advanced language models such as BERT to provide topic modelling of the key texts of the Upanishads and the Bhagavad Gita.
Our results show a very high similarity between the topics of these two texts, with a mean cosine similarity of 73%.
Our best performing model gives a coherence score of 73% on the Bhagavad Gita and 69% on the Upanishads.
- Score: 0.4125187280299248
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A distinct feature of Hindu religious and philosophical texts is that they
come from a library of texts rather than a single source. The Upanishads are known
as some of the oldest philosophical texts in the world and form the foundation
of Hindu philosophy. The Bhagavad Gita is a core text of Hindu philosophy and is
known as a text that summarises the key philosophies of the Upanishads, with a
major focus on the philosophy of karma. These texts have been translated into
many languages, and there exist studies of their prominent themes and topics;
however, there has been little study of topic modelling using language
models powered by deep learning. In this paper, we use advanced
language models such as BERT to provide topic modelling of the key texts of
the Upanishads and the Bhagavad Gita. We analyse the distinct and overlapping
topics amongst the texts and visualise the link between selected texts of the
Upanishads and the Bhagavad Gita. Our results show a very high similarity between
the topics of these two texts, with a mean cosine similarity of 73%. We find
that out of the fourteen topics extracted from the Bhagavad Gita, nine have a
cosine similarity of more than 70% with the topics of the Upanishads. We
also find that topics generated by the BERT-based models show very high
coherence compared with those generated by conventional models. Our best performing
model gives a coherence score of 73% on the Bhagavad Gita and 69% on the Upanishads.
The visualisation of the low-dimensional embeddings of these texts shows
clear overlap among their topics, adding another level of validation to our
results.
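The topic-overlap figures above (a mean cosine similarity of 73%, and nine of fourteen Gita topics above 70%) can be illustrated with a minimal sketch. It assumes each topic is represented by a single embedding vector (for example, the mean of its document embeddings) and that "mean similarity" means the average of each Gita topic's best-match score against the Upanishad topics; the abstract does not confirm either choice. The function names `cosine_similarity_matrix` and `match_topics` are hypothetical, and the 2-D vectors are toy data, not results from the paper.

```python
import numpy as np


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a (m x d) and b (n x d)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T  # shape (m, n)


def match_topics(source_topics: np.ndarray, target_topics: np.ndarray,
                 threshold: float = 0.7):
    """Match each source topic to its most similar target topic (assumed scheme).

    Returns the best-match indices, the best-match scores, their mean,
    and how many best-match scores exceed `threshold`.
    """
    sims = cosine_similarity_matrix(source_topics, target_topics)
    best_idx = sims.argmax(axis=1)
    best_scores = sims.max(axis=1)
    n_above = int((best_scores > threshold).sum())
    return best_idx, best_scores, float(best_scores.mean()), n_above


if __name__ == "__main__":
    # Toy 2-D "topic embeddings"; real BERT-based topic vectors are far larger.
    gita = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
    upanishads = np.array([[1.0, 0.1], [0.0, 1.0]])
    idx, scores, mean_sim, n_above = match_topics(gita, upanishads)
    # The fourth toy topic's best match falls below the 0.7 threshold.
    print(idx, scores.round(3), round(mean_sim, 3), n_above)
```

Taking the best match per source topic makes the count asymmetric: matching Upanishad topics against Gita topics could give a different number, so the direction of comparison matters when reporting such figures.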
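The abstract reports coherence scores but does not say which coherence measure was used; common choices for topic models include C_v and NPMI. As a swapped-in illustration only, the sketch below computes NPMI coherence with probabilities estimated from document-level co-occurrence, and its values (which range from -1 to 1) are not directly comparable to the 73% and 69% figures. The function name `npmi_coherence` and the toy documents are hypothetical.

```python
import math
from itertools import combinations
from typing import Iterable, List, Sequence


def npmi_coherence(topic_words: Sequence[str], documents: Iterable[Iterable[str]]) -> float:
    """Mean NPMI over all pairs of a topic's top words (needs at least two words).

    p(w) is estimated as the fraction of documents containing w, and
    p(w1, w2) as the fraction containing both.
    """
    doc_sets: List[set] = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words: str) -> float:
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # words never co-occur: NPMI's lower bound
        elif p12 == 1.0:
            scores.append(1.0)  # both words appear in every document
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))  # normalise PMI into [-1, 1]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    docs = [
        ["karma", "yoga", "duty"],
        ["karma", "duty"],
        ["atman", "brahman"],
        ["atman", "brahman", "self"],
    ]
    # Pair scores are 1.0, 0.5 and 0.5, so this prints about 0.667.
    print(npmi_coherence(["karma", "duty", "yoga"], docs))
```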
Related papers
- Navigating Text-to-Image Generative Bias across Indic Languages [53.92640848303192]
This research investigates biases in text-to-image (TTI) models for the Indic languages widely spoken across India.
It evaluates and compares the generative performance and cultural relevance of leading TTI models in these languages against their performance in English.
Date: 2024-08-01T04:56:13Z
- Exploring Bengali Religious Dialect Biases in Large Language Models with Evaluation Perspectives [5.648318448953635]
Large Language Models (LLMs) can produce output that contains stereotypes and biases.
We explore bias from a religious perspective in Bengali, focusing specifically on two main religious dialects: Hindu and Muslim-majority dialects.
Date: 2024-07-25T20:19:29Z
- CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark [68.21939124278065]
CVQA is a culturally diverse, multilingual Visual Question Answering benchmark designed to cover a rich set of languages and cultures.
CVQA includes culturally-driven images and questions from across 30 countries on four continents, covering 31 languages with 13 scripts, providing a total of 10k questions.
We benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models.
Date: 2024-06-10T01:59:00Z
- SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 13 Languages [44.017657230247934]
We present SemRel, a new semantic relatedness dataset collection annotated by native speakers across 13 languages.
These languages originate from five distinct language families and are predominantly spoken in Africa and Asia.
Each instance in the SemRel datasets is a sentence pair associated with a score that represents the degree of semantic textual relatedness between the two sentences.
Date: 2024-02-13T18:04:53Z
- Mukhyansh: A Headline Generation Dataset for Indic Languages [4.583536403673757]
Mukhyansh is an extensive multilingual dataset, tailored for Indian language headline generation.
Comprising over 3.39 million article-headline pairs, Mukhyansh spans across eight prominent Indian languages.
Mukhyansh outperforms all other models, achieving an average ROUGE-L score of 31.43 across all 8 languages.
Date: 2023-11-29T15:49:24Z
- An evaluation of Google Translate for Sanskrit to English translation via sentiment and semantic analysis [0.31317409221921144]
In 2022, the Sanskrit language was added to the Google Translate engine.
In this study, we present a framework that evaluates Google Translate for Sanskrit using the Bhagavad Gita.
Date: 2023-02-28T04:24:55Z
- MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing [48.216386761482525]
We present MultiSpider, the largest multilingual text-to-SQL dataset, which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese).
Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages.
We also propose a simple augmentation framework, SAVe (Augmentation-with-Verification), which boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
Date: 2022-12-27T13:58:30Z
- Semantic and sentiment analysis of selected Bhagavad Gita translations using BERT-based language framework [0.4125187280299248]
The Bhagavad Gita is an ancient Hindu philosophical text originally written in Sanskrit that features a conversation between Lord Krishna and Arjuna prior to the Mahabharata war.
In this paper, we compare selected translations (mostly from Sanskrit to English) of the Bhagavad Gita using semantic and sentiment analyses.
Date: 2022-01-09T23:59:11Z
- Anubhuti -- An annotated dataset for emotional analysis of Bengali short stories [2.3424047967193826]
Anubhuti is the first and largest text corpus for analyzing emotions expressed by writers of Bengali short stories.
We explain the data collection methods, the manual annotation process and the resulting high inter-annotator agreement.
We have validated our dataset with baseline machine learning models and a deep learning model for emotion classification.
Date: 2020-10-06T22:33:58Z
- A Multilingual Neural Machine Translation Model for Biomedical Data [84.17747489525794]
We release a multilingual neural machine translation model, which can be used to translate text in the biomedical domain.
The model can translate from 5 languages (French, German, Italian, Korean and Spanish) into English.
It is trained with large amounts of generic and biomedical data, using domain tags.
Date: 2020-08-06T21:26:43Z
- Generating Major Types of Chinese Classical Poetry in a Uniformed Framework [88.57587722069239]
We propose a GPT-2 based framework for generating major types of Chinese classical poems.
Preliminary results show this enhanced model can generate Chinese classical poems of major types with high quality in both form and content.
Date: 2020-03-13T14:16:25Z