Pragya: An AI-Based Semantic Recommendation System for Sanskrit Subhasitas
- URL: http://arxiv.org/abs/2601.06607v1
- Date: Sat, 10 Jan 2026 16:13:25 GMT
- Title: Pragya: An AI-Based Semantic Recommendation System for Sanskrit Subhasitas
- Authors: Tanisha Raorane, Prasenjit Kole
- Abstract summary: We present Pragya, a retrieval-augmented generation framework for semantic recommendation of Subhasitas. We curate a dataset of 200 verses annotated with thematic tags such as motivation, friendship, and compassion. Using sentence embeddings (IndicBERT), the system retrieves top-k verses relevant to user queries. The retrieved results are then passed to a generative model to produce transliterations, translations, and contextual explanations.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Sanskrit Subhasitas encapsulate centuries of cultural and philosophical wisdom, yet remain underutilized in the digital age due to linguistic and contextual barriers. In this work, we present Pragya, a retrieval-augmented generation (RAG) framework for semantic recommendation of Subhasitas. We curate a dataset of 200 verses annotated with thematic tags such as motivation, friendship, and compassion. Using sentence embeddings (IndicBERT), the system retrieves top-k verses relevant to user queries. The retrieved results are then passed to a generative model (Mistral LLM) to produce transliterations, translations, and contextual explanations. Experimental evaluation demonstrates that semantic retrieval significantly outperforms keyword matching in precision and relevance, while user studies highlight improved accessibility through generated summaries. To our knowledge, this is the first attempt at integrating retrieval and generation for Sanskrit Subhasitas, bridging cultural heritage with modern applied AI.
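The pipeline the abstract describes (embed the query, retrieve the top-k most similar verses, then hand them to a generative model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the verse data is hypothetical, random vectors stand in for IndicBERT sentence embeddings, and the generation step is shown only as prompt assembly (the paper uses a Mistral LLM for the actual generation).

```python
import numpy as np

# Hypothetical corpus: verses with thematic tags, as described in the paper.
VERSES = [
    {"id": 0, "text": "udyamena hi sidhyanti karyani...", "tags": ["motivation"]},
    {"id": 1, "text": "ayam nijah paro veti...", "tags": ["compassion"]},
    {"id": 2, "text": "na hi jnanena sadrsham...", "tags": ["wisdom"]},
]

def embed(texts, dim=768, seed=0):
    # Stand-in for IndicBERT sentence embeddings: one random vector per text.
    rng = np.random.default_rng(seed)
    return rng.normal(size=(len(texts), dim))

def top_k(query_vec, verse_vecs, k=2):
    # Cosine similarity is the dot product of L2-normalized vectors.
    q = query_vec / np.linalg.norm(query_vec)
    v = verse_vecs / np.linalg.norm(verse_vecs, axis=1, keepdims=True)
    sims = v @ q
    order = np.argsort(-sims)[:k]  # indices of the k best matches, best first
    return [(int(i), float(sims[i])) for i in order]

def build_prompt(query, hits):
    # Retrieved verses become context for the generative model, which is
    # asked for transliteration, translation, and explanation.
    context = "\n".join(VERSES[i]["text"] for i, _ in hits)
    return (f"Query: {query}\nVerses:\n{context}\n"
            "Provide transliteration, translation, and a contextual explanation.")

verse_vecs = embed([v["text"] for v in VERSES])
query_vec = embed(["a verse about perseverance"], seed=1)[0]
hits = top_k(query_vec, verse_vecs, k=2)
prompt = build_prompt("a verse about perseverance", hits)
```

In a real system the retrieval scores, not keyword overlap, decide which verses reach the generator, which is the distinction the paper's evaluation draws between semantic retrieval and keyword matching.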
Related papers
- On the Merits of LLM-Based Corpus Enrichment [11.398498369228571]
We argue for a novel perspective: using genAI to enrich a document corpus. The enrichment is based on modifying existing documents or generating new ones.
arXiv Detail & Related papers (2025-06-06T12:02:14Z)
- Anveshana: A New Benchmark Dataset for Cross-Lingual Information Retrieval On English Queries and Sanskrit Documents [7.967320126793103]
The study fine-tunes state-of-the-art models for Sanskrit's linguistic nuances. It adapts summarization techniques for Sanskrit documents to improve QA processing. A dataset of 3,400 English-Sanskrit query-document pairs underpins the study.
arXiv Detail & Related papers (2025-05-26T04:23:21Z)
- LEVOS: Leveraging Vocabulary Overlap with Sanskrit to Generate Technical Lexicons in Indian Languages [39.08623113730563]
We propose a novel use-case of Sanskrit-based segments for linguistically informed translation of technical terms. Our approach uses character-level segmentation to identify meaningful subword units. We observe consistent improvements in two experimental settings for technical term translation using Sanskrit-derived segments.
arXiv Detail & Related papers (2024-07-08T18:50:13Z)
- A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to resolve text into multiple concepts for multilingual semantic matching, freeing the model from its reliance on NER models.
We conduct comprehensive experiments on English datasets QQP and MRPC, and Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z)
- Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval [87.69394953339238]
We propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics in video-text retrieval.
We show our framework largely outperforms previous video-text retrieval methods, with 4.8% and 8.2% Recall@1 improvement on MSR-VTT and DiDeMo respectively.
arXiv Detail & Related papers (2024-02-26T17:36:50Z)
- Language Models As Semantic Indexers [78.83425357657026]
We introduce LMIndexer, a self-supervised framework to learn semantic IDs with a generative language model.
We show the high quality of the learned IDs and demonstrate their effectiveness on three tasks including recommendation, product search, and document retrieval.
arXiv Detail & Related papers (2023-10-11T18:56:15Z)
- HICL: Hashtag-Driven In-Context Learning for Social Media Natural Language Understanding [15.743523533234224]
In this paper, we propose a novel hashtag-driven in-context learning framework for natural language understanding on social media.
Our objective is to enable a model #Encoder to incorporate topic-related semantic information, which allows it to retrieve topic-related posts.
For empirical studies, we collected 45M tweets to set up an in-context NLU benchmark, and the experimental results on seven downstream tasks show that HICL substantially advances the previous state-of-the-art results.
arXiv Detail & Related papers (2023-08-19T11:31:45Z)
- Syntax and Semantics Meet in the "Middle": Probing the Syntax-Semantics Interface of LMs Through Agentivity [68.8204255655161]
We present the semantic notion of agentivity as a case study for probing such interactions.
This suggests LMs may serve as useful tools for linguistic annotation, theory testing, and discovery.
arXiv Detail & Related papers (2023-05-29T16:24:01Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Transition-based Abstract Meaning Representation Parsing with Contextual Embeddings [0.0]
We study a way of combining two of the most successful routes to the meaning of language, statistical language models and symbolic semantic formalisms, in the task of semantic parsing.
We explore the utility of incorporating pretrained context-aware word embeddings, such as BERT and RoBERTa, in the parsing problem.
arXiv Detail & Related papers (2022-06-13T15:05:24Z)
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.