Survey of Pseudonymization, Abstractive Summarization & Spell Checker for Hindi and Marathi
- URL: http://arxiv.org/abs/2412.18163v1
- Date: Tue, 24 Dec 2024 04:51:32 GMT
- Title: Survey of Pseudonymization, Abstractive Summarization & Spell Checker for Hindi and Marathi
- Authors: Rasika Ransing, Mohammed Amaan Dhamaskar, Ayush Rajpurohit, Amey Dhoke, Sanket Dalvi
- Abstract summary: The paper aims to build a platform that offers text anonymization, abstractive text summarization, and spell checking in English, Hindi, and Marathi.
These tools are intended to serve enterprise and consumer clients who predominantly use Indian regional languages.
- Score: 0.0
- Abstract: India's vast linguistic diversity presents unique challenges and opportunities for technological advancement, especially in the realm of Natural Language Processing (NLP). While there has been significant progress in NLP applications for widely spoken languages, the regional languages of India, such as Marathi and Hindi, remain underserved. Research in the field of NLP for Indian regional languages is at a formative stage and holds immense significance. The paper aims to build a platform that enables the user to use features such as text anonymization, abstractive text summarization, and spell checking in the English, Hindi, and Marathi languages. The aim of these tools is to serve enterprise and consumer clients who predominantly use Indian regional languages.
Related papers
- A Review of the Marathi Natural Language Processing [0.0]
This paper presents a broad overview of the evolution of NLP research in Indic languages.
It focuses on Marathi and state-of-the-art resources and tools available to the research community.
arXiv Detail & Related papers (2024-12-20T00:56:13Z)
- Navigating Text-to-Image Generative Bias across Indic Languages [53.92640848303192]
This research investigates biases in text-to-image (TTI) models for the Indic languages widely spoken across India.
It evaluates and compares the generative performance and cultural relevance of leading TTI models in these languages against their performance in English.
arXiv Detail & Related papers (2024-08-01T04:56:13Z)
- Decoding the Diversity: A Review of the Indic AI Research Landscape [0.7864304771129751]
Indic languages are those spoken in the Indian subcontinent, including India, Pakistan, Bangladesh, Sri Lanka, Nepal, and Bhutan.
This review paper provides a comprehensive overview of large language model (LLM) research directions within Indic languages.
arXiv Detail & Related papers (2024-06-13T19:55:20Z)
- Fine-tuning Pre-trained Named Entity Recognition Models For Indian Languages [6.7638050195383075]
We analyze the challenges and propose techniques that can be tailored for Multilingual Named Entity Recognition for Indian languages.
We present human-annotated named entity corpora of 40K sentences for 4 Indian languages from two of the major Indian language families.
We achieve comparable performance on completely unseen benchmark datasets for Indian languages, which affirms the usability of our model.
arXiv Detail & Related papers (2024-05-08T05:54:54Z)
- What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects [60.8361859783634]
We survey speakers of dialects and regional languages related to German.
We find that respondents are especially in favour of potential NLP tools that work with dialectal input.
arXiv Detail & Related papers (2024-02-19T09:15:28Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
- An Overview of Indian Spoken Language Recognition from Machine Learning Perspective [7.27448284043116]
This work is one of the first attempts to present a comprehensive review of the Indian spoken language recognition research field.
An in-depth analysis emphasizes the unique challenges that low-resource conditions and mutual influences among languages pose for developing language identification (LID) systems in the Indian context.
arXiv Detail & Related papers (2022-11-30T11:03:51Z)
- NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages [100.59889279607432]
We focus on developing resources for languages in Indonesia.
Most languages in Indonesia are categorized as endangered and some are even extinct.
We develop the first-ever parallel resource for 10 low-resource languages in Indonesia.
arXiv Detail & Related papers (2022-05-31T17:03:50Z)
- One Country, 700+ Languages: NLP Challenges for Underrepresented Languages and Dialects in Indonesia [60.87739250251769]
We provide an overview of the current state of NLP research for Indonesia's 700+ languages.
We highlight challenges in Indonesian NLP and how these affect the performance of current NLP systems.
arXiv Detail & Related papers (2022-03-24T22:07:22Z)
- Including Signed Languages in Natural Language Processing [48.62744923724317]
Signed languages are the primary means of communication for many deaf and hard of hearing individuals.
This position paper calls on the NLP community to include signed languages as a research area with high social and scientific impact.
arXiv Detail & Related papers (2021-05-11T17:37:55Z)
- Taxonomic survey of Hindi Language NLP systems [0.0]
Natural language processing (NLP) is the automatic handling of natural human language by machines.
This survey gives a report of the resources and applications available for Hindi language NLP.
arXiv Detail & Related papers (2021-01-30T11:53:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.