Deep Learning for Hindi Text Classification: A Comparison
- URL: http://arxiv.org/abs/2001.10340v1
- Date: Sun, 19 Jan 2020 09:29:12 GMT
- Title: Deep Learning for Hindi Text Classification: A Comparison
- Authors: Ramchandra Joshi, Purvi Goel, Raviraj Joshi
- Abstract summary: Research on classifying the morphologically rich, low-resource Hindi language written in Devanagari script has been limited by the absence of a large labeled corpus.
In this work, we used translated versions of English datasets to evaluate models based on CNN, LSTM, and Attention.
The paper also serves as a tutorial for popular text classification techniques.
- Score: 6.8629257716723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural Language Processing (NLP), and especially natural language
text analysis, has seen great advances in recent times. The use of deep
learning has revolutionized text processing techniques and achieved remarkable
results. Deep learning architectures such as CNN, LSTM, and the more recent
Transformer have been used to achieve state-of-the-art results on a variety of
NLP tasks. In this work, we survey a host of deep learning architectures for
text classification tasks. The work is specifically concerned with the
classification of Hindi text. Research on classifying the morphologically
rich, low-resource Hindi language written in Devanagari script has been
limited by the absence of a large labeled corpus. In this work, we used
translated versions of English datasets to evaluate models based on CNN, LSTM,
and Attention. Multilingual pre-trained sentence embeddings based on BERT and
LASER are also compared to evaluate their effectiveness for the Hindi
language. The paper also serves as a tutorial for popular text classification
techniques.
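As a concrete companion to the surveyed architectures, here is a minimal sketch of a CNN-based text classifier in PyTorch, in the spirit of the models the paper evaluates. It is only illustrative: the vocabulary size, embedding dimension, filter widths, and number of classes are placeholder assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Minimal CNN text classifier; all hyperparameters are illustrative."""
    def __init__(self, vocab_size=30000, embed_dim=300, num_classes=4,
                 filter_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # One 1-D convolution per filter width, sliding over token positions.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in filter_sizes])
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):              # (batch, seq_len)
        x = self.embedding(token_ids)          # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)                  # (batch, embed_dim, seq_len)
        # Convolve, apply ReLU, then max-pool over time per filter width.
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # unnormalized class logits

# Toy usage: a batch of two sequences of 50 already integer-encoded tokens.
model = TextCNN()
logits = model(torch.randint(1, 30000, (2, 50)))
print(logits.shape)  # torch.Size([2, 4])
```

The same skeleton extends to the LSTM and Attention variants compared in the paper by swapping the convolutional feature extractor for a recurrent or attention-based encoder.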
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
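A hedged sketch of the prompt-style setup this line of work studies, using Hugging Face's zero-shot classification pipeline; the checkpoint and candidate labels below are illustrative assumptions, not choices from the paper.

```python
from transformers import pipeline

# NLI-based zero-shot classification: the candidate labels act as
# natural-language instructions, so no task-specific training data is needed.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")  # illustrative choice

result = classifier(
    "The match went to extra time and was decided on penalties.",
    candidate_labels=["sports", "politics", "technology"],
)
print(result["labels"][0])  # highest-scoring label, e.g. "sports"
```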
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- Comprehensive Implementation of TextCNN for Enhanced Collaboration between Natural Language Processing and System Recommendation [1.7692743931394748]
This paper analyzes the current state of deep learning applications across three core NLP tasks.
It takes into account the challenges posed by adversarial techniques in text generation, text classification, and semantic parsing.
An empirical study on text classification tasks demonstrates the effectiveness of interactive integration training.
arXiv Detail & Related papers (2024-03-12T07:25:53Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
- T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
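A minimal sketch of the translate-and-test idea, assuming Hugging Face pipelines; the Hindi-to-English translation model and the default English sentiment classifier are illustrative stand-ins, not the components used in T3L.

```python
from transformers import pipeline

# Stage 1: translate the low-resource input into English.
translate = pipeline("translation", model="Helsinki-NLP/opus-mt-hi-en")
# Stage 2: classify the translation with an English-only model.
classify = pipeline("sentiment-analysis")  # default English checkpoint

hindi_text = "यह फिल्म बहुत अच्छी थी"  # "This film was very good"
english_text = translate(hindi_text)[0]["translation_text"]
print(classify(english_text))  # e.g. [{'label': 'POSITIVE', 'score': ...}]
```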
arXiv Detail & Related papers (2023-06-08T07:33:22Z)
- H-AES: Towards Automated Essay Scoring for Hindi [33.755800922763946]
We reproduce and compare state-of-the-art methods for Automated Essay Scoring (AES) in the Hindi domain.
We employ classical feature-based Machine Learning (ML) and advanced end-to-end models, including LSTM Networks and Fine-Tuned Transformer Architecture.
We train and evaluate our models using translated English essays and empirically measure their performance on our own small-scale, real-world Hindi corpus.
arXiv Detail & Related papers (2023-02-28T15:14:15Z)
- To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP [0.0]
We investigate three categories of text augmentation methodologies that operate on syntax.
We compare them on part-of-speech tagging, dependency parsing and semantic role labeling for a diverse set of language families.
Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT.
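The augmentations studied there operate on dependency trees (for example, cropping and rotating subtrees); the token-level deletion below is a deliberately simplified, hypothetical stand-in that only conveys the general shape of a label-preserving augmentation function.

```python
import random

def random_deletion(tokens, p=0.1, rng=random.Random(0)):
    """Toy augmenter: drop each token with probability p.

    Far cruder than the syntax-aware operations the paper compares, but it
    shows the interface: one example in, a perturbed copy out.
    """
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else tokens  # never return an empty sentence

sentence = "the quick brown fox jumps over the lazy dog".split()
for _ in range(3):
    print(" ".join(random_deletion(sentence)))
```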
arXiv Detail & Related papers (2021-11-18T10:52:48Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses existing language models at distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- HinFlair: pre-trained contextual string embeddings for pos tagging and text classification in the Hindi language [0.0]
HinFlair is a language representation model (contextual string embeddings) pre-trained on a large monolingual Hindi corpus.
Results show that HinFlair outperforms previous state-of-the-art publicly available pre-trained embeddings for downstream tasks like text classification and pos tagging.
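Contextual string embeddings of this kind are typically consumed through the flair library; the sketch below shows the usual loading pattern, where the "hi-forward" and "hi-backward" identifiers are assumptions about the published model names and should be checked against the HinFlair release.

```python
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

# Assumed identifiers for the forward/backward Hindi language models.
embeddings = StackedEmbeddings([
    FlairEmbeddings("hi-forward"),
    FlairEmbeddings("hi-backward"),
])

sentence = Sentence("यह एक उदाहरण वाक्य है")  # "This is an example sentence"
embeddings.embed(sentence)  # attaches a vector to every token in place
for token in sentence:
    print(token.text, token.embedding.shape)
```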
arXiv Detail & Related papers (2021-01-18T09:23:35Z)
- Bangla Text Classification using Transformers [2.3475904942266697]
Text classification has been one of the earliest problems in NLP.
In this work, we fine-tune multilingual Transformer models for Bangla text classification tasks.
We obtain state-of-the-art results on six benchmark datasets, improving upon the previous results by 5-29% accuracy across different tasks.
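A condensed sketch of fine-tuning a multilingual Transformer for sequence classification with the Hugging Face Trainer; the mBERT checkpoint and the two-example toy dataset are placeholders, not the paper's Bangla benchmarks.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-multilingual-cased"  # one plausible multilingual choice
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2)

# Tiny in-memory dataset, purely for illustration.
train = Dataset.from_dict({
    "text": ["খুব ভালো", "খারাপ পরিষেবা"],  # "very good", "bad service"
    "label": [1, 0],
})
train = train.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length"),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=train,
)
trainer.train()
```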
arXiv Detail & Related papers (2020-11-09T14:12:07Z)
- Deep Learning for Text Style Transfer: A Survey [71.8870854396927]
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text.
We present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017.
We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data.
arXiv Detail & Related papers (2020-11-01T04:04:43Z)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format.
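A minimal sketch of the text-to-text framing with the public t5-small checkpoint; casting classification as generation means the label comes back as a literal string. The sst2-style prefix follows the task-prefix convention of the T5 release, though the exact prompt string here is an assumption.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The task is named inside the input string; the model generates the label
# as text (e.g. "positive" or "negative" for the sst2 mixture).
inputs = tokenizer("sst2 sentence: this movie was surprisingly good",
                   return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```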
arXiv Detail & Related papers (2019-10-23T17:37:36Z)