Related papers: Product Market Demand Analysis Using NLP in Banglish Text with Sentiment Analysis and Named Entity Recognition

Product Market Demand Analysis Using NLP in Banglish Text with Sentiment Analysis and Named Entity Recognition

URL: http://arxiv.org/abs/2204.01827v1
Date: Mon, 4 Apr 2022 20:21:31 GMT
Title: Product Market Demand Analysis Using NLP in Banglish Text with Sentiment Analysis and Named Entity Recognition
Authors: Md Sabbir Hossain, Nishat Nayla, Annajiat Alim Rasel
Abstract summary: There are roughly 228 million native Bengali speakers. Consumers are buying and evaluating items on social media with Banglish text. People use social media to find preferred smartphone brands and models.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Product market demand analysis plays a significant role for originating business strategies due to its noticeable impact on the competitive business field. Furthermore, there are roughly 228 million native Bengali speakers, the majority of whom use Banglish text to interact with one another on social media. Consumers are buying and evaluating items on social media with Banglish text as social media emerges as an online marketplace for entrepreneurs. People use social media to find preferred smartphone brands and models by sharing their positive and bad experiences with them. For this reason, our goal is to gather Banglish text data and use sentiment analysis and named entity identification to assess Bangladeshi market demand for smartphones in order to determine the most popular smartphones by gender. We scraped product related data from social media with instant data scrapers and crawled data from Wikipedia and other sites for product information with python web scrapers. Using Python's Pandas and Seaborn libraries, the raw data is filtered using NLP methods. To train our datasets for named entity recognition, we utilized Spacey's custom NER model, Amazon Comprehend Custom NER. A tensorflow sequential model was deployed with parameter tweaking for sentiment analysis. Meanwhile, we used the Google Cloud Translation API to estimate the gender of the reviewers using the BanglaLinga library. In this article, we use natural language processing (NLP) approaches and several machine learning models to identify the most in-demand items and services in the Bangladeshi market. Our model has an accuracy of 87.99% in Spacy Custom Named Entity recognition, 95.51% in Amazon Comprehend Custom NER, and 87.02% in the Sequential model for demand analysis. After Spacy's study, we were able to manage 80% of mistakes related to misspelled words using a mix of Levenshtein distance and ratio algorithms.

Related papers

LLM Based Sentiment Classification From Bangladesh E-Commerce Reviews [0.0]
The viability of using transformer-based BERT models for sentiment analysis from Bangladesh e commerce reviews is investigated in this paper.<n>A subset of 4000 samples from the original dataset of Bangla and English customer reviews was utilized to fine-tune the model.<n>The fine tuned Llama-3.1-8B model outperformed other fine-tuned models, with an overall accuracy, precision, recall, and F1 score of 95.5%, 93%, 88%, 90%.
arXiv Detail & Related papers (2025-09-30T16:46:09Z)
A Probability--Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
We show that when sampling corpora from an aligned language model, there exists a trade-off between the strings' average reward and average log-likelihood. We provide a formal treatment of this phenomenon and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present textscUltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset. Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases. Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding. This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs [0.0]
This paper first constructs a user opinion dataset called ITRC-Opinion in a collaborative environment and insource way. Our dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram. Second, this study proposes a new architecture based on the convolutional neural network (CNN) model for more effective sentiment analysis of colloquial text in social microblog posts.
arXiv Detail & Related papers (2023-06-22T05:51:22Z)
BanglaBook: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews [1.869097450593631]
We present a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral. We employ a range of machine learning models to establish baselines including SVM, LSTM, and Bangla-BERT. Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features.
arXiv Detail & Related papers (2023-05-11T06:27:38Z)
Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval. We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English. For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z)
Embedding generation for text classification of Brazilian Portuguese user reviews: from bag-of-words to transformers [0.0]
This study includes from classical (Bag-of-Words) to state-of-the-art (Transformer-based) NLP models. It aims to provide a comprehensive experimental study of embedding approaches targeting a binary sentiment classification of user reviews in Brazilian Portuguese.
arXiv Detail & Related papers (2022-12-01T15:24:19Z)
Study of Encoder-Decoder Architectures for Code-Mix Search Query Translation [0.0]
Many of the queries we receive are code-mix, specifically Hinglish i.e. queries with one or more Hindi words written in English (Latin) script. We propose a transformer-based approach for code-mix query translation to enable users to search with these queries. The model is currently live on app and website, serving millions of queries.
arXiv Detail & Related papers (2022-08-07T12:59:50Z)
FBERT: A Neural Transformer for Identifying Offensive Content [67.12838911384024]
fBERT is a BERT model retrained on SOLID, the largest English offensive language identification corpus available with over $1.4$ million offensive instances. We evaluate fBERT's performance on identifying offensive content on multiple English datasets and we test several thresholds for selecting instances from SOLID. The fBERT model will be made freely available to the community.
arXiv Detail & Related papers (2021-09-10T19:19:26Z)
Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information. Their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks. This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
Sentiment Classification in Bangla Textual Content: A Comparative Study [4.2394281761764]
In this study, we explore several publicly available sentiment labeled datasets and designed classifiers using both classical and deep learning algorithms. Our finding suggests transformer-based models, which have not been explored earlier for Bangla, outperform all other models.
arXiv Detail & Related papers (2020-11-19T21:06:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.