Product Market Demand Analysis Using NLP in Banglish Text with Sentiment
Analysis and Named Entity Recognition
- URL: http://arxiv.org/abs/2204.01827v1
- Date: Mon, 4 Apr 2022 20:21:31 GMT
- Title: Product Market Demand Analysis Using NLP in Banglish Text with Sentiment
Analysis and Named Entity Recognition
- Authors: Md Sabbir Hossain, Nishat Nayla, Annajiat Alim Rasel
- Abstract summary: There are roughly 228 million native Bengali speakers.
Consumers are buying and evaluating items on social media with Banglish text.
People use social media to find preferred smartphone brands and models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Product market demand analysis plays a significant role in shaping
business strategies due to its noticeable impact on the competitive business
field. Furthermore, there are roughly 228 million native Bengali speakers, the
majority of whom use Banglish text to interact with one another on social
media. Consumers are buying and evaluating items on social media with Banglish
text as social media emerges as an online marketplace for entrepreneurs. People
use social media to find preferred smartphone brands and models by sharing
their good and bad experiences with them. For this reason, our goal is to
gather Banglish text data and use sentiment analysis and named entity
recognition to assess Bangladeshi market demand for smartphones in order to
determine the most popular smartphones by gender. We scraped product-related
data from social media with instant data scrapers and crawled data from
Wikipedia and other sites for product information with Python web scrapers.
The raw data is filtered using NLP methods with Python's Pandas and Seaborn
libraries. To train our datasets for named entity recognition, we utilized
spaCy's custom NER model and Amazon Comprehend Custom NER. A TensorFlow
Sequential model was deployed with parameter tweaking for sentiment analysis.
Meanwhile, we used the Google Cloud Translation API together with the
BanglaLinga library to estimate the gender of the reviewers. In this article, we use natural
language processing (NLP) approaches and several machine learning models to
identify the most in-demand items and services in the Bangladeshi market. Our
model has an accuracy of 87.99% in spaCy Custom Named Entity Recognition,
95.51% in Amazon Comprehend Custom NER, and 87.02% in the Sequential model for
demand analysis. After the spaCy analysis, we were able to handle 80% of the
mistakes related to misspelled words using a mix of Levenshtein distance and
ratio algorithms.
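The misspelling step mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say how the Levenshtein distance and ratio were combined, so everything below is an assumption made for illustration: the function names, the brand vocabulary, the 0.75 threshold, and the ratio definition (1 minus distance divided by the longer string's length, which differs from the ratio used by some Levenshtein libraries).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, substitutions cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Assumed ratio: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def correct_token(token: str, vocabulary: list[str], min_ratio: float = 0.75) -> str:
    """Map a possibly misspelled token to the closest vocabulary entry.

    The token is returned unchanged when no entry reaches min_ratio
    (a hypothetical threshold, not taken from the paper).
    """
    best, best_ratio = None, 0.0
    for candidate in vocabulary:
        ratio = similarity_ratio(token.lower(), candidate.lower())
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best if best is not None and best_ratio >= min_ratio else token


# Hypothetical brand vocabulary for demonstration only.
BRANDS = ["Samsung", "Xiaomi", "iPhone"]
corrected = [correct_token(t, BRANDS) for t in ["samsng", "iphon", "valo"]]
```

Here `samsng` and `iphon` are each one edit away from a vocabulary entry and are mapped to it, while `valo` (Banglish for "good") is not close to any brand and is left unchanged.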
Related papers
- A Probability-Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors [50.046717886067555]
We show that when sampling corpora from an aligned language model, there exists a trade-off between the strings' average reward and average log-likelihood.
We provide a formal treatment of this phenomenon and demonstrate how a choice of sampling adaptor allows for a selection of how much likelihood we exchange for the reward.
arXiv Detail & Related papers (2024-06-14T17:38:21Z)
- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs [0.0]
This paper first constructs a user opinion dataset called ITRC-Opinion in a collaborative environment and insource way.
Our dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram.
Second, this study proposes a new architecture based on the convolutional neural network (CNN) model for more effective sentiment analysis of colloquial text in social microblog posts.
arXiv Detail & Related papers (2023-06-22T05:51:22Z)
- BanglaBook: A Large-scale Bangla Dataset for Sentiment Analysis from Book Reviews [1.869097450593631]
We present a large-scale dataset of Bangla book reviews consisting of 158,065 samples classified into three broad categories: positive, negative, and neutral.
We employ a range of machine learning models to establish baselines including SVM, LSTM, and Bangla-BERT.
Our findings demonstrate a substantial performance advantage of pre-trained models over models that rely on manually crafted features.
arXiv Detail & Related papers (2023-05-11T06:27:38Z)
- Evaluating Embedding APIs for Information Retrieval [51.24236853841468]
We evaluate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval.
We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English.
For non-English retrieval, re-ranking still improves the results, but a hybrid model with BM25 works best, albeit at a higher cost.
arXiv Detail & Related papers (2023-05-10T16:40:52Z)
- Embedding generation for text classification of Brazilian Portuguese user reviews: from bag-of-words to transformers [0.0]
This study includes from classical (Bag-of-Words) to state-of-the-art (Transformer-based) NLP models.
It aims to provide a comprehensive experimental study of embedding approaches targeting a binary sentiment classification of user reviews in Brazilian Portuguese.
arXiv Detail & Related papers (2022-12-01T15:24:19Z)
- Study of Encoder-Decoder Architectures for Code-Mix Search Query Translation [0.0]
Many of the queries we receive are code-mix, specifically Hinglish i.e. queries with one or more Hindi words written in English (Latin) script.
We propose a transformer-based approach for code-mix query translation to enable users to search with these queries.
The model is currently live on app and website, serving millions of queries.
arXiv Detail & Related papers (2022-08-07T12:59:50Z)
- fBERT: A Neural Transformer for Identifying Offensive Content [67.12838911384024]
fBERT is a BERT model retrained on SOLID, the largest English offensive language identification corpus available with over 1.4 million offensive instances.
We evaluate fBERT's performance on identifying offensive content on multiple English datasets and we test several thresholds for selecting instances from SOLID.
The fBERT model will be made freely available to the community.
arXiv Detail & Related papers (2021-09-10T19:19:26Z)
- Sentiment Classification in Bangla Textual Content: A Comparative Study [4.2394281761764]
In this study, we explore several publicly available sentiment labeled datasets and designed classifiers using both classical and deep learning algorithms.
Our finding suggests transformer-based models, which have not been explored earlier for Bangla, outperform all other models.
arXiv Detail & Related papers (2020-11-19T21:06:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.