A Cloud-based Machine Learning Pipeline for the Efficient Extraction of
Insights from Customer Reviews
- URL: http://arxiv.org/abs/2306.07786v2
- Date: Sun, 18 Jun 2023 10:56:14 GMT
- Title: A Cloud-based Machine Learning Pipeline for the Efficient Extraction of
Insights from Customer Reviews
- Authors: Robert Lakatos, Gergo Bogacsovics, Balazs Harangi, Istvan Lakatos,
Attila Tiba, Janos Toth, Marianna Szabo, Andras Hajdu
- Abstract summary: We present a cloud-based system that can extract insights from customer reviews using machine learning methods integrated into a pipeline.
For topic modeling, our composite model uses transformer-based neural networks designed for natural language processing.
Our system can achieve better results than existing topic modeling and keyword extraction solutions for this task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The efficiency of natural language processing has improved dramatically with
the advent of machine learning models, particularly neural network-based
solutions. However, some tasks are still challenging, especially when
considering specific domains. In this paper, we present a cloud-based system
that can extract insights from customer reviews using machine learning methods
integrated into a pipeline. For topic modeling, our composite model uses
transformer-based neural networks designed for natural language processing,
vector embedding-based keyword extraction, and clustering. The elements of our
model have been integrated and further developed to better meet the
requirements of efficient information extraction, topic modeling of the
extracted information, and user needs. Furthermore, our system can achieve
better results for this task than existing topic modeling and keyword extraction
solutions. Our approach is validated and compared with other state-of-the-art
methods using publicly available datasets for benchmarking.
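As a rough illustration of how such a pipeline can be assembled, the sketch below embeds reviews with a transformer-based sentence encoder, clusters the embeddings into topics, and extracts keywords per topic with an embedding-based extractor. The specific libraries and model names (sentence-transformers with all-MiniLM-L6-v2, scikit-learn KMeans, KeyBERT) are illustrative assumptions, not the components reported in the paper.

```python
# Minimal sketch of a review-insight pipeline in the spirit of the abstract:
# transformer embeddings -> clustering into topics -> embedding-based keyword
# extraction. Library and model choices are assumptions for illustration only.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from keybert import KeyBERT

reviews = [
    "Battery life is great but the charger broke after a week.",
    "Customer support was slow to respond to my refund request.",
    "The screen is bright and the battery lasts all day.",
    "Refund took three weeks and support never answered my emails.",
]

# 1) Embed each review with a transformer-based sentence encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(reviews)

# 2) Cluster the review embeddings into topics (k=2 fits this toy data).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)

# 3) Extract representative keywords for each topic cluster.
kw_model = KeyBERT(model=encoder)
for topic in sorted(set(labels)):
    topic_text = " ".join(r for r, l in zip(reviews, labels) if l == topic)
    keywords = kw_model.extract_keywords(
        topic_text, keyphrase_ngram_range=(1, 2), top_n=3
    )
    print(f"Topic {topic}: {[kw for kw, _ in keywords]}")
```

In a production setting the clustering step would typically use a density-based method and the keyword step would be tuned to the review domain, but the overall structure (embed, group, label) stays the same.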
Related papers
- Leveraging Large Language Models for Mobile App Review Feature Extraction [4.879919005707447]
This study explores the hypothesis that encoder-only large language models can enhance feature extraction from mobile app reviews.
By leveraging crowdsourced annotations from an industrial context, we redefine feature extraction as a supervised token classification task.
Empirical evaluations demonstrate that this method improves the precision and recall of extracted features and enhances performance efficiency.
arXiv Detail & Related papers (2024-08-02T07:31:57Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs [67.23285413610243]
Self-supervision based on the information extracted from large knowledge graphs has been shown to improve the generalization of language models.
We study the effect of knowledge sampling strategies and sizes that can be used to generate synthetic data for adapting language models.
arXiv Detail & Related papers (2022-05-21T19:49:04Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Typical approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper, we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script-language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178]
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
arXiv Detail & Related papers (2021-11-02T15:48:28Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Efficacy of Bayesian Neural Networks in Active Learning [11.609770399591516]
We show that Bayesian neural networks are more efficient than ensemble-based techniques in capturing uncertainty.
Our findings also reveal some key drawbacks of the ensemble techniques, which were recently shown to be more effective than Monte Carlo dropout.
arXiv Detail & Related papers (2021-04-02T06:02:11Z)
- Learning Purified Feature Representations from Task-irrelevant Labels [18.967445416679624]
We propose a novel learning framework called PurifiedLearning to exploit task-irrelevant features extracted from task-irrelevant labels.
Our work is built on solid theoretical analysis and extensive experiments, which demonstrate the effectiveness of PurifiedLearning.
arXiv Detail & Related papers (2021-02-22T12:50:49Z)
- Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques.
Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance.
We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.