A Topical Approach to Capturing Customer Insight In Social Media
- URL: http://arxiv.org/abs/2307.11775v1
- Date: Fri, 14 Jul 2023 11:15:28 GMT
- Title: A Topical Approach to Capturing Customer Insight In Social Media
- Authors: Miguel Palencia-Olivar
- Abstract summary: This research addresses the challenge of fully unsupervised topic extraction in noisy, Big Data contexts.
We present three approaches we built on the Variational Autoencoder framework.
We show that our models achieve equal or better performance than state-of-the-art methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The age of social media has opened new opportunities for businesses. This
flourishing wealth of information is outside traditional channels and
frameworks of classical marketing research, including that of Marketing Mix
Modeling (MMM). Textual data, in particular, poses many challenges that data
analysis practitioners must tackle. Social media constitute massive,
heterogeneous, and noisy document sources. Industrial data acquisition
processes include some amount of ETL. However, the variability of noise in the
data and the heterogeneity induced by different sources create the need for
ad-hoc tools. Put otherwise, customer insight extraction in fully unsupervised,
noisy contexts is an arduous task. This research addresses the challenge of
fully unsupervised topic extraction in noisy, Big Data contexts. We present
three approaches we built on the Variational Autoencoder framework: the
Embedded Dirichlet Process, the Embedded Hierarchical Dirichlet Process, and
the time-aware Dynamic Embedded Dirichlet Process. These approaches are
nonparametric in the number of topics and have the particularity of jointly
learning word embeddings and topic embeddings. These embeddings do not require
transfer learning, although knowledge transfer remains possible. We test these approaches on
benchmark and automotive industry-related datasets from a real-world use case.
We show that our models achieve equal or better performance than
state-of-the-art methods and that the field of topic modeling would benefit
from improved evaluation metrics.
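The Dirichlet Process underlying these nonparametric models is commonly realized through a truncated stick-breaking construction, which produces an effectively unbounded set of topic weights. The sketch below is a minimal illustrative toy in NumPy, not the authors' implementation; the function name and truncation level are assumptions:

```python
import numpy as np

def stick_breaking_weights(alpha: float, truncation: int, rng) -> np.ndarray:
    """Draw topic weights from a truncated stick-breaking construction.

    Each Beta(1, alpha) draw breaks off a fraction of the remaining
    stick; a larger alpha spreads probability mass over more topics.
    """
    fractions = rng.beta(1.0, alpha, size=truncation)
    # mass remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - fractions[:-1])))
    weights = fractions * remaining
    weights[-1] += 1.0 - weights.sum()  # fold leftover mass into the last stick
    return weights

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, truncation=50, rng=rng)
```

In a model such as the Embedded Dirichlet Process, weights of this kind would parameterize a document's mixture over topics, with the truncation level chosen large enough that trailing weights are negligible.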
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices [91.71951459594074]
Large language models (LLMs) with extended context windows have significantly improved tasks such as information extraction, question answering, and complex planning scenarios.
Existing methods typically utilize the Self-Instruct framework to generate instruction tuning data that improves long-context capability.
We propose the Multi-agent Interactive Multi-hop Generation framework, incorporating a Quality Verification Agent, a Single-hop Question Generation Agent, a Multiple Question Sampling Strategy, and a Multi-hop Question Merger Agent.
Our findings show that our synthetic high-quality long-context instruction data significantly enhances model performance, even surpassing models trained on larger amounts of human-annotated data.
arXiv Detail & Related papers (2024-09-03T13:30:00Z)
- Learning From Crowdsourced Noisy Labels: A Signal Processing Perspective [42.24248330317496]
This feature article introduces advances in learning from noisy crowdsourced labels.
The focus is on key crowdsourcing models and their methodological treatments, from classical statistical models to recent deep learning-based approaches.
In particular, this article reviews the connections to signal processing (SP) theory and methods, such as the identifiability of tensor and nonnegative matrix factorization.
arXiv Detail & Related papers (2024-07-09T14:34:40Z) - Adapting Large Language Models for Content Moderation: Pitfalls in Data
Engineering and Supervised Fine-tuning [79.53130089003986]
Large Language Models (LLMs) have become a feasible solution for handling tasks in various domains.
In this paper, we introduce how to fine-tune an LLM that can be privately deployed for content moderation.
arXiv Detail & Related papers (2023-10-05T09:09:44Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Modeling Entities as Semantic Points for Visual Information Extraction
in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z) - Analytical Engines With Context-Rich Processing: Towards Efficient
Next-Generation Analytics [12.317930859033149]
We envision an analytical engine co-optimized with components that enable context-rich analysis.
We aim for a holistic pipeline cost- and rule-based optimization across relational and model-based operators.
arXiv Detail & Related papers (2022-12-14T21:46:33Z) - Explainable Artificial Intelligence for Improved Modeling of Processes [6.29494485203591]
We evaluate the capability of modern Transformer architectures and more classical Machine Learning technologies to model process regularities.
We show that the ML models are capable of predicting critical outcomes and that the attention mechanisms or XAI components offer new insights into the underlying processes.
arXiv Detail & Related papers (2022-12-01T17:56:24Z) - Generating Hidden Markov Models from Process Models Through Nonnegative Tensor Factorization [0.0]
We introduce a novel mathematically sound method that integrates theoretical process models with interrelated minimal Hidden Markov Models.
Our method consolidates: (a) theoretical process models, (b) HMMs, (c) coupled nonnegative matrix-tensor factorizations, and (d) custom model selection.
arXiv Detail & Related papers (2022-10-03T16:19:27Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2020-09-18T14:50:27Z)
- Principles and Practice of Explainable Machine Learning [12.47276164048813]
This report focuses on data-driven methods -- machine learning (ML) and pattern recognition models in particular.
With the increasing prevalence and complexity of methods, business stakeholders have, at the very least, a growing number of concerns about the drawbacks of models.
We have undertaken a survey to help industry practitioners understand the field of explainable machine learning better.
arXiv Detail & Related papers (2020-09-18T14:50:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.