Transit Pulse: Utilizing Social Media as a Source for Customer Feedback and Information Extraction with Large Language Model
- URL: http://arxiv.org/abs/2410.15016v1
- Date: Sat, 19 Oct 2024 07:08:40 GMT
- Title: Transit Pulse: Utilizing Social Media as a Source for Customer Feedback and Information Extraction with Large Language Model
- Authors: Jiahao Wang, Amer Shalaby
- Abstract summary: We propose a novel approach to extracting and analyzing transit-related information.
Our method employs Large Language Models (LLMs), specifically Llama 3, for a streamlined analysis.
Our results demonstrate the potential of LLMs to transform social media data analysis in the public transit domain.
- Score: 12.6020349733674
- Abstract: Users of the transit system flood social networks daily with messages that contain valuable insights crucial for improving service quality. These posts help transit agencies quickly identify emerging issues. Parsing topics and sentiments is key to gaining comprehensive insights to foster service excellence. However, the volume of messages makes manual analysis impractical, and standard NLP techniques like Term Frequency-Inverse Document Frequency (TF-IDF) fall short in nuanced interpretation. Traditional sentiment analysis separates topics and sentiments before integrating them, often missing the interaction between them. This incremental approach complicates classification and reduces analytical productivity. To address these challenges, we propose a novel approach to extracting and analyzing transit-related information, including sentiment and sarcasm detection, identification of unusual system problems, and location data from social media. Our method employs Large Language Models (LLMs), specifically Llama 3, for a streamlined analysis free from pre-established topic labels. To enhance the model's domain-specific knowledge, we utilize Retrieval-Augmented Generation (RAG), integrating external knowledge sources into the information extraction pipeline. We validated our method through extensive experiments comparing its performance with traditional NLP approaches on user tweet data from a real-world transit system. Our results demonstrate the potential of LLMs to transform social media data analysis in the public transit domain, providing actionable insights and enhancing transit agencies' responsiveness by extracting a broader range of information.
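As a rough illustration of the retrieval-augmented extraction idea described in the abstract, the following Python sketch retrieves the most relevant snippet from a small knowledge base and places it in a prompt that asks for sentiment, sarcasm, issue, and location. It is not the authors' implementation: the knowledge snippets, the route and station names, the TF-IDF retriever, the prompt wording, and the function names `retrieve` and `build_prompt` are all assumptions made for illustration, and no model is called.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge snippets; the paper's actual sources are not listed here.
KNOWLEDGE_BASE = [
    "The Blue Line runs north-south and stops at Central Station.",
    "The Route 12 bus runs along Main Street.",
    "Elevator outages are reported on the accessibility page.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so a term that occurs in every document keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query under TF-IDF cosine similarity."""
    vectors, idf = tfidf_vectors(docs)
    q_toks = tokenize(query)
    tf = Counter(q_toks)
    # Query terms that never occur in the knowledge base are ignored.
    q_vec = {t: (c / len(q_toks)) * idf[t] for t, c in tf.items() if t in idf}
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(q_vec, vectors[i]),
                    reverse=True)
    return [docs[i] for i in ranked[:k]]


def build_prompt(tweet, docs, k=1):
    """Assemble an extraction prompt that includes the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(tweet, docs, k))
    return (
        "You are analyzing a post about a public transit system.\n"
        f"Background knowledge:\n{context}\n\n"
        f"Post: {tweet}\n\n"
        "Return JSON with the keys: sentiment, sarcasm, issue, location."
    )


tweet = ("Great, the Route 12 bus is late on Main Street again. "
         "Love waiting in the cold.")
prompt = build_prompt(tweet, KNOWLEDGE_BASE)
print(prompt)
```

The resulting string would then be sent to an instruction-tuned model such as Llama 3. TF-IDF is used here only to keep the sketch dependency-free; a real pipeline would more likely use embedding-based retrieval.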
Related papers
- CityGPT: Towards Urban IoT Learning, Analysis and Interaction with Multi-Agent System [4.612237040042468]
CityGPT employs three agents to accomplish the temporal analysis of IoT data.
We have agentized the framework, facilitated by a large language model (LLM), to increase the data comprehensibility.
Our evaluation results on real-world data with different time show that the CityGPT framework can guarantee robust performance in computing.
arXiv Detail & Related papers (2024-05-23T15:27:18Z)
- Time Series Analysis of Key Societal Events as Reflected in Complex Social Media Data Streams [0.9790236766474201]
This study investigates narrative evolution on the niche social media platform Gab and the established messaging service Telegram.
Our approach is a novel mode to study multiple social media domains to distil key information which may be obscured otherwise.
The main findings are: (1) the timeline can be deconstructed to provide useful data features allowing for improved interpretation; (2) a methodology is applied which provides a basis for generalization.
arXiv Detail & Related papers (2024-03-11T18:33:56Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z)
- Prompt-and-Align: Prompt-Based Social Alignment for Few-Shot Fake News Detection [50.07850264495737]
"Prompt-and-Align" (P&A) is a novel prompt-based paradigm for few-shot fake news detection.
We show that P&A sets new states-of-the-art for few-shot fake news detection performance by significant margins.
arXiv Detail & Related papers (2023-09-28T13:19:43Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- MetRoBERTa: Leveraging Traditional Customer Relationship Management Data to Develop a Transit-Topic-Aware Language Model [3.3421154214189284]
We propose a transit-topic-aware large language model (LLM) capable of classifying open-ended text feedback to relevant transit-specific topics.
First, we utilize semi-supervised learning to engineer a training dataset of 11 broad transit topics detected in a corpus of 6 years of customer feedback.
We then use this dataset to train and thoroughly evaluate a language model based on the RoBERTa architecture.
arXiv Detail & Related papers (2023-08-09T15:11:37Z)
- ManiTweet: A New Benchmark for Identifying Manipulation of News on Social Media [74.93847489218008]
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
To study this task, we have proposed a data collection schema and curated a dataset called ManiTweet, consisting of 3.6K pairs of tweets and corresponding articles.
Our analysis demonstrates that this task is highly challenging, with large language models (LLMs) yielding unsatisfactory performance.
arXiv Detail & Related papers (2023-05-23T16:40:07Z)
- Twitter Referral Behaviours on News Consumption with Ensemble Clustering of Click-Stream Data in Turkish Media [2.9005223064604078]
This study investigates readers' click activities on the organizations' websites to identify news consumption patterns following referrals from Twitter.
The investigation is widened to a broad perspective by linking the log data with news content to enrich the insights.
arXiv Detail & Related papers (2022-02-04T09:57:13Z)
- Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z)
- Cognitive Computing to Optimize IT Services [0.0]
A Cognitive solution goes beyond the traditional structured data analysis by deep analyses of both structured and unstructured text.
In experiments, up to 18-25% of yearly ticket volume has been reduced using the proposed approach.
arXiv Detail & Related papers (2021-12-28T09:56:44Z)
- Mining Implicit Relevance Feedback from User Behavior for Web Question Answering [92.45607094299181]
We make the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective to substantially reduce the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)