Importance-aware Topic Modeling for Discovering Public Transit Risk from Noisy Social Media
- URL: http://arxiv.org/abs/2512.06293v1
- Date: Sat, 06 Dec 2025 04:45:17 GMT
- Title: Importance-aware Topic Modeling for Discovering Public Transit Risk from Noisy Social Media
- Authors: Fatima Ashraf, Muhammad Ayub Sabir, Jiaxin Deng, Junbiao Pang, Haitao Yu
- Abstract summary: Urban transit agencies increasingly turn to social media to monitor emerging service risks such as crowding, delays, and safety incidents. We address this challenge by jointly modeling linguistic interactions and user influence. The proposed model achieves state-of-the-art topic coherence and strong diversity compared with leading baselines.
- Score: 8.638879065913246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Urban transit agencies increasingly turn to social media to monitor emerging service risks such as crowding, delays, and safety incidents, yet the signals of concern are sparse, short, and easily drowned out by routine chatter. We address this challenge by jointly modeling linguistic interactions and user influence. First, we construct an influence-weighted keyword co-occurrence graph from cleaned posts so that socially impactful posts contribute proportionally to the underlying evidence. The core of our framework is a Poisson Deconvolution Factorization (PDF) that decomposes this graph into a low-rank topical structure and topic-localized residual interactions, producing an interpretable topic-keyword basis together with topic importance scores. A decorrelation regularizer promotes distinct topics, and a lightweight optimization procedure ensures stable convergence under nonnegativity and normalization constraints. Finally, the number of topics is selected through a coherence-driven sweep that evaluates the quality and distinctness of the learned topics. On large-scale social streams, the proposed model achieves state-of-the-art topic coherence and strong diversity compared with leading baselines. The code and dataset are publicly available at https://github.com/pangjunbiao/Topic-Modeling_ITS.git
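The pipeline the abstract describes, an influence-weighted keyword co-occurrence graph factorized under a Poisson objective with nonnegativity and normalization constraints, can be sketched with standard KL-divergence (Poisson likelihood) NMF multiplicative updates. This is a minimal illustrative stand-in, not the authors' PDF implementation: the decorrelation regularizer and the topic-localized residual term are omitted, and all function and variable names here are hypothetical.

```python
import numpy as np

def weighted_cooccurrence(posts, weights, vocab):
    """Build a keyword co-occurrence matrix in which each post's
    contribution is scaled by its user-influence weight."""
    idx = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for tokens, w in zip(posts, weights):
        seen = sorted({idx[t] for t in tokens if t in idx})
        for a in seen:
            for b in seen:
                if a != b:
                    C[a, b] += w  # off-diagonal counts, influence-weighted
    return C

def kl_nmf(V, k, iters=300, seed=0, eps=1e-10):
    """Approximate V >= 0 as W @ H under the KL/Poisson objective via
    multiplicative updates. Returns a column-normalized topic-keyword
    basis W, topic-document loadings H, and a rough per-topic
    importance score (mass carried by each factor)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    ones = np.ones_like(V)
    for _ in range(iters):
        R = W @ H + eps
        W *= ((V / R) @ H.T) / (ones @ H.T + eps)
        R = W @ H + eps
        H *= (W.T @ (V / R)) / (W.T @ ones + eps)
    importance = W.sum(axis=0) * H.sum(axis=1)   # importance proxy
    W /= W.sum(axis=0, keepdims=True)            # normalization constraint
    return W, H, importance
```

A coherence-driven sweep over the number of topics, as the abstract describes, would simply call `kl_nmf` for each candidate `k` and keep the factorization whose top keywords score best on a coherence metric.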
Related papers
- TopiCLEAR: Topic extraction by CLustering Embeddings with Adaptive dimensional Reduction [0.0]
We present TopiCLEAR: Topic extraction by CLustering Embeddings with Adaptive dimensional Reduction. We evaluate our approach on four diverse datasets: 20News, AgNewsTitle, Reddit, and TweetTopic. Our method produces more interpretable topics, highlighting its potential for applications in social media data and web content analytics.
arXiv Detail & Related papers (2025-12-07T07:01:28Z) - QoSDiff: An Implicit Topological Embedding Learning Framework Leveraging Denoising Diffusion and Adversarial Attention for Robust QoS Prediction [5.632045399777709]
This paper introduces QoSDiff, a novel embedding learning framework that bypasses the prerequisite of explicit graph construction.
arXiv Detail & Related papers (2025-12-04T09:17:26Z) - Latent Topic Synthesis: Leveraging LLMs for Electoral Ad Analysis [51.95395936342771]
We introduce an end-to-end framework for automatically generating an interpretable topic taxonomy from an unlabeled corpus. We apply this framework to a large corpus of Meta political ads from the month ahead of the 2024 U.S. Presidential election. Our approach uncovers latent discourse structures, synthesizes semantically rich topic labels, and annotates topics with moral framing dimensions.
arXiv Detail & Related papers (2025-10-16T20:30:20Z) - LLM-Assisted Topic Reduction for BERTopic on Social Media Data [0.22940141855172028]
We propose a framework that combines BERTopic for topic generation with large language models for topic reduction. We evaluate the approach across three Twitter/X datasets and four different language models.
arXiv Detail & Related papers (2025-09-18T20:59:11Z) - Enhancing the Robustness of Contextual ASR to Varying Biasing Information Volumes Through Purified Semantic Correlation Joint Modeling [63.755562174967274]
Cross-attention is affected by variations in biasing information volume. We propose a purified semantic correlation joint modeling (PSC-Joint) approach. PSC-Joint achieves average relative F1 score improvements of up to 21.34% on AISHELL-1 and 28.46% on KeSpeech.
arXiv Detail & Related papers (2025-09-07T03:46:59Z) - Cluster-Aware Attacks on Graph Watermarks [50.19105800063768]
We introduce a cluster-aware threat model in which adversaries apply community-guided modifications to evade detection. Our results show that cluster-aware attacks can reduce attribution accuracy by up to 80% more than random baselines. We propose a lightweight embedding enhancement that distributes watermark nodes across graph communities.
arXiv Detail & Related papers (2025-04-24T22:49:28Z) - A social context-aware graph-based multimodal attentive learning framework for disaster content classification during emergencies: a benchmark dataset and method [4.757418935621701]
CrisisSpot is a method that captures complex relationships between textual and visual modalities. IDEA captures both harmonious and contrasting patterns within the data to enhance multimodal interactions. CrisisSpot achieved average F1-score gains of 9.45% and 5.01% compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-10-11T13:51:46Z) - Seeing the Unseen: Learning Basis Confounder Representations for Robust Traffic Prediction [41.59726314922999]
Traffic prediction is essential for intelligent transportation systems and urban computing. It aims to establish a relationship between historical traffic data X and future traffic states Y by employing various statistical or deep learning methods. The relations of X -> Y are often influenced by external confounders that simultaneously affect both X and Y. Existing deep-learning traffic prediction models adopt the classic front-door and back-door adjustments to address the confounder issue.
arXiv Detail & Related papers (2023-11-21T09:33:13Z) - Recurrent Coupled Topic Modeling over Sequential Documents [33.35324412209806]
We show that a current topic evolves from all prior topics with corresponding coupling weights, forming the multi-topic-thread evolution.
A new solution with a set of novel data augmentation techniques is proposed, which successfully decomposes the multi-couplings between evolving topics.
A novel Gibbs sampler with a backward-forward filter algorithm efficiently learns latent time-evolving parameters in closed form.
arXiv Detail & Related papers (2021-06-23T08:58:13Z) - Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as -- or better -- than traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z) - ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z) - PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation [58.98802062945709]
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
arXiv Detail & Related papers (2020-09-02T08:30:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.