Towards Scalable Topic Detection on Web via Simulating Levy Walks Nature of Topics in Similarity Space
- URL: http://arxiv.org/abs/2408.05348v1
- Date: Fri, 26 Jul 2024 07:19:46 GMT
- Title: Towards Scalable Topic Detection on Web via Simulating Levy Walks Nature of Topics in Similarity Space
- Authors: Junbiao Pang, Qingming Huang
- Abstract summary: We present a novel, yet very powerful Explore-Exploit (EE) approach to group topics by simulating Levy walks nature in the similarity space.
Experiments on two public data sets demonstrate that our approach is not only comparable to the state-of-the-art methods in terms of effectiveness but also significantly outperforms the state-of-the-art methods in terms of efficiency.
- Score: 55.97416108140739
- License:
- Abstract: Organizing a few webpages from social media websites into popular topics is one of the key steps to understand trends on the web. Discovering popular topics from the web faces a sea of noise webpages which never evolve into popular topics. In this paper, we discover that the similarity values between webpages in a popular topic contain the statistically similar features observed in Levy walks. Consequently, we present a simple, novel, yet very powerful Explore-Exploit (EE) approach to group topics by simulating Levy walks nature in the similarity space. The proposed EE-based topic clustering is an effective and efficient method which is a solid move towards handling a sea of noise webpages. Experiments on two public data sets demonstrate that our approach is not only comparable to the state-of-the-art methods in terms of effectiveness but also significantly outperforms the state-of-the-art methods in terms of efficiency.
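As a rough illustration of the "Levy walks" property the abstract refers to, the sketch below draws step lengths from a heavy-tailed power law, p(l) ∝ l^(-mu), and then recovers the exponent with the standard maximum-likelihood estimator. It is not the paper's EE algorithm: the function names, the exponent `mu=2.0`, and the cutoff `l_min` are all assumptions chosen for illustration.

```python
import math
import random


def sample_levy_steps(n, mu=2.0, l_min=1.0, seed=0):
    """Draw n step lengths with density p(l) proportional to l**(-mu), l >= l_min.

    Hypothetical helper: uses inverse-transform sampling on the power-law tail.
    """
    rng = random.Random(seed)
    # Survival function is (l / l_min) ** -(mu - 1); invert it at 1 - u.
    return [l_min * (1.0 - rng.random()) ** (-1.0 / (mu - 1.0)) for _ in range(n)]


def estimate_exponent(steps, l_min=1.0):
    """Maximum-likelihood estimate of mu for a continuous power law."""
    n = len(steps)
    return 1.0 + n / sum(math.log(l / l_min) for l in steps)


steps = sample_levy_steps(20000, mu=2.0)
mu_hat = estimate_exponent(steps)
print(f"estimated exponent: {mu_hat:.3f}")
```

A check of this kind could, under the same assumptions, be run on a sequence of similarity-derived step lengths to see whether they look heavy-tailed; how the paper actually measures this is not stated in the abstract.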
Related papers
- Bundle Fragments into a Whole: Mining More Complete Clusters via Submodular Selection of Interesting webpages for Web Topic Detection [49.8035161337388]
A state-of-the-art solution first organizes webpages into a large volume of multi-granularity topic candidates.
Hot topics are further identified by estimating their interestingness.
This paper proposes a bundling-refining approach to mine more complete hot topics from fragments.
arXiv Detail & Related papers (2024-09-19T00:46:31Z)
- Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to SOTA while being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
- Discovering Latent Themes in Social Media Messaging: A Machine-in-the-Loop Approach Integrating LLMs [22.976609127865732]
We introduce a novel approach to uncovering latent themes in social media messaging.
Our work sheds light on the dynamic nature of social media, revealing the shifts in the thematic focus of messaging in response to real-world events.
arXiv Detail & Related papers (2024-03-15T21:54:00Z)
- Aligning and Prompting Everything All at Once for Universal Visual Perception [79.96124061108728]
APE is a universal visual perception model for aligning and prompting everything all at once in an image to perform diverse tasks.
APE advances the convergence of detection and grounding by reformulating language-guided grounding as open-vocabulary detection.
Experiments on over 160 datasets demonstrate that APE outperforms state-of-the-art models.
arXiv Detail & Related papers (2023-12-04T18:59:50Z)
- Semantic Role Aware Correlation Transformer for Text to Video Retrieval [23.183653281610866]
This paper proposes a novel transformer that explicitly disentangles the text and video into semantic roles of objects, spatial contexts and temporal contexts.
Preliminary results on the popular YouCook2 dataset indicate that our approach surpasses a current state-of-the-art method by a wide margin in all metrics.
arXiv Detail & Related papers (2022-06-26T11:28:03Z)
- Twitter Referral Behaviours on News Consumption with Ensemble Clustering of Click-Stream Data in Turkish Media [2.9005223064604078]
This study investigates the readers' click activities in the organizations' websites to identify news consumption patterns following referrals from Twitter.
The investigation is widened to a broad perspective by linking the log data with news content to enrich the insights.
arXiv Detail & Related papers (2022-02-04T09:57:13Z)
- DIRV: Dense Interaction Region Voting for End-to-End Human-Object Interaction Detection [53.40028068801092]
We propose a novel one-stage HOI detection approach based on a new concept called interaction region for the HOI problem.
Unlike previous methods, our approach concentrates on the densely sampled interaction regions across different scales for each human-object pair.
In order to compensate for the detection flaws of a single interaction region, we introduce a novel voting strategy.
arXiv Detail & Related papers (2020-10-02T13:57:58Z)
- Stance Detection in Web and Social Media: A Comparative Study [3.937145867005019]
Online forums and social media platforms are increasingly being used to discuss topics of varying polarities where different people take different stances.
Several methodologies for automatic stance detection from text have been proposed in the literature.
To our knowledge, there has not been any systematic investigation of their comparative performance.
arXiv Detail & Related papers (2020-07-12T12:39:35Z)
- Human Trajectory Forecasting in Crowds: A Deep Learning Perspective [89.4600982169]
We present an in-depth analysis of existing deep learning-based methods for modelling social interactions.
We propose two knowledge-based data-driven methods to effectively capture these social interactions.
We develop a large scale interaction-centric benchmark TrajNet++, a significant yet missing component in the field of human trajectory forecasting.
arXiv Detail & Related papers (2020-07-07T17:19:56Z)