Related papers: ComStreamClust: a communicative multi-agent approach to text clustering in streaming data

URL: http://arxiv.org/abs/2010.05349v2
Date: Tue, 27 Apr 2021 16:58:49 GMT
Title: ComStreamClust: a communicative multi-agent approach to text clustering in streaming data
Authors: Ali Najafi, Araz Gholipour-Shilabin, Rahim Dehkharghani, Ali Mohammadpur-Fard, Meysam Asgari-Chenaghlu
Abstract summary: We propose a novel, multi-agent, communicative clustering approach, called ComStreamClust, for clustering sub-topics inside a broader topic. The proposed approach is parallelizable and can handle several data points simultaneously. ComStreamClust has been evaluated on two datasets, COVID-19 and FA CUP.
Score: 1.9949261242626626
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Topic detection is the task of identifying and tracking hot topics in social media. Twitter is arguably the most popular platform for people to share their ideas about different issues with others. One such prevalent issue is the COVID-19 pandemic. Detecting and tracking topics on issues like this would help governments and healthcare companies deal with the phenomenon. In this paper, we propose a novel, multi-agent, communicative clustering approach, called ComStreamClust, for clustering sub-topics inside a broader topic, e.g., COVID-19. The proposed approach is parallelizable and can handle several data points simultaneously. LaBSE sentence embeddings are used to measure the semantic similarity between two tweets. ComStreamClust has been evaluated on two datasets, COVID-19 and FA CUP. The results obtained from ComStreamClust confirm the effectiveness of the proposed approach compared to existing methods.
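The abstract describes measuring semantic similarity between tweets with LaBSE sentence embeddings. As a rough, single-process illustration of similarity-driven stream clustering, the following sketch assigns each incoming embedding to the most similar cluster centroid by cosine similarity, or opens a new cluster when no centroid clears a threshold. It assumes the embeddings are already computed (for example by a LaBSE model); `StreamingClusterer`, `cosine_similarity`, and the `threshold` parameter are hypothetical names, and this is not the multi-agent, communicative ComStreamClust algorithm itself.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class StreamingClusterer:
    """Single-pass threshold clustering over a stream of embeddings.

    Hypothetical illustration only: `threshold` is an assumed parameter,
    and there are no communicating agents here, unlike ComStreamClust.
    """

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold
        self.centroids: List[List[float]] = []  # running mean of each cluster
        self.sizes: List[int] = []              # number of members per cluster

    def add(self, vec: Sequence[float]) -> int:
        """Assign `vec` to the most similar cluster, or open a new one; return its index."""
        best_idx, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine_similarity(vec, centroid)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx == -1 or best_sim < self.threshold:
            # No sufficiently similar cluster: start a new one seeded by this vector.
            self.centroids.append(list(vec))
            self.sizes.append(1)
            return len(self.centroids) - 1
        # Incrementally update the running mean of the chosen cluster.
        n = self.sizes[best_idx] + 1
        self.centroids[best_idx] = [
            c + (v - c) / n for c, v in zip(self.centroids[best_idx], vec)
        ]
        self.sizes[best_idx] = n
        return best_idx
```

With `threshold=0.9`, feeding toy vectors such as `[1.0, 0.0]`, `[0.9, 0.1]`, `[0.0, 1.0]` places the first two in one cluster and opens a second cluster for the third. A single-pass, threshold-based design keeps memory proportional to the number of clusters rather than the number of tweets, which is why it is a common baseline for streaming text; the parallel, agent-based coordination described in the paper is not modelled here.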

Related papers

An Enhanced Model-based Approach for Short Text Clustering
Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook. Existing methods can be broadly categorized into two paradigms: topic-model-based approaches and deep-representation-learning-based approaches. We propose a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM), which effectively handles the sparsity and high dimensionality of short texts. Addressing several aspects of GSDMM that warrant further refinement, we propose an improved approach, GSDMM+, designed to further optimize its performance.
arXiv (2025-07-18T10:07:42Z)
Towards Scalable Topic Detection on Web via Simulating Levy Walks Nature of Topics in Similarity Space
We present a novel Explore-Exploit (EE) approach that groups topics by simulating the Levy-walk nature of topics in the similarity space. Experiments on two public datasets demonstrate that our approach is comparable to state-of-the-art methods in effectiveness and significantly outperforms them in efficiency.
arXiv (2024-07-26T07:19:46Z)
Going beyond research datasets: Novel intent discovery in the industry setting
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform. We show the benefit of pre-training language models on in-domain data, both self-supervised and with weak supervision. We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv (2023-05-09T14:21:29Z)
Improved Topic modeling in Twitter through Community Pooling
Twitter posts are short and often less coherent than other text documents. We propose a new pooling scheme for topic modeling in Twitter, which groups tweets whose authors belong to the same community. Results show that our Community Pooling method outperformed other methods on the majority of metrics in two heterogeneous datasets.
arXiv (2021-12-20T17:05:32Z)
Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program. We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles. We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv (2021-12-16T03:37:20Z)
MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos. We present a novel approach, termed MD-CSDNetwork, that combines features from the spatial and frequency domains to mine a shared discriminative representation.
arXiv (2021-09-15T14:11:53Z)
Author Clustering and Topic Estimation for Short Texts
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document. We also simultaneously cluster users, removing the need for post-hoc cluster estimation. Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv (2021-06-15T20:55:55Z)
A General Method to Find Highly Coordinating Communities in Social Media through Inferred Interaction Links
Political misinformation, astroturfing and organised trolling are online malicious behaviours with significant real-world effects. We propose a novel temporal window approach that relies on account interactions and metadata alone. It detects groups of accounts engaging in various behaviours that, in concert, come to execute different goal-based strategies.
arXiv (2021-03-05T00:48:23Z)
Who will accept my request? Predicting response of link initiation in two-way relation networks
This paper addresses an important problem in social network analysis and mining: how to predict link initiation feedback in two-way networks. A relationship between two individuals in a two-way network begins with a link invitation from one of them, which becomes an established link if the invitee accepts it. We propose a methodology that solves the link initiation feedback prediction problem in a multilayer fashion.
arXiv (2020-12-21T08:14:37Z)
The Influence of Domain-Based Preprocessing on Subject-Specific Clustering
The sudden move of most university teaching online has increased the workload of academics. One way to deal with this problem is to cluster these questions by topic. In this paper, we explore the realms of tagging datasets, focusing on identifying code excerpts and providing empirical results.
arXiv (2020-11-16T17:47:19Z)
Covid-Transformer: Detecting COVID-19 Trending Topics on Twitter Using Universal Sentence Encoder
Coronavirus disease (COVID-19) has led to a pandemic, affecting more than 200 countries across the globe. Because of its global impact, COVID-19 has become a major concern for people almost everywhere. We analyze tweets to detect the trending topics and people's major concerns on Twitter.
arXiv (2020-09-08T19:00:38Z)
Topic Detection from Conversational Dialogue Corpus with Parallel Dirichlet Allocation Model and Elbow Method
We propose a topic detection approach with a Parallel Latent Dirichlet Allocation (PLDA) model. We use K-means clustering with the Elbow method to interpret the clusters and validate within-cluster consistency. The experimental results show that combining PLDA with the Elbow method selects the optimal number of clusters and refines the topics for the conversation.
arXiv (2020-06-05T10:24:43Z)
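The Elbow method named in the last entry is usually applied by plotting the within-cluster sum of squared errors (SSE) against the number of clusters k and picking the k where the curve bends. The sketch below automates that choice with one common heuristic: after rescaling both axes to [0, 1], it returns the k whose point lies farthest from the straight line joining the first and last points of the curve. It assumes the SSE value for each k has already been computed (for example by K-means); `elbow_k` is a hypothetical helper and not necessarily the procedure used in that paper.

```python
from typing import Sequence


def elbow_k(ks: Sequence[int], sse: Sequence[float]) -> int:
    """Pick the 'elbow' of an SSE-versus-k curve.

    Both axes are rescaled to [0, 1]; the chosen k is the point farthest
    from the chord joining the first and last points of the curve.
    """
    if len(ks) != len(sse) or len(ks) < 3:
        raise ValueError("need at least three (k, SSE) pairs of equal length")
    k_span = ks[-1] - ks[0]
    s_min, s_max = min(sse), max(sse)
    if k_span == 0 or s_max == s_min:
        return ks[0]  # flat or degenerate curve: no elbow to detect
    xs = [(k - ks[0]) / k_span for k in ks]
    ys = [(s - s_min) / (s_max - s_min) for s in sse]
    # Direction of the chord from (xs[0], ys[0]) to (xs[-1], ys[-1]).
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = (dx * dx + dy * dy) ** 0.5
    best_k, best_dist = ks[0], -1.0
    for k, x, y in zip(ks, xs, ys):
        # Perpendicular distance from (x, y) to the chord.
        dist = abs(dy * (x - xs[0]) - dx * (y - ys[0])) / norm
        if dist > best_dist:
            best_k, best_dist = k, dist
    return best_k
```

For example, SSE values of 100, 30, 25, 22, 20, 18 for k = 1..6 give k = 2, where the sharp drop levels off. Rescaling matters: without it, the SSE axis (often in the thousands) would dominate the distance and the choice would barely depend on k.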

This list is automatically generated from the titles and abstracts of the papers on this site.