ComStreamClust: a communicative multi-agent approach to text clustering
in streaming data
- URL: http://arxiv.org/abs/2010.05349v2
- Date: Tue, 27 Apr 2021 16:58:49 GMT
- Title: ComStreamClust: a communicative multi-agent approach to text clustering
in streaming data
- Authors: Ali Najafi, Araz Gholipour-Shilabin, Rahim Dehkharghani, Ali
Mohammadpur-Fard, Meysam Asgari-Chenaghlu
- Abstract summary: We propose a novel, multi-agent, communicative clustering approach called ComStreamClust for clustering sub-topics inside a broader topic.
The proposed approach is parallelizable and can handle several data points simultaneously.
ComStreamClust has been evaluated on two datasets: COVID-19 and FA CUP.
- Score: 1.9949261242626626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Topic detection is the task of determining and tracking hot topics in social
media. Twitter is arguably the most popular platform for people to share their
ideas with others about different issues. One such prevalent issue is the
COVID-19 pandemic. Detecting and tracking topics on these kinds of issues would
help governments and healthcare companies deal with this phenomenon. In this
paper, we propose a novel, multi-agent, communicative clustering approach
called ComStreamClust for clustering sub-topics inside a broader topic,
e.g., COVID-19. The proposed approach is parallelizable and can handle several
data points simultaneously. The LaBSE sentence embedding is used to measure the
semantic similarity between two tweets. ComStreamClust has been evaluated on
two datasets: COVID-19 and FA CUP. The results obtained from
ComStreamClust confirm the effectiveness of the proposed approach when compared
to existing methods.
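The abstract states that semantic similarity between two tweets is measured on LaBSE sentence embeddings. The following is a minimal sketch of how such a similarity could drive a streaming cluster assignment. It is not the authors' agent protocol: the function names `cosine_similarity` and `assign_to_cluster`, the threshold of 0.8, and the three-dimensional toy vectors (standing in for real LaBSE embeddings) are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def assign_to_cluster(embedding, centroids, threshold=0.8):
    """Return the index of the most similar centroid, or None when no
    centroid reaches the (hypothetical) similarity threshold."""
    best_index, best_score = None, threshold
    for i, centroid in enumerate(centroids):
        score = cosine_similarity(embedding, centroid)
        if score >= best_score:
            best_index, best_score = i, score
    return best_index


# Toy vectors standing in for sentence embeddings (assumption).
centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
close_tweet = [0.9, 0.1, 0.0]  # nearly parallel to the first centroid
novel_tweet = [0.0, 0.0, 1.0]  # orthogonal to both centroids

print(assign_to_cluster(close_tweet, centroids))
print(assign_to_cluster(novel_tweet, centroids))
```

A data point that matches no existing cluster returns `None` here; a streaming system would have to decide what to do with it, for example by opening a new cluster. How ComStreamClust handles that case is not described in the abstract.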
Related papers
- Towards Scalable Topic Detection on Web via Simulating Levy Walks Nature of Topics in Similarity Space [55.97416108140739]
We present a novel, yet very powerful Explore-Exploit (EE) approach to group topics by simulating Levy walks nature in the similarity space.
Experiments on two public data sets demonstrate that our approach is not only comparable to the state-of-the-art methods in terms of effectiveness but also significantly outperforms the state-of-the-art methods in terms of efficiency.
arXiv Detail & Related papers (2024-07-26T07:19:46Z)
- Going beyond research datasets: Novel intent discovery in the industry setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform.
We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision.
We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z)
- Improved Topic modeling in Twitter through Community Pooling [0.0]
Twitter posts are short and often less coherent than other text documents.
We propose a new pooling scheme for topic modeling in Twitter, which groups tweets whose authors belong to the same community.
Results show that our Community Pooling method outperformed other methods on the majority of metrics in two heterogeneous datasets.
arXiv Detail & Related papers (2021-12-20T17:05:32Z)
- Twitter-COMMs: Detecting Climate, COVID, and Military Multimodal Misinformation [83.2079454464572]
This paper describes our approach to the Image-Text Inconsistency Detection challenge of the DARPA Semantic Forensics (SemaFor) Program.
We collect Twitter-COMMs, a large-scale multimodal dataset with 884k tweets relevant to the topics of Climate Change, COVID-19, and Military Vehicles.
We train our approach, based on the state-of-the-art CLIP model, leveraging automatically generated random and hard negatives.
arXiv Detail & Related papers (2021-12-16T03:37:20Z)
- MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection [80.83725644958633]
Current deepfake generation methods leave discriminative artifacts in the frequency spectrum of fake images and videos.
We present a novel approach, termed MD-CSDNetwork, for combining the features in the spatial and frequency domains to mine a shared discriminative representation.
arXiv Detail & Related papers (2021-09-15T14:11:53Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- A General Method to Find Highly Coordinating Communities in Social Media through Inferred Interaction Links [13.264683014487376]
Political misinformation, astroturfing and organised trolling are online malicious behaviours with significant real-world effects.
We propose a novel temporal window approach that relies on account interactions and metadata alone.
It detects groups of accounts engaging in various behaviours that, in concert, come to execute different goal-based strategies.
arXiv Detail & Related papers (2021-03-05T00:48:23Z)
- Who will accept my request? Predicting response of link initiation in two-way relation networks [7.547803601922528]
This paper addresses an important problem in social network analysis and mining: how to predict link initiation feedback in two-way networks.
Relationships between two individuals in a two-way network include a link invitation from one of the individuals, which will be an established link if accepted by the invitee.
We propose a methodology to solve the link initiation feedback prediction problem in this multilayer fashion.
arXiv Detail & Related papers (2020-12-21T08:14:37Z)
- The Influence of Domain-Based Preprocessing on Subject-Specific Clustering [55.41644538483948]
The sudden move of the majority of university teaching online has increased the workload for academics.
One way to deal with this problem is to cluster these questions depending on their topic.
In this paper, we explore the realms of tagging data sets, focusing on identifying code excerpts and providing empirical results.
arXiv Detail & Related papers (2020-11-16T17:47:19Z)
- Covid-Transformer: Detecting COVID-19 Trending Topics on Twitter Using Universal Sentence Encoder [7.305019142196582]
Coronavirus disease (also known as COVID-19) has led to a pandemic, impacting more than 200 countries across the globe.
With its global impact, COVID-19 has become a major concern of people almost everywhere.
We try to analyze the tweets and detect the trending topics and major concerns of people on Twitter.
arXiv Detail & Related papers (2020-09-08T19:00:38Z)
- Topic Detection from Conversational Dialogue Corpus with Parallel Dirichlet Allocation Model and Elbow Method [1.599072005190786]
We propose a topic detection approach based on the Parallel Latent Dirichlet Allocation (PLDA) model.
We use K-means clustering with the Elbow method to interpret and validate within-cluster consistency.
The experimental results show that combining PLDA with the Elbow method selects the optimal number of clusters and refines the topics for the conversation.
arXiv Detail & Related papers (2020-06-05T10:24:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.