TopicBERT: A Transformer transfer learning based memory-graph approach
for multimodal streaming social media topic detection
- URL: http://arxiv.org/abs/2008.06877v1
- Date: Sun, 16 Aug 2020 10:39:50 GMT
- Title: TopicBERT: A Transformer transfer learning based memory-graph approach
for multimodal streaming social media topic detection
- Authors: Meysam Asgari-Chenaghlu, Mohammad-Reza Feizi-Derakhshi, Leili
Farzinvash, Mohammad-Ali Balafar, Cina Motamed
- Abstract summary: Social networks, with bursty short messages spread at large data scale across a wide variety of topics, are a research interest of many researchers.
These properties of social networks, known as the 5 V's of big data, have led to many unique and enlightening algorithms and techniques applied to large social networking datasets and data streams.
- Score: 8.338441212378587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The real-time nature of social networks, with bursty short
messages spread at large data scale across a wide variety of topics, is a
research interest of many researchers. These properties of social networks,
known as the 5 V's of big data, have led to many unique and enlightening
algorithms and techniques applied to large social networking datasets and data
streams. Much of this research is based on the detection and tracking of hot
topics and trending social media events, which helps answer many otherwise
open questions. These algorithms, and in some cases software products, mostly
rely on the nature of the language itself. Other techniques, such as
unsupervised data mining methods, are language independent, but they do not
meet many of the requirements of a comprehensive solution. Research issues
such as noisy sentences with poor grammar and newly invented online words
challenge the maintenance of a good social network topic detection and
tracking methodology; the semantic relationship between words, and in most
cases synonymy, is also ignored by much of this research. In this research, we
use Transformers combined with an incremental community detection algorithm.
The Transformer, on the one hand, provides the semantic relation between words
in different contexts. On the other hand, the proposed graph mining technique
enhances the resulting topics with the aid of simple structural rules. Named
entity recognition from multimodal data, image and text, labels the named
entities with their entity types, and the extracted topics are tuned using
them. All operations of the proposed system are applied from a big social data
perspective using NoSQL technologies. In order to present a working and
systematic solution, we combine MongoDB with Neo4j as the two major database
systems of our work. The proposed system shows higher precision and recall
compared to other methods on three different datasets.
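As a rough illustration of the pipeline the abstract describes, the sketch below embeds incoming short messages with a pretrained Transformer, links semantically similar messages in a graph, and reads topics off as graph communities. The embedding model, similarity threshold, and the `sentence-transformers` and `networkx` libraries are illustrative assumptions, not the paper's exact implementation.

```python
# Hypothetical sketch of a Transformer + incremental graph-based topic
# detector for a message stream; model, threshold, and libraries are
# assumptions, not the paper's exact implementation.
import networkx as nx
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
graph = nx.Graph()
memory = {}  # message id -> embedding, the stream's "memory"

SIM_THRESHOLD = 0.7  # assumed cutoff for linking two messages

def ingest(msg_id, text):
    """Embed a new message and link it to semantically similar ones."""
    emb = model.encode(text, convert_to_tensor=True)
    graph.add_node(msg_id, text=text)
    for other_id, other_emb in memory.items():
        if util.cos_sim(emb, other_emb).item() >= SIM_THRESHOLD:
            graph.add_edge(msg_id, other_id)
    memory[msg_id] = emb

def current_topics():
    """Connected components stand in for detected topic communities."""
    return [sorted(c) for c in nx.connected_components(graph) if len(c) > 1]

ingest("t1", "Huge earthquake hits the city center")
ingest("t2", "Quake shakes downtown, buildings evacuated")
ingest("t3", "New phone released with a better camera")
print(current_topics())  # t1 and t2 should land in the same component
```

In the setting described above, the message store and the graph would live in MongoDB and Neo4j respectively; the in-memory dictionary and `networkx` graph here merely stand in for them, and labels from the multimodal named entity recognition step would be attached as node attributes.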
Related papers
- Utilizing Social Media Attributes for Enhanced Keyword Detection: An IDF-LDA Model Applied to Sina Weibo [0.0]
We propose a novel method to address the keyword detection problem in social media.
Our model combines the Inverse Document Frequency (IDF) and Latent Dirichlet Allocation (LDA) models to better cope with the distinct attributes of social media data.
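As a rough sketch of how IDF and LDA might be combined for keyword scoring, the snippet below reweights each topic's word distribution by per-term IDF so that ubiquitous terms are down-ranked; this weighting scheme is an assumption for illustration, not necessarily the paper's exact formulation.

```python
# Hypothetical IDF-weighted LDA keyword scoring; the paper's exact
# combination of the two models may differ.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "new phone camera review",
    "phone battery life review",
    "earthquake hits the city center",
    "earthquake damage in the city",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)
idf = TfidfTransformer().fit(counts).idf_          # per-term IDF weights
vocab = np.array(vectorizer.get_feature_names_out())

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
for k, topic in enumerate(lda.components_):
    scores = (topic / topic.sum()) * idf           # IDF-weighted word scores
    top = vocab[np.argsort(scores)[::-1][:3]]
    print(f"topic {k}: {', '.join(top)}")
```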
arXiv Detail & Related papers (2023-05-30T08:35:39Z)
- Vision+X: A Survey on Multimodal Learning in the Light of Data [64.03266872103835]
Multimodal machine learning that incorporates data from various sources has become an increasingly popular research area.
We analyze the commonalities and unique characteristics of data formats ranging mainly across vision, audio, text, and motion.
We investigate the existing literature on multimodal learning from both the representation learning and downstream application levels.
arXiv Detail & Related papers (2022-10-05T13:14:57Z)
- Panning for gold: Lessons learned from the platform-agnostic automated detection of political content in textual data [48.7576911714538]
We discuss how these techniques can be used to detect political content across different platforms.
We compare the performance of three groups of detection techniques relying on dictionaries, supervised machine learning, or neural networks.
Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by neural network- and machine-learning-based models.
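To make the contrast between the compared technique groups concrete, here is a toy juxtaposition of a dictionary-based detector and a supervised classifier; the keyword list, toy data, and model choice are all illustrative assumptions.

```python
# Hypothetical dictionary-based vs. supervised detection of political
# content; keywords, data, and classifier are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

POLITICAL_TERMS = {"election", "parliament", "minister", "vote"}  # assumed

def dictionary_detect(text):
    """Flag a text if it contains any dictionary term."""
    return bool(POLITICAL_TERMS & set(text.lower().split()))

texts = ["the election results are in", "great recipe for pasta",
         "parliament passed the new bill", "my cat sleeps all day"]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)
for t in texts:
    print(t, "| dict:", dictionary_detect(t), "| model:", clf.predict([t])[0])
```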
arXiv Detail & Related papers (2022-07-01T15:23:23Z)
- AtteSTNet -- An attention and subword tokenization based approach for code-switched text hate speech detection [1.3190581566723918]
Language used in social media is often a combination of English and the native language in the region.
In India, Hindi is used predominantly and is often code-switched with English, giving rise to the Hinglish (Hindi+English) language.
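The sketch below illustrates the kind of subword tokenization such an approach builds on, applied to a code-switched Hinglish sentence; the multilingual WordPiece tokenizer chosen here is an assumption, not necessarily the one AtteSTNet uses.

```python
# Hypothetical subword tokenization of code-switched (Hinglish) text;
# the tokenizer checkpoint is an assumption.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

hinglish = "yeh movie bahut acchi thi, totally loved it"
print(tokenizer.tokenize(hinglish))
# Out-of-vocabulary code-switched words break into shared subword units,
# letting one vocabulary cover both languages.
```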
arXiv Detail & Related papers (2021-12-10T20:01:44Z)
- On-the-Fly Rectification for Robust Large-Vocabulary Topic Inference [14.664456948527292]
As object vocabularies grow, it becomes more expensive to store and run inference algorithms on co-occurrence statistics.
We propose novel methods that simultaneously compress and rectify co-occurrence statistics, scaling gracefully with the size of vocabulary and the dimension of latent space.
We also present new algorithms learning latent variables from the compressed statistics, and verify that our methods perform comparably to previous approaches on both textual and non-textual data.
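A minimal sketch of the two ingredients named here, compressing co-occurrence statistics with a random projection and then rectifying the compressed estimate, might look as follows; the concrete operators (Gaussian projection, symmetrize-and-clip) are illustrative simplifications, not the paper's algorithms.

```python
# Hypothetical compression and rectification of co-occurrence
# statistics; the paper's actual operators differ.
import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 64                                  # vocabulary size, sketch dim
C = rng.poisson(0.1, (V, V)).astype(float)       # toy co-occurrence counts
C = (C + C.T) / 2                                # symmetric by construction

R = rng.normal(0.0, 1.0 / np.sqrt(d), (V, d))    # random projection
C_small = R.T @ C @ R                            # d x d compressed statistics

# Rectification: push the noisy estimate back toward a valid-looking
# matrix by re-symmetrizing and clipping negative entries.
C_rect = np.clip((C_small + C_small.T) / 2, 0.0, None)
print(C.shape, "->", C_rect.shape)
```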
arXiv Detail & Related papers (2021-11-12T06:44:04Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study presents an assessment of existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Semantic maps and metrics for science using deep transformer encoders [1.599072005190786]
Recent advances in natural language understanding driven by deep transformer networks offer new possibilities for mapping science.
Transformer embedding models capture shades of association and connotation that vary across different linguistic contexts.
We report a procedure for encoding scientific documents with these tools, measuring their improvement over static word embeddings.
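As a small illustration of why contextual encoders help here, the sketch below scores two sentences that share a polysemous word; with static word embeddings the shared token would inflate their similarity, while a contextual encoder keeps them apart. The encoder checkpoint is an assumption.

```python
# Hypothetical contrast motivating contextual over static embeddings;
# the encoder checkpoint is an assumption.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

a = "The cell divided under the microscope."
b = "The prisoner paced inside the cell."
emb = encoder.encode([a, b], convert_to_tensor=True)
# 'cell' means different things in each sentence; a contextual encoder
# reflects that in a low similarity score.
print(util.cos_sim(emb[0], emb[1]).item())
```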
arXiv Detail & Related papers (2021-04-13T04:12:20Z)
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated promising results on this canonical task.
Despite this success, their performance can be largely jeopardized in practice because they are unable to capture high-order interactions between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
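The sketch below reduces the idea to two attention passes over a hypergraph incidence matrix, nodes into hyperedges and hyperedges back into nodes; the scoring function here is a toy stand-in for learned attention parameters, so this is a schematic reduction, not the authors' architecture.

```python
# Schematic two-level hypergraph attention in the spirit of HyperGAT;
# the scoring is a toy stand-in for learned attention parameters.
import torch
import torch.nn.functional as F

def hypergraph_attention(X, H):
    """X: (n, d) node features; H: (n, m) binary incidence matrix."""
    n, m = H.shape
    neg = torch.finfo(X.dtype).min
    # Node-level pass: each hyperedge attends over its member nodes.
    logits = X.sum(dim=1)                                  # (n,) toy scores
    a = torch.where(H.T > 0, logits.expand(m, n), torch.full((m, n), neg))
    E = F.softmax(a, dim=1) @ X                            # (m, d) edge reps
    # Edge-level pass: each node attends over its incident hyperedges.
    elogits = E.sum(dim=1)                                 # (m,) toy scores
    b = torch.where(H > 0, elogits.expand(n, m), torch.full((n, m), neg))
    return F.softmax(b, dim=1) @ E                         # (n, d) node reps

X = torch.randn(4, 8)                                      # 4 words, 8 dims
H = torch.tensor([[1., 0.], [1., 0.], [0., 1.], [1., 1.]]) # 2 hyperedges
print(hypergraph_attention(X, H).shape)                    # torch.Size([4, 8])
```

Hyperedges let one aggregation step mix all words that co-occur in, say, the same short document, which is the high-order interaction that plain pairwise GNN edges miss.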
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
- Pretrained Transformers for Text Ranking: BERT and Beyond [53.83210899683987]
This survey provides an overview of text ranking with neural network architectures known as transformers.
The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in natural language processing.
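One pattern the survey covers is cross-encoder reranking, where a pretrained transformer scores each query-document pair jointly; a minimal sketch follows, with the checkpoint name being an assumption.

```python
# Minimal cross-encoder reranking sketch in the style the survey
# discusses; the checkpoint choice is an assumption.
from sentence_transformers import CrossEncoder

ranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed

query = "effects of caffeine on sleep"
docs = [
    "Caffeine can delay sleep onset and reduce sleep quality.",
    "Coffee beans are roasted at different temperatures.",
]
scores = ranker.predict([(query, d) for d in docs])
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(f"{score:.3f}  {doc}")
```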
arXiv Detail & Related papers (2020-10-13T15:20:32Z)
- Sequential Sentence Matching Network for Multi-turn Response Selection in Retrieval-based Chatbots [45.920841134523286]
We propose a matching network, called sequential sentence matching network (S2M), to use the sentence-level semantic information to address the problem.
We find that by using the sentence-level semantic information, the network successfully addresses the problem and achieves a significant improvement in matching, resulting in state-of-the-art performance.
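A crude sketch of sentence-level matching for response selection, encoding each context sentence and candidate response and aggregating per-sentence similarities, might look like this; the encoder and the max aggregation are assumptions, not the S2M architecture.

```python
# Toy sentence-level matching for multi-turn response selection; the
# encoder and max aggregation are assumptions, not S2M itself.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

context = ["My laptop won't boot.", "I already tried restarting it."]
candidates = ["Try reseating the RAM and booting again.",
              "I love hiking on weekends."]

ctx = encoder.encode(context, convert_to_tensor=True)
cand = encoder.encode(candidates, convert_to_tensor=True)
sim = util.cos_sim(cand, ctx)            # (candidates, context sentences)
scores = sim.max(dim=1).values           # best-matching context sentence
print(candidates[scores.argmax().item()])
```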
arXiv Detail & Related papers (2020-05-16T09:47:19Z)
- Distributed Learning in the Non-Convex World: From Batch to Streaming Data, and Beyond [73.03743482037378]
Distributed learning has become a critical direction of the massively connected world envisioned by many.
This article discusses four key elements of scalable distributed processing and real-time data computation problems.
Practical issues and future research will also be discussed.
arXiv Detail & Related papers (2020-01-14T14:11:32Z)