A Graph Convolutional Topic Model for Short and Noisy Text Streams
- URL: http://arxiv.org/abs/2003.06112v4
- Date: Fri, 24 Dec 2021 02:26:38 GMT
- Title: A Graph Convolutional Topic Model for Short and Noisy Text Streams
- Authors: Ngo Van Linh, Tran Xuan Bach and Khoat Than
- Abstract summary: We propose a novel graph convolutional topic model (GCTM) which integrates graph convolutional networks (GCN) into a topic model.
We conduct extensive experiments to evaluate our method with both a human knowledge graph (Wordnet) and a graph built from pre-trained word embeddings (Word2vec).
Our method achieves significantly better performances than state-of-the-art baselines in terms of probabilistic predictive measure and topic coherence.
- Score: 2.578242050187029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning hidden topics from data streams has become absolutely necessary but poses challenging problems, such as concept drift as well as short and noisy data. Using prior knowledge to enrich a topic model is one potential solution to cope with these challenges. Prior knowledge derived from human knowledge (e.g., Wordnet) or from a pre-trained model (e.g., Word2vec) is very valuable and useful for helping topic models work better. However, in a streaming environment where data arrives continually and infinitely, existing studies are limited in their ability to exploit these resources effectively. In particular, a knowledge graph, which contains meaningful word relations, is ignored. In this paper, aiming to exploit a knowledge graph effectively, we propose a novel graph convolutional topic model (GCTM), which integrates graph convolutional networks (GCN) into a topic model, together with a learning method that trains the networks and the topic model simultaneously on data streams. In each minibatch, our method can not only exploit an external knowledge graph but also balance the external and old knowledge to perform well on new data. We conduct extensive experiments to evaluate our method with both a human knowledge graph (Wordnet) and a graph built from pre-trained word embeddings (Word2vec). The experimental results show that our method achieves significantly better performance than state-of-the-art baselines in terms of probabilistic predictive measure and topic coherence. In particular, our method works well when dealing with short texts as well as with concept drift. The implementation of GCTM is available at https://github.com/bachtranxuan/GCTM.git.
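To make the idea concrete, the following is a minimal sketch, assuming a two-layer GCN over the word knowledge graph, one-hot word features, and an illustrative scalar mixing coefficient. It is not the authors' implementation (the real code is in the repository linked above); all names here (KnowledgeGCN, beta_old, mix) are hypothetical.

```python
# Illustrative sketch only: a GCN turns a word knowledge graph into
# topic-word distributions, which are then balanced against topics
# carried over from earlier minibatches of the stream.
import torch
import torch.nn as nn

class KnowledgeGCN(nn.Module):
    """Two-layer GCN mapping a word knowledge graph to topic-word weights."""
    def __init__(self, vocab_size, hidden_dim, n_topics):
        super().__init__()
        self.w1 = nn.Linear(vocab_size, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, n_topics, bias=False)

    def forward(self, a_hat, x):
        # a_hat: normalized adjacency D^{-1/2}(A + I)D^{-1/2} of the graph
        # (e.g. Wordnet edges or a kNN graph over Word2vec vectors);
        # x: initial node (word) features, here one-hot rows.
        h = torch.relu(self.w1(a_hat @ x))
        logits = self.w2(a_hat @ h)               # (vocab_size, n_topics)
        return torch.softmax(logits.t(), dim=-1)  # each topic sums to 1

vocab_size, n_topics = 1000, 50
gcn = KnowledgeGCN(vocab_size, hidden_dim=128, n_topics=n_topics)
a_hat = torch.eye(vocab_size)   # stand-in for the real normalized graph
x = torch.eye(vocab_size)       # one-hot word features
beta_graph = gcn(a_hat, x)      # topics derived from external knowledge

# Streaming step: blend external knowledge with topics learned from
# previous minibatches (beta_old), e.g. via a balancing coefficient.
beta_old = torch.full((n_topics, vocab_size), 1.0 / vocab_size)
mix = torch.sigmoid(torch.tensor(0.0))  # hypothetical learned weight
beta = mix * beta_graph + (1 - mix) * beta_old
```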
Related papers
- Node Level Graph Autoencoder: Unified Pretraining for Textual Graph Learning [45.70767623846523]
We propose a novel unified unsupervised learning autoencoder framework, named Node Level Graph AutoEncoder (NodeGAE).
We employ language models as the backbone of the autoencoder, with pretraining on text reconstruction.
Our method maintains simplicity in the training process and demonstrates generalizability across diverse textual graphs and downstream tasks.
arXiv Detail & Related papers (2024-08-09T14:57:53Z)
- SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of finetuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z)
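A hedged sketch of the SimTeG recipe might look like the following; the model name, LoRA settings, and mean pooling are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch: (1) parameter-efficient (LoRA) fine-tuning of a pre-trained LM
# on the downstream labels, (2) node features from its last hidden states.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, get_peft_model

tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone
lm = AutoModel.from_pretrained("bert-base-uncased")
lm = get_peft_model(lm, LoraConfig(r=8, lora_alpha=16,
                                   target_modules=["query", "value"]))
# ... supervised PEFT on the downstream task would go here (omitted) ...

@torch.no_grad()
def node_embeddings(texts):
    """Embed each node's text using the fine-tuned LM's last hidden states."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = lm(**batch).last_hidden_state   # (batch, seq_len, dim)
    return hidden.mean(dim=1)                # mean-pool to one vector per node

feats = node_embeddings(["title and abstract of node 1 ...",
                         "title and abstract of node 2 ..."])
# `feats` can then be fed to any downstream GNN as node features.
```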
- ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG).
Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP.
Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
arXiv Detail & Related papers (2023-05-23T17:53:30Z)
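Since the entry describes a CLIP-inspired batch-wise objective, here is a minimal sketch of such a loss, assuming both encoders already project into a shared latent space; it is illustrative, not the authors' exact formulation.

```python
# Sketch: symmetric CLIP-style contrastive loss between LM text embeddings
# and GNN node embeddings for the same batch of (text, node) pairs.
import torch
import torch.nn.functional as F

def clip_style_loss(text_emb, node_emb, temperature=0.07):
    t = F.normalize(text_emb, dim=1)   # cosine-normalize both sides
    n = F.normalize(node_emb, dim=1)
    logits = t @ n.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(t.size(0))  # the i-th text matches the i-th node
    # Cross-entropy in both directions: text -> node and node -> text.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```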
- PRODIGY: Enabling In-context Learning Over Graphs [112.19056551153454]
In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks.
We develop PRODIGY, the first pretraining framework that enables in-context learning over graphs.
arXiv Detail & Related papers (2023-05-21T23:16:30Z)
- Learnable Graph Matching: A Practical Paradigm for Data Association [74.28753343714858]
We propose a general learnable graph matching method to address these issues.
Our method achieves state-of-the-art performance on several MOT datasets.
For image matching, our method outperforms state-of-the-art methods on a popular indoor dataset, ScanNet.
arXiv Detail & Related papers (2023-03-27T17:39:00Z)
- State of the Art and Potentialities of Graph-level Learning [54.68482109186052]
Graph-level learning has been applied to many tasks including comparison, regression, classification, and more.
Traditional approaches to learning a set of graphs rely on hand-crafted features, such as substructures.
Deep learning has helped graph-level learning adapt to the growing scale of graphs by extracting features automatically and encoding graphs into low-dimensional representations.
arXiv Detail & Related papers (2023-01-14T09:15:49Z)
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown a powerful capacity for modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
arXiv Detail & Related papers (2022-03-03T09:53:53Z)
- Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking [58.30147362745852]
Data association across frames is at the core of Multiple Object Tracking (MOT) task.
Existing methods mostly ignore the context information among tracklets and intra-frame detections.
We propose a novel learnable graph matching method to address these issues.
arXiv Detail & Related papers (2021-03-30T08:58:45Z)
- Co-embedding of Nodes and Edges with Graph Neural Networks [13.020745622327894]
Graph embedding is a way to transform and encode the data structure in a high-dimensional and non-Euclidean feature space.
CensNet is a general graph embedding framework, which embeds both nodes and edges into a latent feature space.
Our approach achieves or matches the state-of-the-art performance in four graph learning tasks.
arXiv Detail & Related papers (2020-10-25T22:39:31Z)
- Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning [21.0019144298605]
Existing graph neural networks fed with the complete graph data are not scalable due to their high computation and memory costs.
Subg-Con is proposed to exploit the strong correlation between central nodes and their sampled subgraphs to capture regional structure information.
Compared with existing graph representation learning approaches, Subg-Con has prominent performance advantages in weaker supervision requirements, model learning scalability, and parallelization.
arXiv Detail & Related papers (2020-09-22T01:58:19Z)
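As a rough illustration of the Subg-Con idea, the sketch below samples a small subgraph around each central node and pairs the node with its own subgraph summary in an InfoNCE-style loss; the sampling strategy and loss details here are assumptions, not the authors' implementation.

```python
# Sketch: BFS sampling of an ego subgraph per central node, plus a
# contrastive loss that matches each node to its own subgraph summary.
import torch
import torch.nn.functional as F

def sample_ego_subgraph(adj, center, max_size=20):
    """Collect up to `max_size` nodes around `center` by breadth-first search.
    `adj` is an adjacency list: node id -> list of neighbor ids."""
    seen, frontier = {center}, [center]
    while frontier and len(seen) < max_size:
        node = frontier.pop(0)
        for nb in adj.get(node, []):
            if nb not in seen and len(seen) < max_size:
                seen.add(nb)
                frontier.append(nb)
    return sorted(seen)

def subgraph_contrast_loss(center_emb, subgraph_emb, temperature=0.5):
    """InfoNCE-style loss: each central node vs. all subgraph summaries."""
    c = F.normalize(center_emb, dim=1)
    s = F.normalize(subgraph_emb, dim=1)
    logits = c @ s.t() / temperature
    targets = torch.arange(c.size(0))  # i-th node pairs with i-th subgraph
    return F.cross_entropy(logits, targets)
```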