GNN-LM: Language Modeling based on Global Contexts via GNN
- URL: http://arxiv.org/abs/2110.08743v1
- Date: Sun, 17 Oct 2021 07:18:21 GMT
- Title: GNN-LM: Language Modeling based on Global Contexts via GNN
- Authors: Yuxian Meng, Shi Zong, Xiaoya Li, Xiaofei Sun, Tianwei Zhang, Fei Wu,
Jiwei Li
- Abstract summary: We introduce GNN-LM, which extends the vanilla neural language model (LM) by allowing it to reference similar contexts in the entire training corpus.
GNN-LM achieves a new state-of-the-art perplexity of 14.8 on WikiText-103.
- Score: 32.52117529283929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by the notion that "to copy is easier than to memorize", in
this work we introduce GNN-LM, which extends the vanilla neural language model
(LM) by allowing it to reference similar contexts in the entire training corpus.
We build a directed heterogeneous graph between an input context and its
semantically related neighbors selected from the training corpus, where nodes
are tokens in the input context and retrieved neighbor contexts, and edges
represent connections between nodes. Graph neural networks (GNNs) are
constructed upon the graph to aggregate information from similar contexts to
decode the token. This learning paradigm provides direct access to the
reference contexts and helps improve a model's generalization ability. We
conduct comprehensive experiments to validate the effectiveness of the GNN-LM:
GNN-LM achieves a new state-of-the-art perplexity of 14.8 on WikiText-103 (a
4.5 point improvement over its vanilla LM counterpart) and shows
substantial improvement on One Billion Word and Enwiki8 datasets against strong
baselines. In-depth ablation studies are performed to understand the mechanics
of GNN-LM.
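The retrieve-then-aggregate idea described in the abstract can be sketched in a few lines: embed the current input context, fetch its most similar cached training contexts, and let the context attend over the retrieved neighbors before decoding. The sketch below is illustrative only, assuming NumPy and made-up names (`retrieve_neighbors`, `aggregate`); it is not the paper's implementation.

```python
import numpy as np

def retrieve_neighbors(query, datastore, k=3):
    """Return indices of the k datastore contexts most similar to the
    query (cosine similarity) -- GNN-LM's neighbor retrieval step."""
    q = query / np.linalg.norm(query)
    d = datastore / np.linalg.norm(datastore, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(-sims)[:k]

def aggregate(query, neighbor_feats):
    """One attention-weighted aggregation: the input-context node attends
    over its retrieved neighbors (a stand-in for the paper's GNN layer)."""
    scores = neighbor_feats @ query / np.sqrt(len(query))
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ neighbor_feats

rng = np.random.default_rng(0)
datastore = rng.normal(size=(100, 16))  # cached training-context embeddings
query = rng.normal(size=16)             # current input-context embedding

idx = retrieve_neighbors(query, datastore, k=3)
fused = aggregate(query, datastore[idx])
print(idx.shape, fused.shape)  # (3,) (16,)
```

In the paper, the retrieved neighbors and the input tokens form the nodes of a directed heterogeneous graph, and several GNN layers replace the single softmax-weighted average shown here.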
Related papers
- All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks [51.19110891434727]
Large Language Models (LLMs) with pretrained knowledge and powerful semantic comprehension abilities have recently shown a remarkable ability to benefit applications using vision and text data.
E-LLaGNN is a framework with an on-demand LLM service that enriches the message-passing procedure of graph learning by enhancing a limited fraction of nodes in the graph.
arXiv Detail & Related papers (2024-07-20T22:09:42Z)
- LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework [30.54068909225463]
We aim to streamline the GNN design process and leverage the advantages of Large Language Models (LLMs) to improve the performance of GNNs on downstream tasks.
We formulate a new paradigm, coined "LLMs-as-Consultants," which integrates LLMs with GNNs in an interactive manner.
We empirically evaluate the effectiveness of LOGIN on node classification tasks across both homophilic and heterophilic graphs.
arXiv Detail & Related papers (2024-05-22T18:17:20Z)
- Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning [51.90524745663737]
A key innovation is our use of explanations as features, which can be used to boost GNN performance on downstream tasks.
Our method achieves state-of-the-art results on well-established TAG datasets.
Our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on ogbn-arxiv.
arXiv Detail & Related papers (2023-05-31T03:18:03Z)
- Efficient and effective training of language and graph neural network models [36.00479096375565]
We put forth an efficient and effective framework termed language model GNN (LM-GNN) to jointly train large-scale language models and graph neural networks.
The effectiveness of our framework is achieved by stage-wise fine-tuning of the BERT model, first with heterogeneous graph information and then with a GNN model.
We evaluate the LM-GNN framework on different datasets and showcase the effectiveness of the proposed approach.
arXiv Detail & Related papers (2022-06-22T00:23:37Z)
- Graph Neural Networks for Natural Language Processing: A Survey [64.36633422999905]
We present a comprehensive overview of Graph Neural Networks (GNNs) for Natural Language Processing.
We propose a new taxonomy of GNNs for NLP, which organizes existing research of GNNs for NLP along three axes: graph construction, graph representation learning, and graph-based encoder-decoder models.
arXiv Detail & Related papers (2021-06-10T23:59:26Z)
- InsertGNN: Can Graph Neural Networks Outperform Humans in TOEFL Sentence Insertion Problem? [66.70154236519186]
Sentence insertion is a delicate but fundamental NLP problem.
Current approaches in sentence ordering, text coherence, and question answering (QA) are neither suitable for nor effective at solving it.
We propose InsertGNN, a model that represents the problem as a graph and adopts a graph neural network (GNN) to learn the connections between sentences.
arXiv Detail & Related papers (2021-03-28T06:50:31Z)
- Enhance Information Propagation for Graph Neural Network by Heterogeneous Aggregations [7.3136594018091134]
Graph neural networks are emerging as a continuation of deep learning's success on graph data.
We propose to enhance information propagation among GNN layers by combining heterogeneous aggregations.
We empirically validate the effectiveness of HAG-Net on a number of graph classification benchmarks.
arXiv Detail & Related papers (2021-02-08T08:57:56Z)
- Overcoming Catastrophic Forgetting in Graph Neural Networks [50.900153089330175]
Catastrophic forgetting refers to the tendency of a neural network to "forget" previously learned knowledge upon learning new tasks.
We propose a novel scheme dedicated to overcoming this problem and hence strengthening continual learning in graph neural networks (GNNs).
At the heart of our approach is a generic module termed topology-aware weight preserving (TWP).
arXiv Detail & Related papers (2020-12-10T22:30:25Z)
- Policy-GNN: Aggregation Optimization for Graph Neural Networks [60.50932472042379]
Graph neural networks (GNNs) aim to model the local graph structures and capture the hierarchical patterns by aggregating the information from neighbors.
It is a challenging task to develop an effective aggregation strategy for each node, given complex graphs and sparse features.
We propose Policy-GNN, a meta-policy framework that models the sampling procedure and message passing of GNNs as a combined learning process.
arXiv Detail & Related papers (2020-06-26T17:03:06Z)
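Several entries above (Policy-GNN, HAG-Net, LM-GNN) refine the same primitive: each node aggregates its neighbors' features via message passing. A minimal mean-aggregation step, purely illustrative and not any one paper's exact formulation:

```python
import numpy as np

def message_pass(feats, adj):
    """One mean-aggregation message-passing step: each node averages its
    neighbors' features. Illustrative sketch of the basic GNN operation."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1  # avoid division by zero for isolated nodes
    return (adj @ feats) / deg

# toy undirected path graph 0-1-2, with one-hot node features
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
feats = np.eye(3)
print(message_pass(feats, adj))
```

HAG-Net's heterogeneous aggregations combine several such operators (e.g. mean, max, sum), while Policy-GNN learns per-node how many of these steps to apply.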
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.