GROWN+UP: A Graph Representation Of a Webpage Network Utilizing
Pre-training
- URL: http://arxiv.org/abs/2208.02252v1
- Date: Wed, 3 Aug 2022 13:37:27 GMT
- Authors: Benedict Yeoh and Huijuan Wang
- Abstract summary: We introduce an agnostic deep graph neural network feature extractor that can ingest webpage structures, pre-train self-supervised on massive unlabeled data, and fine-tune to arbitrary tasks on webpages effectually.
We show that our pre-trained model achieves state-of-the-art results using multiple datasets on two very different benchmarks: webpage boilerplate removal and genre classification.
- Score: 0.2538209532048866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained neural networks are ubiquitous and critical to the success
of many downstream tasks in natural language processing and computer vision.
However, within the field of web information retrieval, there is a stark
contrast in the lack of similarly flexible and powerful pre-trained models that
can properly parse webpages. Consequently, we believe that common machine
learning tasks like content extraction and information mining from webpages
have low-hanging gains that yet remain untapped.
We aim to close the gap by introducing an agnostic deep graph neural network
feature extractor that can ingest webpage structures, pre-train self-supervised
on massive unlabeled data, and fine-tune to arbitrary tasks on webpages
effectually.
Finally, we show that our pre-trained model achieves state-of-the-art results
using multiple datasets on two very different benchmarks: webpage boilerplate
removal and genre classification, thus lending support to its potential
application in diverse downstream tasks.
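The abstract does not say how a webpage is turned into a graph. As a rough illustration only, the sketch below assumes the simplest plausible encoding: each DOM element becomes a node and each parent-child relation becomes an edge. The names `DomGraphBuilder` and `html_to_graph` are hypothetical and are not taken from the paper. The resulting node and edge lists are the kind of input a graph neural network feature extractor could consume.

```python
from html.parser import HTMLParser

# Void elements never receive a closing tag, so they are not kept on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomGraphBuilder(HTMLParser):
    """Hypothetical helper: DOM elements become nodes, parent-child links become edges."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # nodes[i] is the tag name of node i
        self.edges = []   # (parent_index, child_index) pairs
        self._stack = []  # indices of the elements that are currently open

    def handle_starttag(self, tag, attrs):
        index = len(self.nodes)
        self.nodes.append(tag)
        if self._stack:
            # Link the new element to the innermost open element.
            self.edges.append((self._stack[-1], index))
        if tag not in VOID_TAGS:
            self._stack.append(index)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; stray end tags are ignored.
        for pos in range(len(self._stack) - 1, -1, -1):
            if self.nodes[self._stack[pos]] == tag:
                del self._stack[pos:]
                break


def html_to_graph(html):
    """Return (nodes, edges) for an HTML string (assumed encoding, see above)."""
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.nodes, builder.edges


nodes, edges = html_to_graph(
    "<html><body><div><p>Hi<br>there</p></div><footer>c</footer></body></html>"
)
print(nodes)
print(edges)
```

Text content and attributes are ignored here to keep the sketch short; a real feature extractor would likely attach them to the nodes as features.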
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659] (2024-06-14T13:12:07Z)
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
- GraphPrompt: Unifying Pre-Training and Downstream Tasks for Graph Neural Networks [16.455234748896157] (2023-02-16T02:51:38Z)
GraphPrompt is a novel pre-training and prompting framework on graphs.
It unifies pre-training and downstream tasks into a common task template.
It also employs a learnable prompt to assist a downstream task in locating the most relevant knowledge from the pre-trained model.
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606] (2022-09-20T14:41:37Z)
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749] (2022-03-03T09:53:53Z)
Graph neural networks (GNNs) have shown powerful capacity at modeling structural data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
- A Variational Graph Autoencoder for Manipulation Action Recognition and Prediction [1.1816942730023883] (2021-10-25T21:40:42Z)
We introduce a deep graph autoencoder to jointly learn recognition and prediction of manipulation tasks from symbolic scene graphs.
Our network has a variational autoencoder structure with two branches: one for identifying the input graph type and one for predicting the future graphs.
We benchmark our new model against different state-of-the-art methods on two different datasets, MANIAC and MSRC-9, and show that our proposed model can achieve better performance.
- Temporal Graph Network Embedding with Causal Anonymous Walks Representations [54.05212871508062] (2021-08-19T15:39:52Z)
We propose a novel approach for dynamic network representation learning based on Temporal Graph Network.
We provide a benchmark pipeline for the evaluation of temporal network embeddings.
We show the applicability and superior performance of our model in the real-world downstream graph machine learning task provided by one of the top European banks.
- Topological Uncertainty: Monitoring trained neural networks through persistence of activation graphs [0.9786690381850356] (2021-05-07T14:16:03Z)
In industrial applications, data coming from an open-world setting might widely differ from the benchmark datasets on which a network was trained.
We develop a method to monitor trained neural networks based on the topological properties of their activation graphs.
- Graph-Based Neural Network Models with Multiple Self-Supervised Auxiliary Tasks [79.28094304325116] (2020-11-14T11:09:51Z)
Graph Convolutional Networks are among the most promising approaches for capturing relationships among structured data points.
We propose three novel self-supervised auxiliary tasks to train graph-based neural network models in a multi-task fashion.
- GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training [62.73470368851127] (2020-06-17T16:18:35Z)
Graph representation learning has emerged as a powerful technique for addressing real-world problems.
We design Graph Contrastive Coding, a self-supervised graph neural network pre-training framework.
We conduct experiments on three graph learning tasks and ten graph datasets.