GROWN+UP: A Graph Representation Of a Webpage Network Utilizing
Pre-training
- URL: http://arxiv.org/abs/2208.02252v1
- Date: Wed, 3 Aug 2022 13:37:27 GMT
- Title: GROWN+UP: A Graph Representation Of a Webpage Network Utilizing
Pre-training
- Authors: Benedict Yeoh and Huijuan Wang
- Abstract summary: We introduce an agnostic deep graph neural network feature extractor that can ingest webpage structures, pre-train self-supervised on massive unlabeled data, and fine-tune to arbitrary tasks on webpages effectively.
We show that our pre-trained model achieves state-of-the-art results using multiple datasets on two very different benchmarks: webpage boilerplate removal and genre classification.
- Score: 0.2538209532048866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained neural networks are ubiquitous and critical to the success
of many downstream tasks in natural language processing and computer vision.
However, the field of web information retrieval conspicuously lacks similarly
flexible and powerful pre-trained models that can properly parse webpages.
Consequently, we believe that common machine learning tasks like content
extraction and information mining from webpages offer low-hanging gains that
remain untapped.
We aim to close the gap by introducing an agnostic deep graph neural network
feature extractor that can ingest webpage structures, pre-train self-supervised
on massive unlabeled data, and fine-tune to arbitrary tasks on webpages
effectively.
Finally, we show that our pre-trained model achieves state-of-the-art results
using multiple datasets on two very different benchmarks: webpage boilerplate
removal and genre classification, thus lending support to its potential
application in diverse downstream tasks.
Related papers
- Dual-level Mixup for Graph Few-shot Learning with Fewer Tasks [23.07584018576066]
We propose a SiMple yet effectIve approach for graph few-shot Learning with fEwer tasks, named SMILE.
We introduce a dual-level mixup strategy, encompassing both within-task and across-task mixup, to simultaneously enrich the available nodes and tasks in meta-learning.
Empirically, SMILE consistently outperforms other competitive models by a large margin across all evaluated datasets with in-domain and cross-domain settings.
arXiv Detail & Related papers (2025-02-19T23:59:05Z)
- Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees [50.78679002846741]
We introduce a novel approach for learning cross-task generalities in graphs.
We propose task-trees as basic learning instances to align task spaces on graphs.
Our findings indicate that when a graph neural network is pretrained on diverse task-trees, it acquires transferable knowledge.
arXiv Detail & Related papers (2024-12-21T02:07:43Z)
- Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? [62.12375949429938]
Building transferable Graph Neural Networks (GNNs) with a CLIP pipeline is challenging because of three fundamental issues.
We leverage multi-modal prompt learning to effectively adapt pre-trained GNN to downstream tasks and data.
Our new paradigm embeds the graphs directly in the same space as the Large Language Models (LLMs) by learning both graph prompts and text prompts simultaneously.
arXiv Detail & Related papers (2024-12-11T08:03:35Z)
- Dynamic Graph Message Passing Networks for Visual Recognition [112.49513303433606]
Modelling long-range dependencies is critical for scene understanding tasks in computer vision.
A fully-connected graph is beneficial for such modelling, but its computational overhead is prohibitive.
We propose a dynamic graph message passing network that significantly reduces the computational complexity.
arXiv Detail & Related papers (2022-09-20T14:41:37Z)
- A Variational Graph Autoencoder for Manipulation Action Recognition and Prediction [1.1816942730023883]
We introduce a deep graph autoencoder to jointly learn recognition and prediction of manipulation tasks from symbolic scene graphs.
Our network has a variational autoencoder structure with two branches: one for identifying the input graph type and one for predicting future graphs.
We benchmark our new model against different state-of-the-art methods on two different datasets, MANIAC and MSRC-9, and show that our proposed model can achieve better performance.
arXiv Detail & Related papers (2021-10-25T21:40:42Z)
- Temporal Graph Network Embedding with Causal Anonymous Walks Representations [54.05212871508062]
We propose a novel approach for dynamic network representation learning based on Temporal Graph Network.
We also provide a benchmark pipeline for evaluating temporal network embeddings.
We show the applicability and superior performance of our model in the real-world downstream graph machine learning task provided by one of the top European banks.
arXiv Detail & Related papers (2021-08-19T15:39:52Z)
- Topological Uncertainty: Monitoring trained neural networks through persistence of activation graphs [0.9786690381850356]
In industrial applications, data coming from an open-world setting might widely differ from the benchmark datasets on which a network was trained.
We develop a method to monitor trained neural networks based on the topological properties of their activation graphs.
arXiv Detail & Related papers (2021-05-07T14:16:03Z)
- Graph-Based Neural Network Models with Multiple Self-Supervised Auxiliary Tasks [79.28094304325116]
Graph Convolutional Networks are among the most promising approaches for capturing relationships among structured data points.
We propose three novel self-supervised auxiliary tasks to train graph-based neural network models in a multi-task fashion.
arXiv Detail & Related papers (2020-11-14T11:09:51Z)
- GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training [62.73470368851127]
Graph representation learning has emerged as a powerful technique for addressing real-world problems.
We design Graph Contrastive Coding -- a self-supervised graph neural network pre-training framework.
We conduct experiments on three graph learning tasks and ten graph datasets.
arXiv Detail & Related papers (2020-06-17T16:18:35Z)
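The self-supervised contrastive pre-training idea behind GCC, and graph contrastive methods generally, can be sketched with an InfoNCE-style loss: two augmented views of the same subgraph should embed close together, while other subgraphs are pushed away. The toy vectors below are illustrative placeholders, not embeddings from the paper.

```python
import math

def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """InfoNCE loss for one anchor embedding: low when the positive
    (another view of the same subgraph) is most similar to the anchor.
    Vectors are plain lists; similarity is a dot product."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    pos = math.exp(dot(anchor, positive) / temperature)
    neg = sum(math.exp(dot(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

anchor    = [1.0, 0.0]
positive  = [0.9, 0.1]                 # a nearby view of the same subgraph
negatives = [[-1.0, 0.0], [0.0, 1.0]]  # embeddings of other subgraphs

loss = info_nce_loss(anchor, positive, negatives)
```

Because the positive view is far more similar to the anchor than either negative, the loss is near zero; swapping a negative in as the "positive" yields a much larger loss, which is what drives the encoder to produce discriminative subgraph embeddings during pre-training.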
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.