ScatterSample: Diversified Label Sampling for Data Efficient Graph
Neural Network Learning
- URL: http://arxiv.org/abs/2206.04255v1
- Date: Thu, 9 Jun 2022 04:05:02 GMT
- Title: ScatterSample: Diversified Label Sampling for Data Efficient Graph
Neural Network Learning
- Authors: Zhenwei Dai, Vasileios Ioannidis, Soji Adeshina, Zak Jost, Christos
Faloutsos, George Karypis
- Abstract summary: In some applications where GNNs excel, such as drug design or fraud detection, labeling new instances is expensive.
We develop a data-efficient active sampling framework, ScatterSample, to train GNNs under an active learning setting.
Our experiments on five datasets show that ScatterSample significantly outperforms the other GNN active learning baselines.
- Score: 22.278779277115234
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: What target labels are most effective for graph neural network (GNN)
training? In some applications where GNNs excel, such as drug design or fraud
detection, labeling new instances is expensive. We develop a data-efficient
active sampling framework, ScatterSample, to train GNNs under an active
learning setting. ScatterSample employs a sampling module termed
DiverseUncertainty to collect instances with large uncertainty from different
regions of the sample space for labeling. To ensure diversification of the
selected nodes, DiverseUncertainty clusters the high uncertainty nodes and
selects the representative nodes from each cluster. Our ScatterSample algorithm
is further supported by rigorous theoretical analysis demonstrating its
advantage compared to standard active sampling methods that aim to simply
maximize the uncertainty and not diversify the samples. In particular, we show
that ScatterSample is able to efficiently reduce the model uncertainty over the
whole sample space. Our experiments on five datasets show that ScatterSample
significantly outperforms the other GNN active learning baselines; specifically,
it reduces the sampling cost by up to 50% while achieving the same test
accuracy.
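To make the DiverseUncertainty step concrete, here is a minimal Python sketch of diversified uncertainty sampling: score unlabeled nodes by uncertainty, keep a high-uncertainty pool, cluster it, and label one representative per cluster. The entropy-based uncertainty measure, the pool_factor parameter, and the function name are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

def diverse_uncertainty_sample(probs, embeddings, budget, pool_factor=5):
    """Sketch of diversified uncertainty sampling (illustrative, not the
    paper's exact algorithm).

    probs:      (n, c) predicted class probabilities for unlabeled nodes
    embeddings: (n, d) node embeddings, e.g. from the GNN's last layer
    budget:     number of nodes to select for labeling this round
    """
    # 1) Uncertainty score: entropy of each node's predicted distribution.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)

    # 2) Keep a pool of the most uncertain nodes, larger than the budget.
    pool_size = min(budget * pool_factor, len(entropy))
    pool = np.argsort(entropy)[-pool_size:]

    # 3) Cluster the high-uncertainty pool so the selected batch covers
    #    distinct regions of the sample space.
    km = KMeans(n_clusters=budget, n_init=10).fit(embeddings[pool])

    # 4) Take one representative per cluster: the member closest to the
    #    cluster centroid.
    selected = []
    for k in range(budget):
        members = pool[km.labels_ == k]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[k], axis=1)
        selected.append(members[np.argmin(dists)])
    return np.array(selected)
```

Picking one representative per cluster, rather than simply the budget most uncertain nodes, keeps the labeled batch from piling up in a single region; this is the intuition behind the paper's claimed advantage over uncertainty maximization alone.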
Related papers
- How Low Can You Go? Surfacing Prototypical In-Distribution Samples for Unsupervised Anomaly Detection [48.30283806131551]
We show that UAD with extremely few training samples can already match -- and in some cases even surpass -- the performance of training with the whole training dataset.
We propose an unsupervised method to reliably identify prototypical samples to further boost UAD performance.
arXiv Detail & Related papers (2023-12-06T15:30:47Z)
- PCB-RandNet: Rethinking Random Sampling for LIDAR Semantic Segmentation in Autonomous Driving Scene [15.516687293651795]
We propose a new Polar Cylinder Balanced Random Sampling method for semantic segmentation of large-scale LiDAR point clouds.
In addition, a sampling consistency loss is introduced to further improve the segmentation performance and reduce the model's variance under different sampling methods.
Our approach produces excellent performance on both SemanticKITTI and SemanticPOSS benchmarks, achieving a 2.8% and 4.0% improvement, respectively.
arXiv Detail & Related papers (2022-09-28T02:59:36Z)
- Labeling-Free Comparison Testing of Deep Learning Models [28.47632100019289]
We propose a labeling-free comparison testing approach to overcome the limitations of labeling effort and sampling randomness.
Our approach outperforms the baseline methods by up to 0.74 and 0.53 on Spearman's correlation and Kendall's $\tau$, regardless of the dataset and distribution shift.
arXiv Detail & Related papers (2022-04-08T10:55:45Z)
- Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated Label Mixing [104.630875328668]
The Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
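For reference, vanilla Mixup is only a few lines; the sketch below shows the standard formulation. The saliency-guided sample construction and calibrated label mixing that Saliency Grafting adds on top are not reproduced here, and the alpha default is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Vanilla Mixup: convexly combine two inputs and their one-hot labels."""
    lam = rng.beta(alpha, alpha)       # mixing coefficient ~ Beta(alpha, alpha)
    x_mix = lam * x1 + (1 - lam) * x2  # mixed input
    y_mix = lam * y1 + (1 - lam) * y2  # mixed (soft) label
    return x_mix, y_mix
```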
arXiv Detail & Related papers (2021-12-16T11:27:48Z)
- Active Learning for Deep Visual Tracking [51.5063680734122]
Convolutional neural networks (CNNs) have been successfully applied to the single target tracking task in recent years.
In this paper, we propose an active learning method for deep visual tracking, which selects and annotates unlabeled samples to train the deep CNN model.
Under the guidance of active learning, the tracker based on the trained deep CNN model can achieve competitive tracking performance while reducing the labeling cost.
arXiv Detail & Related papers (2021-10-17T11:47:56Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, selects unlabeled examples adaptively as training proceeds.
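As a rough illustration of dynamic thresholding, the sketch below keeps only the unlabeled examples whose pseudo-label loss falls under a threshold that decays as training proceeds. The decay schedule, parameter names, and default values are assumptions for illustration, not Dash's exact selection rule.

```python
import numpy as np

def select_unlabeled(losses, step, rho0=1.0, gamma=1.1):
    """Keep unlabeled examples whose pseudo-label loss is below a
    threshold that shrinks with the training step (illustrative schedule)."""
    threshold = rho0 * gamma ** (-step)     # decaying threshold
    return np.where(losses < threshold)[0]  # indices of selected examples
```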
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolution Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) where the message aggregator contains learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.