Semi-Supervised Graph Imbalanced Regression
- URL: http://arxiv.org/abs/2305.12087v1
- Date: Sat, 20 May 2023 04:11:00 GMT
- Title: Semi-Supervised Graph Imbalanced Regression
- Authors: Gang Liu, Tong Zhao, Eric Inae, Tengfei Luo, Meng Jiang
- Abstract summary: We propose a semi-supervised framework to progressively balance training data and reduce model bias via self-training.
Results demonstrate that the proposed framework significantly reduces the error of predicted graph properties.
- Score: 17.733488328772943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data imbalance is easily found in annotated data when the observations of
certain continuous label values are difficult to collect for regression tasks.
When it comes to molecule and polymer property prediction, the annotated
graph datasets are often small because labeling them requires expensive
equipment and effort. To address the lack of examples of rare label values in
graph regression tasks, we propose a semi-supervised framework to progressively
balance training data and reduce model bias via self-training. The training
data balance is achieved by (1) pseudo-labeling more graphs for
under-represented labels with a novel regression confidence measurement and (2)
augmenting graph examples in latent space for remaining rare labels after data
balancing with pseudo-labels. The former identifies high-quality examples from
unlabeled data whose labels are confidently predicted, and samples a subset of
them following the reverse of the imbalanced label distribution in the
annotated data. The latter works together with the former toward a perfect
balance via a novel label-anchored mixup algorithm. We perform experiments on
seven regression tasks over graph datasets. Results demonstrate that the proposed framework
significantly reduces the error of predicted graph properties, especially in
under-represented label areas.
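As a concrete picture of the two balancing steps, here is a minimal NumPy sketch. All names (reverse_sample, label_anchored_mixup, the confidence threshold) are hypothetical stand-ins, not the paper's code, and the paper's novel regression confidence measurement is abstracted as a precomputed score per unlabeled example.

```python
import numpy as np

rng = np.random.default_rng(0)

def reverse_sample(pseudo_y, conf, labeled_y, bin_edges, thr=0.8, k=100):
    """Step (1): keep confidently pseudo-labeled graphs, then sample k of them
    with probability inverse to each label bin's frequency in the annotated
    data, so under-represented label regions are filled first."""
    idx = np.flatnonzero(conf >= thr)                    # confident predictions only
    counts, _ = np.histogram(labeled_y, bins=bin_edges)  # label density in labeled set
    inv = 1.0 / (counts + 1.0)                           # "reverse" distribution (+1 smoothing)
    b = np.clip(np.digitize(pseudo_y[idx], bin_edges) - 1, 0, len(counts) - 1)
    p = inv[b] / inv[b].sum()
    return rng.choice(idx, size=min(k, idx.size), replace=False, p=p)

def label_anchored_mixup(z, y, anchor_y, lam=0.5):
    """Step (2): for a label still rare after pseudo-labeling, mix the latent
    vectors (and labels) of the two examples nearest to the anchor label."""
    i, j = np.argsort(np.abs(y - anchor_y))[:2]
    return lam * z[i] + (1 - lam) * z[j], lam * y[i] + (1 - lam) * y[j]
```

In the framework itself these two steps alternate with model retraining across self-training rounds, progressively rebalancing the training set.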
Related papers
- Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while giving new labels to unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
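A hedged sketch of what instance-wise weighting of an adversarial loss can look like; the weighting rule itself is that paper's contribution and is abstracted away here, and all names are hypothetical:

```python
import torch
import torch.nn.functional as F

def weighted_discriminator_loss(real_logits, instance_weights):
    """Per-sample adversarial loss scaled by a cleanliness weight in [0, 1]:
    a low weight softly drops a sample suspected of a noisy label, which is
    the 'soft curriculum' idea in miniature."""
    per_sample = F.binary_cross_entropy_with_logits(
        real_logits, torch.ones_like(real_logits), reduction="none")
    return (instance_weights * per_sample).mean()
```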
arXiv Detail & Related papers (2023-07-17T08:31:59Z)
- How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm? [73.80001705134147]
We provide an exact characterization of the expected generalization error (gen-error) for semi-supervised learning (SSL) with pseudo-labeling via the Gibbs algorithm.
The gen-error is expressed in terms of the symmetrized KL information between the output hypothesis, the pseudo-labeled dataset, and the labeled dataset.
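For orientation, the earlier supervised-case result for the Gibbs algorithm has a clean closed form in terms of the symmetrized KL information (mutual plus lautum information); the paraphrase below is that baseline, and the paper's semi-supervised expression extends it to involve the pseudo-labeled set as well:

```latex
% Supervised Gibbs posterior with inverse temperature \gamma (paraphrase):
\overline{\mathrm{gen}} \;=\; \frac{I_{\mathrm{SKL}}(W;S)}{\gamma},
\qquad
I_{\mathrm{SKL}}(W;S) \;=\;
  \underbrace{D\big(P_{W,S}\,\|\,P_W\otimes P_S\big)}_{I(W;S)}
+ \underbrace{D\big(P_W\otimes P_S\,\|\,P_{W,S}\big)}_{L(W;S)}.
```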
arXiv Detail & Related papers (2022-10-15T04:11:56Z)
- Debiased Learning from Naturally Imbalanced Pseudo-Labels for Zero-Shot and Semi-Supervised Learning [27.770473405635585]
This work studies the bias issue of pseudo-labeling, a natural phenomenon that widely occurs but is often overlooked by prior research.
We observe heavily long-tailed pseudo-labels when the semi-supervised learning model FixMatch predicts labels on the unlabeled set, even though the unlabeled data is curated to be balanced.
Without intervention, the training model inherits the bias from the pseudo-labels and ends up being sub-optimal.
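A generic countermeasure in this spirit is logit adjustment against a running estimate of the pseudo-label marginal; the sketch below illustrates that idea and is not claimed to be the paper's exact procedure:

```python
import torch

class DebiasedPseudoLabeler:
    """Track an EMA of the model's mean class probabilities on unlabeled data
    and subtract a scaled log of it from the logits, pushing down classes the
    model over-predicts before pseudo-labels are committed."""
    def __init__(self, num_classes, momentum=0.999, tau=0.5):
        self.p_hat = torch.full((num_classes,), 1.0 / num_classes)
        self.momentum, self.tau = momentum, tau

    @torch.no_grad()
    def __call__(self, logits):
        self.p_hat = (self.momentum * self.p_hat
                      + (1 - self.momentum) * logits.softmax(-1).mean(0))
        adjusted = logits - self.tau * self.p_hat.log()
        conf, pseudo = adjusted.softmax(-1).max(dim=-1)
        return pseudo, conf
```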
arXiv Detail & Related papers (2022-01-05T07:40:24Z)
- Recovering the Unbiased Scene Graphs from the Biased Ones [99.24441932582195]
We show that due to the missing labels, scene graph generation (SGG) can be viewed as a "Learning from Positive and Unlabeled data" (PU learning) problem.
We propose Dynamic Label Frequency Estimation (DLFE), which takes advantage of training-time data augmentation and averages over multiple training iterations to introduce more valid examples.
Extensive experiments show that DLFE is more effective in estimating label frequencies than a naive variant of the traditional estimate, and DLFE significantly alleviates the long tail.
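The PU-learning backbone here is the classical label-frequency correction P(positive | x) = P(labeled | x) / c with c = P(labeled | positive); DLFE's contribution is estimating c reliably by averaging over training, which the hedged sketch below mimics with a per-class EMA (names and details are illustrative, not the paper's code):

```python
import torch

class DynamicLabelFrequency:
    """Per-class EMA of the mean predicted 'labeled' probability on examples
    known to be positive; dividing raw scores by this estimate recovers an
    approximately unbiased P(positive | x), shrinking the long tail."""
    def __init__(self, num_classes, momentum=0.9):
        self.c = torch.full((num_classes,), 0.5)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, probs, labels):          # probs: (N, C); labels: (N,)
        for k in labels.unique():
            mean_k = probs[labels == k, k].mean()
            self.c[k] = self.momentum * self.c[k] + (1 - self.momentum) * mean_k

    def recover(self, probs):
        return (probs / self.c.clamp_min(1e-4)).clamp(max=1.0)
```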
arXiv Detail & Related papers (2021-07-05T16:10:41Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
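A standard device for removing the sampling half of that bias is to importance-weight each sampled negative by its sampling probability (the "logQ" correction for sampled softmax); a minimal generic sketch, not the paper's unified estimator:

```python
import torch

def sampled_softmax_loss(scores, pos_idx, neg_idx, q_neg):
    """Cross-entropy over one positive plus sampled negatives, where each
    negative's logit is corrected by -log q so the estimator is unbiased
    with respect to the full label space."""
    pos = scores[pos_idx]
    neg = scores[neg_idx] - torch.log(q_neg)  # undo the sampling distribution
    return -pos + torch.logsumexp(torch.cat([pos.view(1), neg]), dim=0)
```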
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- A Study on the Autoregressive and non-Autoregressive Multi-label Learning [77.11075863067131]
We propose a self-attention-based variational encoder model to extract the label-label and label-feature dependencies jointly.
Our model can therefore be used to predict all labels in parallel while still including both label-label and label-feature dependencies.
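A hedged sketch of a non-autoregressive head in that spirit (shapes, names, and the attention layout are assumptions, not the paper's architecture): label embeddings first self-attend (label-label), then attend over instance features (label-feature), and all labels are scored in one parallel pass.

```python
import torch
import torch.nn as nn

class ParallelMultiLabelHead(nn.Module):
    def __init__(self, num_labels, dim, heads=4):
        super().__init__()
        self.label_emb = nn.Parameter(torch.randn(num_labels, dim) * 0.02)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, feats):                      # feats: (B, T, dim)
        q = self.label_emb.unsqueeze(0).expand(feats.size(0), -1, -1)
        q, _ = self.self_attn(q, q, q)             # label-label dependencies
        ctx, _ = self.cross_attn(q, feats, feats)  # label-feature dependencies
        return self.score(ctx).squeeze(-1)         # (B, num_labels) logits, in parallel
```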
arXiv Detail & Related papers (2020-12-03T05:41:44Z)
- Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory [21.06607915149245]
We construct a paraphrase graph from the provided sentence pair labels, and create an augmented dataset by directly inferring labels from the original sentence pairs using a transitivity property.
We evaluate our methods on paraphrase models trained using these datasets starting from a pretrained BERT model, and find that the automatically-enhanced training sets result in more accurate models.
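The transitive half of this augmentation ("finding friends") reduces to connected components over the positive-pair graph; a compact union-find sketch of that step (the "frenemy flipping" of negative pairs across components is omitted):

```python
from collections import defaultdict

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def infer_paraphrases(n_sentences, positive_pairs):
    """If (a, b) and (b, c) are labeled paraphrases, transitivity implies
    (a, c); emit every within-component pair not already in the dataset."""
    uf = UnionFind(n_sentences)
    for a, b in positive_pairs:
        uf.union(a, b)
    groups = defaultdict(list)
    for i in range(n_sentences):
        groups[uf.find(i)].append(i)
    seen = {tuple(sorted(p)) for p in positive_pairs}
    return [(a, b) for g in groups.values()
            for i, a in enumerate(g) for b in g[i + 1:]
            if (a, b) not in seen]
```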
arXiv Detail & Related papers (2020-11-03T17:18:03Z)
- Handling Missing Data with Graph Representation Learning [62.59831675688714]
We propose GRAPE, a graph-based framework for feature imputation as well as label prediction.
Under GRAPE, the feature imputation is formulated as an edge-level prediction task and the label prediction as a node-level prediction task.
Experimental results on nine benchmark datasets show that GRAPE yields 20% lower mean absolute error for imputation tasks and 10% lower for label prediction tasks.
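The underlying data structure is simple to picture: one node per observation, one per feature, and an edge per observed matrix entry carrying its value, so imputation becomes prediction on missing edges. A small construction sketch (networkx is used purely for illustration; GRAPE itself trains a GNN over this graph):

```python
import numpy as np
import networkx as nx

def build_bipartite_graph(X):
    """Data matrix (NaN = missing) -> bipartite graph: 'obs' nodes vs 'feat'
    nodes, with each observed entry (i, j) stored as an edge attribute.
    Feature imputation = predicting attributes of the absent edges."""
    n, d = X.shape
    G = nx.Graph()
    G.add_nodes_from((("obs", i) for i in range(n)), bipartite=0)
    G.add_nodes_from((("feat", j) for j in range(d)), bipartite=1)
    rows, cols = np.nonzero(~np.isnan(X))
    G.add_edges_from(
        (("obs", i), ("feat", j), {"value": float(X[i, j])})
        for i, j in zip(rows, cols))
    return G
```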
arXiv Detail & Related papers (2020-10-30T17:59:13Z)
- Rethinking the Value of Labels for Improving Class-Imbalanced Learning [20.953282288425118]
Class-imbalanced learning can benefit significantly from both semi-supervised and self-supervised learning.
We argue that imbalanced labels are not always useful.
Our findings highlight the need to rethink the usage of imbalanced labels in realistic long-tailed tasks.
arXiv Detail & Related papers (2020-06-13T01:35:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.