How does unlabeled data improve generalization in self-training? A
one-hidden-layer theoretical analysis
- URL: http://arxiv.org/abs/2201.08514v2
- Date: Tue, 25 Jan 2022 01:03:55 GMT
- Title: How does unlabeled data improve generalization in self-training? A
one-hidden-layer theoretical analysis
- Authors: Shuai Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen, Jinjun Xiong
- Abstract summary: This work establishes the first theoretical analysis for the known iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments on both shallow and deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.
- Score: 93.37576644429578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-training, a semi-supervised learning algorithm, leverages a large amount
of unlabeled data to improve learning when the labeled data are limited.
Despite empirical successes, its theoretical characterization remains elusive.
To the best of our knowledge, this work establishes the first theoretical
analysis for the known iterative self-training paradigm and proves the benefits
of unlabeled data in both training convergence and generalization ability. To
make our theoretical analysis feasible, we focus on the case of
one-hidden-layer neural networks. However, theoretical understanding of
iterative self-training is non-trivial even for a shallow neural network. One
of the key challenges is that existing neural network landscape analysis built
upon supervised learning no longer holds in the (semi-supervised) self-training
paradigm. We address this challenge and prove that iterative self-training
converges linearly with both convergence rate and generalization accuracy
improved in the order of $1/\sqrt{M}$, where $M$ is the number of unlabeled
samples. Experiments on both shallow and deep neural networks are also
provided to justify the correctness of our established theoretical insights on
self-training.
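To make the analysed procedure concrete, below is a minimal sketch of the iterative self-training loop on a one-hidden-layer ReLU network: a model fitted to the few labeled samples pseudo-labels the $M$ unlabeled samples, is retrained on the combined set, and the process repeats. The synthetic linear labelling rule, the network sizes, and all hyperparameters are illustrative assumptions rather than the paper's exact experimental setting.

```python
# Minimal sketch of iterative self-training with a one-hidden-layer ReLU network.
# The labelling rule, sizes, and hyperparameters are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

d, k = 10, 16              # input dimension, number of hidden neurons
n_labeled, M = 50, 2000    # few labeled samples, M unlabeled samples

w_star = rng.normal(size=d)                    # hypothetical ground-truth rule
def true_label(X):
    return np.sign(X @ w_star)

X_lab = rng.normal(size=(n_labeled, d)); y_lab = true_label(X_lab)
X_unl = rng.normal(size=(M, d))                # unlabeled pool of size M

W = rng.normal(scale=0.5, size=(k, d))         # hidden-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=k) / k        # output weights (kept fixed)

def forward(W, X):
    return np.maximum(X @ W.T, 0.0) @ a        # f(x) = sum_j a_j * ReLU(w_j^T x)

def train(W, X, y, lr=0.1, steps=400):
    """Gradient descent on the squared loss over the hidden-layer weights."""
    for _ in range(steps):
        H = np.maximum(X @ W.T, 0.0)           # hidden activations, shape (n, k)
        err = H @ a - y                        # residuals, shape (n,)
        grad_W = ((err[:, None] * a) * (H > 0)).T @ X / len(y)
        W = W - lr * grad_W
    return W

# Iterative self-training: pseudo-label the unlabeled pool with the current
# model, retrain on labeled + pseudo-labeled data, and repeat.
W = train(W, X_lab, y_lab)                     # initial model from labels only
for it in range(5):
    pseudo_y = np.sign(forward(W, X_unl))      # pseudo-labels for the M samples
    X_all = np.vstack([X_lab, X_unl])
    y_all = np.concatenate([y_lab, pseudo_y])
    W = train(W, X_all, y_all)
    acc = np.mean(np.sign(forward(W, X_unl)) == true_label(X_unl))
    print(f"self-training iteration {it}: accuracy on unlabeled pool = {acc:.3f}")
```

In the paper's result, enlarging the unlabeled pool size $M$ used in this loop is what improves both the convergence rate and the generalization accuracy, on the order of $1/\sqrt{M}$.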
Related papers
- Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural
Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs)
It proves analytically that both sampling important nodes and pruning neurons with the lowest-magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
- Neural networks trained with SGD learn distributions of increasing complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
Only later in training do they exploit higher-order statistics.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z)
- With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training and test data.
Deeper layers only minimize the training risk and fail to generalize well on test or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
- Learning Curves for Sequential Training of Neural Networks: Self-Knowledge Transfer and Forgetting [9.734033555407406]
We consider neural networks in the neural tangent kernel regime that continually learn target functions from task to task.
We investigate a variant of continual learning where the model learns the same target function in multiple tasks.
Even for the same target, the trained model shows some transfer and forgetting depending on the sample size of each task.
arXiv Detail & Related papers (2021-12-03T00:25:01Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We study the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data [48.4779912667317]
Self-training algorithms have been very successful for learning with unlabeled data using neural networks.
This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning.
arXiv Detail & Related papers (2020-10-07T19:43:55Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Statistical and Algorithmic Insights for Semi-supervised Learning with Self-training [30.866440916522826]
Self-training is a classical approach in semi-supervised learning.
We show that self-training iterations gracefully improve the model accuracy even if they do get stuck in sub-optimal fixed points.
We then establish a connection between self-training-based semi-supervision and the more general problem of learning with heterogeneous data.
arXiv Detail & Related papers (2020-06-19T08:09:07Z)
- Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning [3.8073142980733]
Unsupervised contrastive learning has proven to be a powerful method for learning representations from unlabeled data.
This study provides theoretical insights into the practical success of these unsupervised methods.
arXiv Detail & Related papers (2020-02-17T14:35:21Z)