Related papers: lpNTK: Better Generalisation with Less Data via Sample Interaction During Learning

lpNTK: Better Generalisation with Less Data via Sample Interaction During Learning

URL: http://arxiv.org/abs/2401.08808v2
Date: Tue, 14 May 2024 13:14:33 GMT
Title: lpNTK: Better Generalisation with Less Data via Sample Interaction During Learning
Authors: Shangmin Guo, Yi Ren, Stefano V. Albrecht, Kenny Smith,
Abstract summary: We propose a pseudo Neural Tangent Kernel (lpNTK) to take label information into consideration when measuring the interactions between samples. lpNTK helps to understand learning phenomena identified in previous work, specifically the learning difficulty of samples and forgetting events during learning. We show that using lpNTK to identify and remove poisoning training samples does not hurt the generalisation performance of ANNs.
Score: 22.59771349030541
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Although much research has been done on proposing new models or loss functions to improve the generalisation of artificial neural networks (ANNs), less attention has been directed to the impact of the training data on generalisation. In this work, we start from approximating the interaction between samples, i.e. how learning one sample would modify the model's prediction on other samples. Through analysing the terms involved in weight updates in supervised learning, we find that labels influence the interaction between samples. Therefore, we propose the labelled pseudo Neural Tangent Kernel (lpNTK) which takes label information into consideration when measuring the interactions between samples. We first prove that lpNTK asymptotically converges to the empirical neural tangent kernel in terms of the Frobenius norm under certain assumptions. Secondly, we illustrate how lpNTK helps to understand learning phenomena identified in previous work, specifically the learning difficulty of samples and forgetting events during learning. Moreover, we also show that using lpNTK to identify and remove poisoning training samples does not hurt the generalisation performance of ANNs.

Related papers

Neighborhood Sampling Does Not Learn the Same Graph Neural Network [7.312174450290588]
Neighborhood sampling is an important ingredient in the training of large-scale graph neural networks.<n>We study several established neighborhood sampling approaches and the corresponding posterior GP.
arXiv Detail & Related papers (2025-09-26T19:28:13Z)
Deep learning with missing data [3.829599191332801]
We propose Pattern Embedded Neural Networks (PENNs), which can be applied in conjunction with any existing imputation technique. In addition to a neural network trained on the imputed data, PENNs pass the vectors of observation indicators through a second neural network to provide a compact representation. The outputs are then combined in a third neural network to produce final predictions.
arXiv Detail & Related papers (2025-04-21T18:57:36Z)
Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps instead of instantaneous input-output relationships in previous contexts. We present Diffusion-TracIn that incorporates this temporal dynamics and observe that samples' loss gradient norms are highly dependent on timestep. We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
arXiv Detail & Related papers (2024-01-17T07:58:18Z)
The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs [2.65558931169264]
We show that some types of training samples are modelled with consistently small margins while affecting generalization in different ways. We support our findings with an analysis of fully-connected networks trained on noise-corrupted MNIST data, as well as convolutional networks trained on noise-corrupted CIFAR10 data.
arXiv Detail & Related papers (2023-02-14T09:25:50Z)
A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity [71.11795737362459]
ViTs with self-attention modules have recently achieved great empirical success in many tasks. However, theoretical learning generalization analysis is mostly noisy and elusive. This paper provides the first theoretical analysis of a shallow ViT for a classification task.
arXiv Detail & Related papers (2023-02-12T22:12:35Z)
Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks [89.28881869440433]
This paper provides the first theoretical characterization of joint edge-model sparse learning for graph neural networks (GNNs) It proves analytically that both sampling important nodes and pruning neurons with the lowest-magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy.
arXiv Detail & Related papers (2023-02-06T16:54:20Z)
Look beyond labels: Incorporating functional summary information in Bayesian neural networks [11.874130244353253]
We present a simple approach to incorporate summary information about the predicted probability. The available summary information is incorporated as augmented data and modeled with a Dirichlet process. We show how the method can inform the model about task difficulty or class imbalance.
arXiv Detail & Related papers (2022-07-04T07:06:45Z)
Probing Representation Forgetting in Supervised and Unsupervised Continual Learning [14.462797749666992]
Catastrophic forgetting is associated with an abrupt loss of knowledge previously learned by a model. We show that representation forgetting can lead to new insights on the effect of model capacity and loss function used in continual learning.
arXiv Detail & Related papers (2022-03-24T23:06:08Z)
With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning. Early layers generally learn representations relevant to performance on both training data and testing data. Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z)
Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning. Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
Uniform Convergence, Adversarial Spheres and a Simple Remedy [40.44709296304123]
Previous work has cast doubt on the general framework of uniform convergence and its ability to explain generalization in neural networks. We provide an extensive theoretical investigation of the previously studied data setting through the lens of infinitely-wide models. We prove that the Neural Tangent Kernel (NTK) also suffers from the same phenomenon and we uncover its origin.
arXiv Detail & Related papers (2021-05-07T20:23:01Z)
Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency) Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks. We use pairs of minimally-different examples with different labels, a.k.a counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task. Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.