Understanding the role of importance weighting for deep learning
- URL: http://arxiv.org/abs/2103.15209v1
- Date: Sun, 28 Mar 2021 19:44:47 GMT
- Title: Understanding the role of importance weighting for deep learning
- Authors: Da Xu, Yuting Ye, Chuanwei Ruan
- Abstract summary: A recent paper by Byrd & Lipton raises concerns about the impact of importance weighting on deep learning models.
We provide formal characterizations and theoretical justifications for the role of importance weighting.
We reveal both the optimization dynamics and the generalization performance of deep learning models.
- Score: 13.845232029169617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent paper by Byrd & Lipton (2019), based on empirical observations,
raises a major concern about the impact of importance weighting on
over-parameterized deep learning models. They observe that as long as the model
can separate the training data, the impact of importance weighting diminishes
as training proceeds. Nevertheless, this phenomenon has lacked a rigorous
characterization. In this paper, we provide formal characterizations and
theoretical justifications for the role of importance weighting with respect to
the implicit bias of gradient descent and margin-based learning theory. We
reveal both the optimization dynamics and generalization performance of deep
learning models. Our work not only explains the various novel phenomena
observed for importance weighting in deep learning, but also extends to
settings where the weights are optimized as part of the model, which
applies to a number of topics under active research.
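The diminishing effect described in the abstract can be reproduced in a few lines. The following is a minimal sketch (not from the paper; the data, class weights, and hyperparameters are illustrative assumptions): it trains importance-weighted logistic regression by gradient descent on linearly separable data and compares the normalized decision-boundary directions of a uniformly weighted run and a heavily reweighted run, early versus late in training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linearly separable 2D data: two well-separated Gaussian blobs.
n = 50
X = np.vstack([rng.normal([2.0, 2.0], 0.3, (n, 2)),
               rng.normal([-1.0, -3.0], 0.3, (n, 2))])
y = np.concatenate([np.ones(n), -np.ones(n)])

def boundary_directions(sample_weights, steps=20000, lr=0.1):
    """Gradient descent on the importance-weighted logistic loss.
    Returns the normalized parameter direction early and late in training."""
    w = np.zeros(2)
    snapshots = {}
    for t in range(steps):
        margins = y * (X @ w)
        # sigmoid(-margin) for each example, clipped to avoid overflow
        coef = 1.0 / (1.0 + np.exp(np.minimum(margins, 50.0)))
        grad = -((sample_weights * y * coef) @ X) / len(y)
        w -= lr * grad
        if t in (10, steps - 1):
            snapshots[t] = w / np.linalg.norm(w)
    return snapshots[10], snapshots[steps - 1]

uniform = np.ones(2 * n)
skewed = np.where(y > 0, 10.0, 1.0)   # up-weight the positive class 10x

u_early, u_late = boundary_directions(uniform)
s_early, s_late = boundary_directions(skewed)

cos_early = float(u_early @ s_early)  # direction agreement after 10 steps
cos_late = float(u_late @ s_late)     # direction agreement at the end
print(f"agreement early: {cos_early:.4f}, late: {cos_late:.4f}")
```

Because gradient descent on separable data is implicitly biased toward the same max-margin separator regardless of (positive) example weights, the two runs' directions agree more closely late in training than early on; on non-separable data the weights would keep influencing the solution.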
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z) - Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs)
In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt.
Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z) - Understanding the Learning Dynamics of Alignment with Human Feedback [17.420727709895736]
This paper presents a theoretical analysis of the learning dynamics of human preference alignment.
We show how the distribution of preference datasets influences the rate of model updates and provide rigorous guarantees on the training accuracy.
arXiv Detail & Related papers (2024-03-27T16:39:28Z) - Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach [50.36650300087987]
This study presents a novel approach to Generative Class Incremental Learning (GCIL) by introducing the forgetting mechanism.
We have found that integrating the forgetting mechanisms significantly enhances the models' performance in acquiring new knowledge.
arXiv Detail & Related papers (2024-03-27T05:10:38Z) - Loss Dynamics of Temporal Difference Reinforcement Learning [36.772501199987076]
We study the learning curves for temporal difference learning of a value function with linear function approximators.
We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function.
arXiv Detail & Related papers (2023-07-10T18:17:50Z) - A Survey on Few-Shot Class-Incremental Learning [11.68962265057818]
Few-shot class-incremental learning (FSCIL) poses a significant challenge for deep neural networks: learning new tasks from only a few labeled samples.
This paper provides a comprehensive survey on FSCIL.
FSCIL has seen impressive progress in various fields of computer vision.
arXiv Detail & Related papers (2023-04-17T10:15:08Z) - A Theoretical Study of Inductive Biases in Contrastive Learning [32.98250585760665]
We provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class.
We show that when the model has limited capacity, contrastive representations would recover certain special clustering structures that are compatible with the model architecture.
arXiv Detail & Related papers (2022-11-27T01:53:29Z) - Rethinking Importance Weighting for Transfer Learning [71.81262398144946]
A key assumption in supervised learning is that training and test data follow the same probability distribution.
As real-world machine learning tasks are becoming increasingly complex, novel approaches are explored to cope with such challenges.
arXiv Detail & Related papers (2021-12-19T14:35:25Z) - On the Dynamics of Training Attention Models [30.85940880569692]
We study the dynamics of training a simple attention-based classification model using gradient descent.
We prove that training must converge to attending to the discriminative words when the attention output is classified by a linear classifier.
arXiv Detail & Related papers (2020-11-19T18:55:30Z) - Counterfactual Representation Learning with Balancing Weights [74.67296491574318]
Key to causal inference with observational data is achieving balance in predictive features associated with each treatment type.
Recent literature has explored representation learning to achieve this goal.
We develop an algorithm for flexible, scalable and accurate estimation of causal effects.
arXiv Detail & Related papers (2020-10-23T19:06:03Z) - Usable Information and Evolution of Optimal Representations During Training [79.38872675793813]
In particular, we find that semantically meaningful but ultimately irrelevant information is encoded in the early transient dynamics of training.
We show these effects on both perceptual decision-making tasks inspired by literature, as well as on standard image classification tasks.
arXiv Detail & Related papers (2020-10-06T03:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.