A Short Survey on Importance Weighting for Machine Learning
- URL: http://arxiv.org/abs/2403.10175v2
- Date: Tue, 14 May 2024 05:58:19 GMT
- Title: A Short Survey on Importance Weighting for Machine Learning
- Authors: Masanari Kimura, Hideitsu Hino
- Abstract summary: It is known that supervised learning under a difference between the training and test distributions, called distribution shift, can be guaranteed statistically desirable properties through importance weighting by the density ratio of the two distributions.
This survey summarizes the broad applications of importance weighting in machine learning and related research.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Importance weighting is a fundamental procedure in statistics and machine learning that weights the objective function or probability distribution based on the importance of each instance in some sense. The simplicity and usefulness of the idea have led to many applications of importance weighting. For example, it is known that supervised learning under a difference between the training and test distributions, called distribution shift, can be guaranteed statistically desirable properties through importance weighting by the density ratio of the two distributions. This survey summarizes the broad applications of importance weighting in machine learning and related research.
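The density-ratio idea in the abstract can be sketched in a few lines. The following is a minimal illustration with fully synthetic data (all names, densities, and the threshold classifier are illustrative assumptions, not taken from the survey): the training inputs and test inputs follow different Gaussians, and weighting each training instance by w(x) = p_test(x) / p_train(x) turns the training-set average into an estimate of the test-distribution quantity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariate shift: training inputs ~ N(0, 1), test inputs ~ N(1, 1),
# while the labeling rule p(y|x) is shared between the two.
def true_label(x, rng):
    return (x + rng.normal(scale=0.3, size=x.shape) > 1.0).astype(float)

x_train = rng.normal(loc=0.0, scale=1.0, size=2000)
y_train = true_label(x_train, rng)

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Importance weights w(x) = p_test(x) / p_train(x); here both densities
# are known in closed form, so no density-ratio estimation is needed.
w = gaussian_pdf(x_train, 1.0, 1.0) / gaussian_pdf(x_train, 0.0, 1.0)

# Weighted vs. unweighted empirical risk of a fixed threshold classifier.
def risk(threshold, x, y, weights=None):
    err = ((x > threshold).astype(float) != y).astype(float)
    return err.mean() if weights is None else np.average(err, weights=weights)

unweighted = risk(0.5, x_train, y_train)            # risk under p_train
weighted = risk(0.5, x_train, y_train, weights=w)   # estimates risk under p_test

# The same weights also turn training averages into test-distribution
# averages, e.g. a self-normalized estimate of E_test[x] = 1.
test_mean_est = np.average(x_train, weights=w)
```

In practice the density ratio is unknown and must itself be estimated, which is where much of the literature surveyed here comes in.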
Related papers
- Granularity Matters in Long-Tail Learning [62.30734737735273]
We offer a novel perspective on long-tail learning, inspired by an observation: datasets with finer granularity tend to be less affected by data imbalance.
We introduce open-set auxiliary classes that are visually similar to existing ones, aiming to enhance representation learning for both head and tail classes.
To prevent the overwhelming presence of auxiliary classes from disrupting training, we introduce a neighbor-silencing loss.
arXiv Detail & Related papers (2024-10-21T13:06:21Z)
- Mutual Information Multinomial Estimation [53.58005108981247]
Estimating mutual information (MI) is a fundamental yet challenging task in data science and machine learning.
Our main discovery is that a preliminary estimate of the data distribution can dramatically improve the estimate.
Experiments on diverse tasks including non-Gaussian synthetic problems with known ground-truth and real-world applications demonstrate the advantages of our method.
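The entry above builds on the idea that a preliminary distribution estimate helps MI estimation. A minimal plug-in version of that idea (a generic histogram estimator, not the paper's method; all names and parameters are illustrative) first estimates the joint distribution and then computes MI from the estimated probabilities:

```python
import numpy as np

def mutual_information_hist(x, y, bins=32):
    """Plug-in MI estimate: estimate the joint distribution with a 2-D
    histogram, then compute MI from the estimated probabilities."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    mask = pxy > 0                        # avoid log(0) on empty cells
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
# Correlated Gaussian pair; the ground-truth MI is -0.5 * log(1 - rho^2).
rho = 0.8
y = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)

est = mutual_information_hist(x, y)
true_mi = -0.5 * np.log(1 - rho**2)
```

The histogram is the crudest possible "preliminary estimate of the data distribution"; the cited paper develops a far more careful version of this two-stage idea.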
arXiv Detail & Related papers (2024-08-18T06:27:30Z)
- Value-aware Importance Weighting for Off-policy Reinforcement Learning [11.3798693158017]
Importance sampling is a central idea underlying off-policy prediction in reinforcement learning.
In this work, we consider a broader class of importance weights to correct samples in off-policy learning.
We derive how such weights can be computed, and detail key properties of the resulting importance weights.
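For context, the standard importance weights that this paper generalizes are the ratios pi(a|s) / b(a|s) between target and behavior policies. The sketch below shows ordinary importance sampling for off-policy evaluation in a two-armed bandit (synthetic setup, not the paper's value-aware weights; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-armed bandit: data are collected under behavior policy b, but we
# want the expected reward of target policy pi.
b = np.array([0.5, 0.5])             # behavior: uniform over actions {0, 1}
pi = np.array([0.2, 0.8])            # target: prefers action 1
true_reward = np.array([0.3, 0.7])   # expected reward per action

n = 100_000
actions = rng.choice(2, size=n, p=b)
rewards = true_reward[actions] + rng.normal(scale=0.1, size=n)

# Ordinary importance sampling: weight each sample by rho = pi(a) / b(a),
# which corrects for the mismatch in how often each action was taken.
rho = pi[actions] / b[actions]
v_is = np.mean(rho * rewards)

# Ground truth for comparison: 0.2 * 0.3 + 0.8 * 0.7 = 0.62.
v_true = float(pi @ true_reward)
```

The estimator is unbiased but can have high variance when pi and b differ sharply; weights that also account for value estimates, as in the paper above, target exactly that weakness.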
arXiv Detail & Related papers (2023-06-27T17:05:22Z)
- Rethinking Importance Weighting for Transfer Learning [71.81262398144946]
A key assumption in supervised learning is that the training and test data follow the same probability distribution.
As real-world machine learning tasks become increasingly complex, novel approaches are being explored to cope with such challenges.
arXiv Detail & Related papers (2021-12-19T14:35:25Z)
- Understanding the role of importance weighting for deep learning [13.845232029169617]
A recent paper by Byrd & Lipton raises concerns about the impact of importance weighting on deep learning models.
We provide formal characterizations and theoretical justifications on the role of importance weighting.
We analyze both the optimization dynamics and the generalization performance of deep learning models under importance weighting.
arXiv Detail & Related papers (2021-03-28T19:44:47Z)
- ReMP: Rectified Metric Propagation for Few-Shot Learning [67.96021109377809]
A rectified metric space is learned to maintain the metric consistency from training to testing.
Numerous analyses indicate that a simple modification of the objective can yield substantial performance gains.
The proposed ReMP is effective and efficient, and outperforms the state of the art on various standard few-shot learning datasets.
arXiv Detail & Related papers (2020-12-02T00:07:53Z)
- Counterfactual Representation Learning with Balancing Weights [74.67296491574318]
Key to causal inference with observational data is achieving balance in predictive features associated with each treatment type.
Recent literature has explored representation learning to achieve this goal.
We develop an algorithm for flexible, scalable and accurate estimation of causal effects.
arXiv Detail & Related papers (2020-10-23T19:06:03Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Towards a More Reliable Interpretation of Machine Learning Outputs for Safety-Critical Systems using Feature Importance Fusion [0.0]
We introduce a novel fusion metric and compare it to the state-of-the-art.
Our approach is tested on synthetic data, where the ground truth is known.
Results show that our feature-importance ensemble framework produces 15% less feature-importance error overall than existing methods.
arXiv Detail & Related papers (2020-09-11T15:51:52Z)
- Understanding Global Feature Contributions With Additive Importance Measures [14.50261153230204]
We explore the perspective of defining feature importance through the predictive power associated with each feature.
We introduce two notions of predictive power (model-based and universal) and formalize this approach with a framework of additive importance measures.
We then propose SAGE, a model-agnostic method that quantifies predictive power while accounting for feature interactions.
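The notion of importance as "lost predictive power" can be illustrated with a much simpler relative of SAGE: permutation importance, which measures how much a model's loss degrades when one feature's values are shuffled. This sketch is not SAGE itself (it ignores feature interactions) and its data and model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic regression: y depends strongly on feature 0, weakly on
# feature 1, and not at all on feature 2.
n = 5000
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# "Model": ordinary least squares fit via numpy's lstsq.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def mse(X, y, coef):
    return float(np.mean((X @ coef - y) ** 2))

baseline = mse(X, y, coef)

# Permutation importance: the increase in loss when a feature's values
# are shuffled, breaking its association with the target.
importance = []
for j in range(3):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    importance.append(mse(Xp, y, coef) - baseline)
```

SAGE extends this kind of loss-based attribution with a Shapley-value formulation so that credit is shared correctly among interacting features.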
arXiv Detail & Related papers (2020-04-01T19:17:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.