Combining Diverse Feature Priors
- URL: http://arxiv.org/abs/2110.08220v1
- Date: Fri, 15 Oct 2021 17:31:10 GMT
- Title: Combining Diverse Feature Priors
- Authors: Saachi Jain, Dimitris Tsipras, Aleksander Madry
- Abstract summary: We show that models trained with diverse sets of feature priors have less overlapping failure modes.
We also demonstrate that jointly training such models on additional (unlabeled) data allows them to correct each other's mistakes.
- Score: 90.74601233745047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To improve model generalization, model designers often restrict the features
that their models use, either implicitly or explicitly. In this work, we
explore the design space of leveraging such feature priors by viewing them as
distinct perspectives on the data. Specifically, we find that models trained
with diverse sets of feature priors have less overlapping failure modes, and
can thus be combined more effectively. Moreover, we demonstrate that jointly
training such models on additional (unlabeled) data allows them to correct each
other's mistakes, which, in turn, leads to better generalization and resilience
to spurious correlations. Code available at
https://github.com/MadryLab/copriors.
Related papers
- Mitigating Biases with Diverse Ensembles and Diffusion Models [99.6100669122048]
We propose an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs)
We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.
We show that DPM-guided diversification is sufficient to remove dependence on primary shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z) - A Federated Data Fusion-Based Prognostic Model for Applications with Multi-Stream Incomplete Signals [1.2277343096128712]
This article proposes a federated prognostic model that allows multiple users to jointly construct a failure time prediction model.
Numerical studies indicate that the performance of the proposed model is the same as that of classic non-federated prognostic models.
arXiv Detail & Related papers (2023-11-13T17:08:34Z) - Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts
in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework exploiting the generation of synthetic counterfactuals using Diffusion Probabilistic Models (DPMs)
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z) - Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Investigating Ensemble Methods for Model Robustness Improvement of Text
Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate there is no single model that works best for all the cases.
By choosing an appropriate bias model, we can obtain a better robustness result than baselines with a more sophisticated model design.
arXiv Detail & Related papers (2022-10-28T17:52:10Z) - No One Representation to Rule Them All: Overlapping Features of Training
Methods [12.58238785151714]
High-performing models tend to make similar predictions regardless of training methodology.
Recent work has made very different training techniques, such as large-scale contrastive learning, yield competitively-high accuracy.
We show these models specialize in generalization of the data, leading to higher ensemble performance.
arXiv Detail & Related papers (2021-10-20T21:29:49Z) - Self-Attention Between Datapoints: Going Beyond Individual Input-Output
Pairs in Deep Learning [36.047444794544425]
We introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time.
Our approach uses self-attention to reason about relationships between datapoints explicitly.
Unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction.
arXiv Detail & Related papers (2021-06-04T16:30:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.