Combining Diverse Feature Priors
- URL: http://arxiv.org/abs/2110.08220v1
- Date: Fri, 15 Oct 2021 17:31:10 GMT
- Title: Combining Diverse Feature Priors
- Authors: Saachi Jain, Dimitris Tsipras, Aleksander Madry
- Abstract summary: We show that models trained with diverse sets of feature priors have less overlapping failure modes.
We also demonstrate that jointly training such models on additional (unlabeled) data allows them to correct each other's mistakes.
- Score: 90.74601233745047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To improve model generalization, model designers often restrict the features
that their models use, either implicitly or explicitly. In this work, we
explore the design space of leveraging such feature priors by viewing them as
distinct perspectives on the data. Specifically, we find that models trained
with diverse sets of feature priors have less overlapping failure modes, and
can thus be combined more effectively. Moreover, we demonstrate that jointly
training such models on additional (unlabeled) data allows them to correct each
other's mistakes, which, in turn, leads to better generalization and resilience
to spurious correlations. Code available at
https://github.com/MadryLab/copriors.
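The joint training described in the abstract is essentially a co-training loop: each model, trained under its own feature prior, pseudo-labels unlabeled data for the other, so that their non-overlapping failure modes can correct each other. Below is a minimal PyTorch sketch of that idea; the two-layer networks, confidence threshold, round counts, and synthetic tensors are illustrative assumptions, not the reference implementation from MadryLab/copriors.

```python
# Minimal co-training sketch: two models with (hypothetically) diverse feature
# priors exchange confident pseudo-labels on unlabeled data.
import torch
import torch.nn as nn

def make_model(in_dim: int = 32, n_classes: int = 2) -> nn.Module:
    # Stand-in for a network trained under one feature prior.
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

def pseudo_label(model: nn.Module, x: torch.Tensor, threshold: float = 0.9):
    # Keep only the unlabeled examples the model is confident about.
    with torch.no_grad():
        probs = model(x).softmax(dim=-1)
        conf, labels = probs.max(dim=-1)
    keep = conf >= threshold
    return x[keep], labels[keep]

def train_step(model, optimizer, x, y, criterion):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Synthetic labeled and unlabeled pools (placeholders for real data).
x_lab, y_lab = torch.randn(256, 32), torch.randint(0, 2, (256,))
x_unlab = torch.randn(1024, 32)

model_a, model_b = make_model(), make_model()  # two diverse feature priors
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

for _ in range(3):
    # 1) Fit each model on the labeled data under its own prior.
    for _ in range(50):
        train_step(model_a, opt_a, x_lab, y_lab, criterion)
        train_step(model_b, opt_b, x_lab, y_lab, criterion)
    # 2) Each model pseudo-labels unlabeled data for the *other* model,
    #    so their less-overlapping failure modes can correct each other.
    xa, ya = pseudo_label(model_b, x_unlab)  # B teaches A
    xb, yb = pseudo_label(model_a, x_unlab)  # A teaches B
    for _ in range(50):
        if len(xa):
            train_step(model_a, opt_a, xa, ya, criterion)
        if len(xb):
            train_step(model_b, opt_b, xb, yb, criterion)
```

In practice the two priors would come from genuinely different training pipelines (for example, shape-biased versus texture-biased inputs), which is what makes the exchanged pseudo-labels informative rather than redundant.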
Related papers
- Offline Model-Based Optimization by Learning to Rank [26.21886715050762]
We argue that regression models trained with mean squared error (MSE) are not well-aligned with the primary goal of offline model-based optimization.
We propose learning a ranking-based model that leverages learning to rank techniques to prioritize promising designs based on their relative scores.
arXiv Detail & Related papers (2024-10-15T11:15:03Z)
- A Federated Data Fusion-Based Prognostic Model for Applications with Multi-Stream Incomplete Signals [1.2277343096128712]
This article proposes a federated prognostic model that allows multiple users to jointly construct a failure time prediction model.
Numerical studies indicate that the performance of the proposed model is the same as that of classic non-federated prognostic models.
arXiv Detail & Related papers (2023-11-13T17:08:34Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework that exploits synthetic counterfactuals generated with Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
arXiv Detail & Related papers (2023-10-03T17:37:52Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space (a simple parameter-averaging sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers [66.36045164286854]
We analyze a set of existing bias features and demonstrate that no single model works best in all cases.
By choosing an appropriate bias model, we can obtain better robustness results than baselines with more sophisticated model designs.
arXiv Detail & Related papers (2022-10-28T17:52:10Z)
- Self-Attention Between Datapoints: Going Beyond Individual Input-Output Pairs in Deep Learning [36.047444794544425]
We introduce a general-purpose deep learning architecture that takes as input the entire dataset instead of processing one datapoint at a time.
Our approach uses self-attention to reason about relationships between datapoints explicitly.
Unlike conventional non-parametric models, we let the model learn end-to-end from the data how to make use of other datapoints for prediction.
arXiv Detail & Related papers (2021-06-04T16:30:49Z)
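As a companion to the "Dataless Knowledge Fusion by Merging Weights of Language Models" entry above, here is a minimal sketch of merging models in parameter space. It shows only the simplest variant, uniform averaging of parameters across models that share an architecture; the cited paper's actual merging rule is more elaborate, so treat this as an illustrative baseline, not that paper's method.

```python
# Minimal sketch of parameter-space model merging via uniform averaging.
import torch
import torch.nn as nn

def merge_state_dicts(state_dicts):
    """Uniformly average parameters from models that share an architecture."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged

# Two hypothetical fine-tuned copies of the same architecture.
def make_model():
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

model_a, model_b = make_model(), make_model()
fused = make_model()
fused.load_state_dict(merge_state_dicts([model_a.state_dict(), model_b.state_dict()]))
```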
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.