Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models
- URL: http://arxiv.org/abs/2210.06475v1
- Date: Thu, 13 Oct 2022 08:45:23 GMT
- Title: Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models
- Authors: Sourya Basu, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Vijil
Chenthamarakshan, Kush R. Varshney, Lav R. Varshney, and Payel Das
- Abstract summary: We introduce equi-tuning, a novel fine-tuning method that transforms (potentially non-equivariant) pretrained models into group equivariant models.
We provide applications of equi-tuning on three different tasks: image classification, compositional generalization in language, and fairness in natural language generation.
- Score: 56.88106830869487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce equi-tuning, a novel fine-tuning method that transforms
(potentially non-equivariant) pretrained models into group equivariant models
while incurring minimum $L_2$ loss between the feature representations of the
pretrained and the equivariant models. Large pretrained models can be
equi-tuned for different groups to satisfy the needs of various downstream
tasks. Equi-tuned models benefit from both group equivariance as an inductive
bias and semantic priors from pretrained models. We provide applications of
equi-tuning on three different tasks: image classification, compositional
generalization in language, and fairness in natural language generation (NLG).
We also provide a novel group-theoretic definition for fairness in NLG. The
effectiveness of this definition is shown by testing it against a standard
empirical method of fairness in NLG. We provide experimental results for
equi-tuning using a variety of pretrained models: Alexnet, Resnet, VGG, and
Densenet for image classification; RNNs, GRUs, and LSTMs for compositional
generalization; and GPT2 for fairness in NLG. We test these models on benchmark
datasets across all considered tasks to show the generality and effectiveness
of the proposed method.
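The abstract does not spell out the construction, but its description of an equivariant model obtained from a pretrained one with minimum $L_2$ distortion of the features suggests a group-averaging wrapper. The sketch below is illustrative and not the paper's code: `EquiTunedWrapper`, the `(g, g_inv)` transform pairs, and the 90-degree rotation group C4 example are assumptions made here for concreteness.

```python
import torch
import torch.nn as nn


class EquiTunedWrapper(nn.Module):
    """Make a (possibly non-equivariant) feature extractor equivariant to a
    finite group by averaging over the group's action (illustrative sketch)."""

    def __init__(self, backbone: nn.Module, group):
        super().__init__()
        # `group` is a list of (g, g_inv) pairs: g acts on the input,
        # g_inv applies the corresponding (inverse) action to the output features.
        self.backbone = backbone
        self.group = group

    def forward(self, x):
        # Average g_inv(backbone(g(x))) over all group elements; this averaged
        # map is equivariant by construction and stays close to the backbone.
        outputs = [g_inv(self.backbone(g(x))) for g, g_inv in self.group]
        return torch.stack(outputs, dim=0).mean(dim=0)


# Example: the 90-degree rotation group C4 acting on square image tensors,
# with the inverse rotation applied to the spatial feature maps.
c4 = [
    (lambda x, k=k: torch.rot90(x, k, dims=(-2, -1)),
     lambda y, k=k: torch.rot90(y, -k, dims=(-2, -1)))
    for k in range(4)
]

backbone = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # stand-in for a pretrained model
model = EquiTunedWrapper(backbone, c4)
x = torch.randn(1, 3, 32, 32)

# Rotating the input rotates the output features (up to floating-point error).
assert torch.allclose(model(torch.rot90(x, 1, dims=(-2, -1))),
                      torch.rot90(model(x), 1, dims=(-2, -1)), atol=1e-5)
```

Under this reading, equi-tuning would amount to fine-tuning the wrapped backbone on the downstream task, with equivariance to the chosen group guaranteed by construction rather than learned from data.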
Related papers
- Adaptive Transfer Clustering: A Unified Framework [2.3144964550307496]
We propose an adaptive transfer clustering (ATC) algorithm that automatically leverages the commonality in the presence of unknown discrepancy.
It applies to a broad class of statistical models including Gaussian mixture models, block models, and latent class models.
arXiv Detail & Related papers (2024-10-28T17:57:06Z)
- Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models [83.02797560769285]
Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data.
Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts.
We propose Task Groupings Regularization, a novel approach that benefits from model heterogeneity by grouping and aligning conflicting tasks.
arXiv Detail & Related papers (2024-05-26T13:11:55Z)
- Efficient Model-Agnostic Multi-Group Equivariant Networks [18.986283562123713]
We provide efficient model-agnostic equivariant designs for two related problems.
One is a network with multiple inputs each with potentially different groups acting on them, and another is a single input but the group acting on it is a large product group.
We find that the resulting equivariant models are robust to transformations from these groups and perform competitively otherwise.
arXiv Detail & Related papers (2023-10-14T22:24:26Z)
- Universal Semi-supervised Model Adaptation via Collaborative Consistency Training [92.52892510093037]
We introduce a realistic and challenging domain adaptation problem called Universal Semi-supervised Model Adaptation (USMA).
We propose a collaborative consistency training framework that regularizes the prediction consistency between two models.
Experimental results demonstrate the effectiveness of our method on several benchmark datasets.
arXiv Detail & Related papers (2023-07-07T08:19:40Z)
- Representer Point Selection for Explaining Regularized High-dimensional Models [105.75758452952357]
We introduce a class of sample-based explanations we term high-dimensional representers.
Our workhorse is a novel representer theorem for general regularized high-dimensional models.
We study the empirical performance of our proposed methods on three real-world binary classification datasets and two recommender system datasets.
arXiv Detail & Related papers (2023-05-31T16:23:58Z)
- On the Compositional Generalization Gap of In-Context Learning [73.09193595292233]
We look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of large pretrained language models on semantic parsing tasks with in-context learning.
We evaluate four model families, OPT, BLOOM, CodeGen and Codex on three semantic parsing datasets.
arXiv Detail & Related papers (2022-11-15T19:56:37Z)
- Design equivariant neural networks for 3D point cloud [0.0]
This work seeks to improve the generalization and robustness of existing neural networks for 3D point clouds.
The main challenge when designing equivariant models for point clouds is how to trade off model performance against model complexity.
The proposed procedure is general and forms a fundamental approach to group equivariant neural networks.
arXiv Detail & Related papers (2022-05-02T02:57:13Z)
- Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification [54.96267179988487]
We propose a novel Siamese neural network (SiamNN) for speaker verification.
The joint distribution of samples is first formulated based on a joint Bayesian (JB) generative model.
We further train the model parameters on paired samples as a binary discrimination task for speaker verification.
arXiv Detail & Related papers (2021-04-07T09:17:29Z)