Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models
- URL: http://arxiv.org/abs/2210.06475v1
- Date: Thu, 13 Oct 2022 08:45:23 GMT
- Title: Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models
- Authors: Sourya Basu, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Vijil
Chenthamarakshan, Kush R. Varshney, Lav R. Varshney, and Payel Das
- Abstract summary: We introduce equi-tuning, a novel fine-tuning method that transforms (potentially non-equivariant) pretrained models into group equivariant models.
We provide applications of equi-tuning on three different tasks: image classification, compositional generalization in language, and fairness in natural language generation.
- Score: 56.88106830869487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce equi-tuning, a novel fine-tuning method that transforms
(potentially non-equivariant) pretrained models into group equivariant models
while incurring minimum $L_2$ loss between the feature representations of the
pretrained and the equivariant models. Large pretrained models can be
equi-tuned for different groups to satisfy the needs of various downstream
tasks. Equi-tuned models benefit from both group equivariance as an inductive
bias and semantic priors from pretrained models. We provide applications of
equi-tuning on three different tasks: image classification, compositional
generalization in language, and fairness in natural language generation (NLG).
We also provide a novel group-theoretic definition for fairness in NLG. The
effectiveness of this definition is shown by testing it against a standard
empirical method of fairness in NLG. We provide experimental results for
equi-tuning using a variety of pretrained models: Alexnet, Resnet, VGG, and
Densenet for image classification; RNNs, GRUs, and LSTMs for compositional
generalization; and GPT2 for fairness in NLG. We test these models on benchmark
datasets across all considered tasks to show the generality and effectiveness
of the proposed method.
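The abstract does not spell out the construction, but its description of an equivariant model obtained from a pretrained one with minimum $L_2$ distortion of the features suggests a group-averaging wrapper. The sketch below is illustrative and not the paper's code: `EquiTunedWrapper`, the `(g, g_inv)` transform pairs, and the 90-degree rotation group C4 example are assumptions made here for concreteness.

```python
import torch
import torch.nn as nn


class EquiTunedWrapper(nn.Module):
    """Make a (possibly non-equivariant) feature extractor equivariant to a
    finite group by averaging over the group's action (illustrative sketch)."""

    def __init__(self, backbone: nn.Module, group):
        super().__init__()
        # `group` is a list of (g, g_inv) pairs: g acts on the input,
        # g_inv applies the corresponding (inverse) action to the output features.
        self.backbone = backbone
        self.group = group

    def forward(self, x):
        # Average g_inv(backbone(g(x))) over all group elements; this averaged
        # map is equivariant by construction and stays close to the backbone.
        outputs = [g_inv(self.backbone(g(x))) for g, g_inv in self.group]
        return torch.stack(outputs, dim=0).mean(dim=0)


# Example: the 90-degree rotation group C4 acting on square image tensors,
# with the inverse rotation applied to the spatial feature maps.
c4 = [
    (lambda x, k=k: torch.rot90(x, k, dims=(-2, -1)),
     lambda y, k=k: torch.rot90(y, -k, dims=(-2, -1)))
    for k in range(4)
]

backbone = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # stand-in for a pretrained model
model = EquiTunedWrapper(backbone, c4)
x = torch.randn(1, 3, 32, 32)

# Rotating the input rotates the output features (up to floating-point error).
assert torch.allclose(model(torch.rot90(x, 1, dims=(-2, -1))),
                      torch.rot90(model(x), 1, dims=(-2, -1)), atol=1e-5)
```

Under this reading, equi-tuning would amount to fine-tuning the wrapped backbone on the downstream task, with equivariance to the chosen group guaranteed by construction rather than learned from data.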
Related papers
- Adaptive Transfer Clustering: A Unified Framework [2.3144964550307496]
We propose an adaptive transfer clustering (ATC) algorithm that automatically leverages the commonality in the presence of unknown discrepancy.
It applies to a broad class of statistical models including Gaussian mixture models, block models, and latent class models.
arXiv Detail & Related papers (2024-10-28T17:57:06Z)
- Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models [83.02797560769285]
Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data.
Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts.
We propose Task Groupings Regularization, a novel approach that benefits from model heterogeneity by grouping and aligning conflicting tasks.
arXiv Detail & Related papers (2024-05-26T13:11:55Z)
- Efficient Model-Agnostic Multi-Group Equivariant Networks [18.986283562123713]
We provide efficient model-agnostic equivariant designs for two related problems.
One is a network with multiple inputs each with potentially different groups acting on them, and another is a single input but the group acting on it is a large product group.
We find that the resulting equivariant models are robust to transformations from these groups and perform competitively otherwise.
arXiv Detail & Related papers (2023-10-14T22:24:26Z)
- Universal Semi-supervised Model Adaptation via Collaborative Consistency Training [92.52892510093037]
We introduce a realistic and challenging domain adaptation problem called Universal Semi-supervised Model Adaptation (USMA).
We propose a collaborative consistency training framework that regularizes the prediction consistency between two models.
Experimental results demonstrate the effectiveness of our method on several benchmark datasets.
arXiv Detail & Related papers (2023-07-07T08:19:40Z)
- Representer Point Selection for Explaining Regularized High-dimensional Models [105.75758452952357]
We introduce a class of sample-based explanations we term high-dimensional representers.
Our workhorse is a novel representer theorem for general regularized high-dimensional models.
We study the empirical performance of our proposed methods on three real-world binary classification datasets and two recommender system datasets.
arXiv Detail & Related papers (2023-05-31T16:23:58Z)
- On the Compositional Generalization Gap of In-Context Learning [73.09193595292233]
We look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of large pretrained language models on semantic parsing tasks with in-context learning.
We evaluate four model families, OPT, BLOOM, CodeGen and Codex on three semantic parsing datasets.
arXiv Detail & Related papers (2022-11-15T19:56:37Z)
- Design equivariant neural networks for 3D point cloud [0.0]
This work seeks to improve the generalization and robustness of existing neural networks for 3D point clouds.
The main challenge when designing equivariant models for point clouds is how to trade off model performance against model complexity.
The proposed procedure is general and forms a fundamental approach to group equivariant neural networks.
arXiv Detail & Related papers (2022-05-02T02:57:13Z)
- Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification [54.96267179988487]
We propose a novel Siamese neural network (SiamNN) for speaker verification.
The joint distribution of samples is first formulated based on a joint Bayesian (JB) generative model.
We further train the model parameters on paired samples as a binary discrimination task for speaker verification.
arXiv Detail & Related papers (2021-04-07T09:17:29Z)