Rank-One Editing of Encoder-Decoder Models
- URL: http://arxiv.org/abs/2211.13317v1
- Date: Wed, 23 Nov 2022 21:34:57 GMT
- Title: Rank-One Editing of Encoder-Decoder Models
- Authors: Vikas Raunak and Arul Menezes
- Abstract summary: Rank-one editing is a direct intervention method for behavior deletion requests in encoder-decoder transformer models.
We propose four editing tasks for NMT and show that the proposed editing algorithm achieves high efficacy.
- Score: 12.478605921259403
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large sequence-to-sequence models for tasks such as Neural Machine Translation (NMT) are usually trained over hundreds of millions of samples. However, training is just the origin of a model's life-cycle. Real-world deployments of models require further behavioral adaptations as new requirements emerge or shortcomings become known. Typically, in the space of model behaviors, behavior deletion requests are addressed through model retraining, whereas model finetuning is done to address behavior addition requests, both procedures being instances of data-based model intervention. In this work, we present a preliminary study investigating rank-one editing as a direct intervention method for behavior deletion requests in encoder-decoder transformer models. We propose four editing tasks for NMT and show that the proposed editing algorithm achieves high efficacy, while requiring only a single positive example to fix an erroneous (negative) model behavior.
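The abstract does not spell out the update rule, but rank-one edits of transformer feed-forward weights are commonly formulated as a ROME-style closed-form update. Below is a minimal sketch in that spirit; the function name, the key and value vectors, and the covariance estimate C are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def rank_one_edit(W, k_star, v_star, C):
    """Sketch: edit weight W (d_out x d_in) so that key k_star maps to
    value v_star, while a key covariance estimate C (d_in x d_in) keeps
    the edit from disturbing unrelated inputs. ROME-style assumption."""
    c_inv_k = torch.linalg.solve(C, k_star)   # C^{-1} k*
    residual = v_star - W @ k_star            # change needed at k*
    # Rank-one update: after the edit, W_new @ k_star == v_star exactly,
    # with minimal effect on other keys under the covariance C.
    return W + torch.outer(residual, c_inv_k) / (k_star @ c_inv_k)
```

Here `k_star` would be derived from the erroneous input and `v_star` from the single positive example the abstract mentions; only one weight matrix changes, which is what makes the intervention direct rather than data-based.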
Related papers
- Neural Network Reprogrammability: A Unified Theme on Model Reprogramming, Prompt Tuning, and Prompt Instruction [55.914891182214475]
We introduce neural network reprogrammability as a unifying framework for model adaptation.
We present a taxonomy that categorizes such information manipulation approaches across four key dimensions.
We also analyze remaining technical challenges and ethical considerations.
arXiv Detail & Related papers (2025-06-05T05:42:27Z)
- Retrieval Augmented Anomaly Detection (RAAD): Nimble Model Adjustment Without Retraining [3.037546128667634]
We introduce Retrieval Augmented Anomaly Detection, a novel method taking inspiration from Retrieval Augmented Generation.
Human-annotated examples are sent to a vector store, which can modify model outputs on the very next batch processed for inference.
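The summary suggests a nearest-neighbor lookup over annotated examples that overrides the base detector at inference time; a toy sketch of that pattern, where the class and method names are assumptions rather than the paper's API:

```python
import numpy as np

class RetrievalAdjustedDetector:
    """Toy sketch: consult a store of human-annotated embeddings first;
    fall back to the base anomaly scorer when nothing similar is stored."""
    def __init__(self, base_score_fn, sim_cutoff=0.9):
        self.base_score_fn = base_score_fn   # embedding -> anomaly score
        self.vecs, self.labels = [], []      # the "vector store"
        self.sim_cutoff = sim_cutoff

    def annotate(self, vec, label):
        # Adding an annotation takes effect on the very next batch scored.
        self.vecs.append(vec / np.linalg.norm(vec))
        self.labels.append(float(label))

    def score(self, vec):
        if self.vecs:
            v = vec / np.linalg.norm(vec)
            sims = np.stack(self.vecs) @ v   # cosine similarities
            best = int(sims.argmax())
            if sims[best] >= self.sim_cutoff:
                return self.labels[best]     # human label overrides model
        return self.base_score_fn(vec)
```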
arXiv Detail & Related papers (2025-02-26T20:17:16Z)
- Neuron-Level Sequential Editing for Large Language Models [19.324852774144752]
We introduce Neuron-level Sequential Editing (NSE) for supporting sequential model editing.
Specifically, we optimize the target layer's hidden states using the model's original weights to prevent model failure.
Our experiments demonstrate that NSE significantly outperforms current parameter-modifying model editing methods.
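Optimizing a target layer's hidden states plausibly means gradient descent on the hidden vector itself, computed against the frozen original weights, to find the value an edit should produce; a rough sketch under that reading (decode_fn and shapes are assumed):

```python
import torch

def optimize_target_hidden(h_init, decode_fn, target_ids, steps=100, lr=0.1):
    """Sketch: find a hidden state that makes the frozen model emit the
    desired tokens; the result becomes the target value for the edit."""
    h = h_init.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([h], lr=lr)
    for _ in range(steps):
        logits = decode_fn(h)   # uses the model's original, frozen weights
        loss = torch.nn.functional.cross_entropy(logits, target_ids)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return h.detach()
```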
arXiv Detail & Related papers (2024-10-05T05:52:22Z)
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Experts (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce UNCURL, an adaptive task-aware pruning technique, to reduce the number of experts per MoE layer in an offline manner post-training.
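Offline expert pruning of this kind typically keeps, per layer, the experts most utilized by the target task's router; a toy sketch of the selection step, where the utilization statistic is an assumption rather than UNCURL's exact criterion:

```python
import torch

def prune_experts(router_counts, keep_k):
    """Sketch: given per-expert routing counts collected on task data for
    one MoE layer, return the indices of the keep_k most-used experts."""
    return torch.topk(router_counts, keep_k).indices.tolist()
```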
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- Consecutive Batch Model Editing with HooK Layers [59.673084839708224]
CoachHooK is a model editing method that simultaneously supports sequential and batch editing.
It is memory-friendly, requiring only a small amount of memory to store several hook layers whose size remains unchanged over time.
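The hook layers are not specified in the summary; one plausible reading is a forward hook that applies a stored correction whenever the layer input is close to an edited key. A toy sketch under that assumption, with an illustrative matching rule:

```python
import torch

def attach_edit_hook(layer, key, correction, tau=0.8):
    """Toy sketch: nudge the layer output toward a stored correction
    whenever the input closely matches an edited key. The cosine-match
    rule and shapes are assumptions, not the paper's design."""
    key = key / key.norm()

    def hook(module, inputs, output):
        x = inputs[0]
        sim = torch.nn.functional.cosine_similarity(
            x, key.expand_as(x), dim=-1)
        mask = (sim > tau).unsqueeze(-1)          # positions to edit
        return torch.where(mask, output + correction, output)

    return layer.register_forward_hook(hook)     # handle allows removal
```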
arXiv Detail & Related papers (2024-03-08T14:07:44Z)
- $\Delta$-Patching: A Framework for Rapid Adaptation of Pre-trained Convolutional Networks without Base Performance Loss [71.46601663956521]
Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time.
We propose $\Delta$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies.
Our experiments show that $\Delta$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained.
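The summary says only that a small set of trained parameters replaces full model copies; one common way to realize this is a LoRA-style additive patch on a frozen base layer, sketched below (this low-rank parameterization is an assumption, not necessarily $\Delta$-Patching's):

```python
import torch
import torch.nn as nn

class DeltaLinear(nn.Module):
    """Sketch: keep the pre-trained weight frozen and train only a small
    additive patch, so adapting to a new task never overwrites or
    duplicates the base model."""
    def __init__(self, base: nn.Linear, rank=4):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # freeze base weights
        d_out, d_in = base.weight.shape
        self.a = nn.Parameter(torch.zeros(d_out, rank))   # patch starts at 0
        self.b = nn.Parameter(torch.randn(rank, d_in) * 0.01)

    def forward(self, x):
        return self.base(x) + x @ (self.a @ self.b).T
```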
arXiv Detail & Related papers (2023-03-26T16:39:44Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA): instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
- Debugging using Orthogonal Gradient Descent [7.766921168069532]
Given a trained model that is partially faulty, can we correct its behaviour without having to train the model from scratch?
In other words, can we "debug" neural networks similar to how we address bugs in our mathematical models and standard computer code?
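Orthogonal gradient methods typically answer this by projecting the bug-fix gradient onto the subspace orthogonal to gradients of behavior that must be preserved; a minimal sketch, where the construction of the orthonormal basis is assumed:

```python
import torch

def orthogonal_fix_step(param, fix_grad, preserve_basis, lr=1e-3):
    """Sketch: remove from the bug-fix gradient every component along
    orthonormal directions that encode behavior to keep, then take a
    plain gradient step with what remains."""
    g = fix_grad.flatten().clone()
    for v in preserve_basis:          # v: orthonormal, same numel as param
        g = g - (g @ v) * v           # Gram-Schmidt style projection
    with torch.no_grad():
        param -= lr * g.view_as(param)
    return param
```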
arXiv Detail & Related papers (2022-06-17T00:03:54Z)
- Learning to Model Editing Processes [98.11448946134894]
We propose modeling editing processes: the whole process of iteratively generating sequences.
We form a conceptual framework to describe the likelihood of multi-step edits, and describe neural models that can learn a generative model of sequences based on these multi-step edits.
arXiv Detail & Related papers (2022-05-24T21:32:52Z)
- Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding [13.65914588243695]
We propose an approach to bridge pre-trained models and code-related tasks.
We exploit semantic-preserving transformation to enrich downstream data diversity.
We introduce curriculum learning to organize the transformed data in an easy-to-hard manner to fine-tune existing pre-trained models.
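Curriculum ordering here amounts to fine-tuning on transformed samples sorted from easy to hard; a minimal sketch, assuming a per-sample difficulty score is available:

```python
def curriculum_batches(samples, difficulty_fn, batch_size=32):
    """Sketch: yield fine-tuning batches in easy-to-hard order, so the
    pre-trained model first sees samples closest to its training data."""
    ordered = sorted(samples, key=difficulty_fn)   # easy first
    for i in range(0, len(ordered), batch_size):
        yield ordered[i:i + batch_size]
```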
arXiv Detail & Related papers (2021-12-04T07:21:28Z)
- Fast Model Editing at Scale [77.69220974621425]
We propose Model Editor Networks with Gradient Decomposition (MEND), a collection of small auxiliary editing networks that use a single desired input-output pair to make fast, local edits to a pre-trained model.
MEND can be trained on a single GPU in less than a day even for 10 billion+ parameter models.
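At a high level, MEND exploits the fact that a linear layer's single-example gradient is an outer product of the layer input and the output-side gradient: small networks transform the two factors and recombine them into a pseudo-gradient. The sketch below loosely paraphrases that idea with illustrative shapes, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class MendStyleEditor(nn.Module):
    """Loose sketch: map the rank-one factors (x, delta) of a linear
    layer's gradient through small MLPs and recombine them into a
    pseudo-gradient used for a fast, local weight edit."""
    def __init__(self, d_in, d_out, hidden=128):
        super().__init__()
        self.f_in = nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(),
                                  nn.Linear(hidden, d_in))
        self.f_out = nn.Sequential(nn.Linear(d_out, hidden), nn.ReLU(),
                                   nn.Linear(hidden, d_out))

    def forward(self, x, delta):
        # x: layer input; delta: grad of the edit loss w.r.t. layer output
        return torch.outer(self.f_out(delta), self.f_in(x))

# One edit step on a layer weight W, given the single desired
# input-output pair that produced (x, delta):
#   W.data -= lr * editor(x, delta)
```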
arXiv Detail & Related papers (2021-10-21T17:41:56Z)
- Factual Error Correction for Abstractive Summarization Models [41.77317902748772]
We propose a post-editing corrector module to correct factual errors in generated summaries.
We show that our model is able to correct factual errors in summaries generated by other neural summarization models.
We also find that transferring from artificial error correction to downstream settings is still very challenging.
arXiv Detail & Related papers (2020-10-17T04:24:16Z)