Hyperdecoders: Instance-specific decoders for multi-task NLP
- URL: http://arxiv.org/abs/2203.08304v1
- Date: Tue, 15 Mar 2022 22:39:53 GMT
- Title: Hyperdecoders: Instance-specific decoders for multi-task NLP
- Authors: Hamish Ivison and Matthew E. Peters
- Abstract summary: We investigate input-conditioned hypernetworks for multi-tasking in NLP.
We generate parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder.
- Score: 9.244884318445413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate input-conditioned hypernetworks for multi-tasking in NLP,
generating parameter-efficient adaptations for a decoder using a hypernetwork
conditioned on the output of an encoder. This approach produces a unique
decoder for every input instance, allowing the network a larger degree of
flexibility than prior work that specializes the decoder for each task. We
apply our method to sequence classification tasks, extractive QA, and
summarisation, and find that it often outperforms fully fine-tuning the
underlying model and surpasses previous parameter-efficient fine-tuning
methods. Gains are particularly large when evaluated out-of-domain on the MRQA
benchmark. In addition, as the pretrained model is frozen, our method
eliminates negative interference among unrelated tasks, a common failure mode
in fully fine-tuned approaches. An analysis of the embeddings produced by our
model suggests that a large benefit of our approach is giving the encoder
more effective control over the decoder, letting it map hidden
representations to a final text-based label without interference from other
tasks' output formats or labels.
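As a rough illustration of the mechanism described in the abstract, the sketch below shows one way an instance-specific hypernetwork could be wired in PyTorch: a pooled encoder summary is mapped to the weights of a small bottleneck adapter applied to decoder hidden states. The module name, dimensions, mean-pooling, and adapter placement are illustrative assumptions rather than the paper's exact implementation.

```python
# Hedged sketch: a hypernetwork conditioned on a pooled encoder output emits the
# parameters of a per-instance bottleneck adapter inside a (frozen) decoder.
# Names and dimensions are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    """Generates per-instance adapter weights from an encoder summary vector."""

    def __init__(self, d_model=512, d_bottleneck=64, d_hyper=128):
        super().__init__()
        # Total adapter parameters: down-projection, up-projection, and two biases.
        n_params = 2 * d_model * d_bottleneck + d_bottleneck + d_model
        self.hyper = nn.Sequential(
            nn.Linear(d_model, d_hyper),
            nn.ReLU(),
            nn.Linear(d_hyper, n_params),
        )
        self.d_model, self.d_bottleneck = d_model, d_bottleneck

    def forward(self, enc_summary, dec_hidden):
        # enc_summary: (batch, d_model), e.g. mean-pooled encoder output
        # dec_hidden:  (batch, seq, d_model), hidden states inside a decoder layer
        b, d, r = enc_summary.size(0), self.d_model, self.d_bottleneck
        flat = self.hyper(enc_summary)
        w_down, flat = flat[:, : d * r].view(b, d, r), flat[:, d * r :]
        w_up, flat = flat[:, : r * d].view(b, r, d), flat[:, r * d :]
        b_down, b_up = flat[:, :r], flat[:, r:]
        # Bottleneck adapter with a residual connection, using the generated weights.
        h = torch.relu(torch.bmm(dec_hidden, w_down) + b_down.unsqueeze(1))
        return dec_hidden + torch.bmm(h, w_up) + b_up.unsqueeze(1)

# Illustrative usage with a frozen encoder-decoder model:
# enc_summary = encoder_outputs.last_hidden_state.mean(dim=1)
# dec_hidden = hyper_adapter(enc_summary, dec_hidden)  # after a decoder sublayer
```

In such a setup only the hypernetwork (and hence the generated adapters) is trained while the pretrained encoder-decoder stays frozen, which matches the abstract's point that freezing the base model avoids negative interference among unrelated tasks.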
Related papers
- FADE: A Task-Agnostic Upsampling Operator for Encoder-Decoder Architectures [18.17019371324024]
FADE is a novel, plug-and-play, lightweight, and task-agnostic upsampling operator.
We show that FADE is task-agnostic, with consistent performance improvements on a number of dense prediction tasks.
For the first time, we demonstrate robust feature upsampling on both region- and detail-sensitive tasks.
arXiv Detail & Related papers (2024-07-18T13:32:36Z)
- Efficient Encoder-Decoder Transformer Decoding for Decomposable Tasks [53.550782959908524]
We introduce a new configuration for encoder-decoder models that improves efficiency on structured output and decomposable tasks.
Our method, prompt-in-decoder (PiD), encodes the input once and decodes the output in parallel, boosting both training and inference efficiency.
arXiv Detail & Related papers (2024-03-19T19:27:23Z)
- Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction [57.16121098944589]
RDA is a pioneering approach designed to address two primary deficiencies of previous attempts at stealing pre-trained encoders.
It builds a sample-wise prototype that consolidates the target encoder's representations of a given sample's various perspectives.
For greater efficacy, a multi-relational extraction loss trains the surrogate encoder to discriminate mismatched embedding-prototype pairs.
arXiv Detail & Related papers (2023-12-01T15:03:29Z)
- Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational-cost trade-offs superior to the current widely used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z)
- String-based Molecule Generation via Multi-decoder VAE [56.465033997245776]
We investigate the problem of string-based molecular generation via variational autoencoders (VAEs).
We propose a simple yet effective idea to improve the performance of VAEs for this task.
In our experiments, the proposed VAE model performs particularly well at generating samples from out-of-domain distributions.
arXiv Detail & Related papers (2022-08-23T03:56:30Z)
- Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization gives neural image compression (NIC) superior lossy compression performance, but distinct models must be trained to reach different points in the rate-distortion (R-D) space.
We formulate the essential mathematical functions that describe the R-D behavior of NIC using deep networks and statistical modeling.
arXiv Detail & Related papers (2021-06-24T12:23:05Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations, as well as the consequences of fixing it by introducing a notion of self-consistency.
We show that encoders trained with our self-consistency approach yield representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems [34.927828428293864]
Our model comprises a single transformer-based encoder-decoder network trained end-to-end to generate both answers and questions.
In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token-by-token.
arXiv Detail & Related papers (2020-10-12T21:10:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.