Hyperdecoders: Instance-specific decoders for multi-task NLP
- URL: http://arxiv.org/abs/2203.08304v1
- Date: Tue, 15 Mar 2022 22:39:53 GMT
- Title: Hyperdecoders: Instance-specific decoders for multi-task NLP
- Authors: Hamish Ivison and Matthew E. Peters
- Abstract summary: We investigate input-conditioned hypernetworks for multi-tasking in NLP.
We generate parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder.
- Score: 9.244884318445413
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate input-conditioned hypernetworks for multi-tasking in NLP,
generating parameter-efficient adaptations for a decoder using a hypernetwork
conditioned on the output of an encoder. This approach produces a unique
decoder for every input instance, allowing the network a larger degree of
flexibility than prior work that specializes the decoder for each task. We
apply our method to sequence classification tasks, extractive QA, and
summarisation, and find that it often outperforms fully fine-tuning the
underlying model and surpasses previous parameter-efficient fine-tuning
methods. Gains are particularly large when evaluated out-of-domain on the MRQA
benchmark. In addition, as the pretrained model is frozen, our method
eliminates negative interference among unrelated tasks, a common failure mode
in fully fine-tuned approaches. An analysis of the embeddings produced by our
model suggests that a large benefit of our approach is giving the encoder
more effective control over the decoder, letting it map hidden
representations to a final text-based label without interference from other
tasks' output formats or labels.
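As a rough illustration of the mechanism described in the abstract, the sketch below shows one way an instance-specific hypernetwork could be wired in PyTorch: a pooled encoder summary is mapped to the weights of a small bottleneck adapter applied to decoder hidden states. The module name, dimensions, mean-pooling, and adapter placement are illustrative assumptions rather than the paper's exact implementation.

```python
# Hedged sketch: a hypernetwork conditioned on a pooled encoder output emits the
# parameters of a per-instance bottleneck adapter inside a (frozen) decoder.
# Names and dimensions are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    """Generates per-instance adapter weights from an encoder summary vector."""

    def __init__(self, d_model=512, d_bottleneck=64, d_hyper=128):
        super().__init__()
        # Total adapter parameters: down-projection, up-projection, and two biases.
        n_params = 2 * d_model * d_bottleneck + d_bottleneck + d_model
        self.hyper = nn.Sequential(
            nn.Linear(d_model, d_hyper),
            nn.ReLU(),
            nn.Linear(d_hyper, n_params),
        )
        self.d_model, self.d_bottleneck = d_model, d_bottleneck

    def forward(self, enc_summary, dec_hidden):
        # enc_summary: (batch, d_model), e.g. mean-pooled encoder output
        # dec_hidden:  (batch, seq, d_model), hidden states inside a decoder layer
        b, d, r = enc_summary.size(0), self.d_model, self.d_bottleneck
        flat = self.hyper(enc_summary)
        w_down, flat = flat[:, : d * r].view(b, d, r), flat[:, d * r :]
        w_up, flat = flat[:, : r * d].view(b, r, d), flat[:, r * d :]
        b_down, b_up = flat[:, :r], flat[:, r:]
        # Bottleneck adapter with a residual connection, using the generated weights.
        h = torch.relu(torch.bmm(dec_hidden, w_down) + b_down.unsqueeze(1))
        return dec_hidden + torch.bmm(h, w_up) + b_up.unsqueeze(1)

# Illustrative usage with a frozen encoder-decoder model:
# enc_summary = encoder_outputs.last_hidden_state.mean(dim=1)
# dec_hidden = hyper_adapter(enc_summary, dec_hidden)  # after a decoder sublayer
```

In such a setup only the hypernetwork (and hence the generated adapters) is trained while the pretrained encoder-decoder stays frozen, which matches the abstract's point that freezing the base model avoids negative interference among unrelated tasks.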
Related papers
- FADE: A Task-Agnostic Upsampling Operator for Encoder-Decoder Architectures [18.17019371324024]
FADE is a novel, plug-and-play, lightweight, and task-agnostic upsampling operator.
We show that FADE is task-agnostic, with consistent performance improvements on a number of dense prediction tasks.
For the first time, we demonstrate robust feature upsampling on both region- and detail-sensitive tasks.
arXiv Detail & Related papers (2024-07-18T13:32:36Z)
- Efficient Encoder-Decoder Transformer Decoding for Decomposable Tasks [53.550782959908524]
We introduce a new configuration for encoder-decoder models that improves efficiency on structured output and decomposable tasks.
Our method, prompt-in-decoder (PiD), encodes the input once and decodes the output in parallel, boosting both training and inference efficiency.
arXiv Detail & Related papers (2024-03-19T19:27:23Z)
- Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction [57.16121098944589]
RDA is a pioneering approach designed to address two primary deficiencies of previous attempts at stealing pre-trained encoders.
It builds a sample-wise prototype that consolidates the target encoder's representations of a given sample's various perspectives.
For greater efficacy, a multi-relational extraction loss trains the surrogate encoder to discriminate mismatched embedding-prototype pairs.
arXiv Detail & Related papers (2023-12-01T15:03:29Z)
- Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization [60.91600465922932]
We present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder.
Our approach provides test-time recall-vs-computational-cost trade-offs superior to the current widely used methods.
arXiv Detail & Related papers (2022-10-23T00:32:04Z)
- String-based Molecule Generation via Multi-decoder VAE [56.465033997245776]
We investigate the problem of string-based molecular generation via variational autoencoders (VAEs).
We propose a simple yet effective idea to improve the performance of VAEs for this task.
In our experiments, the proposed VAE model performs particularly well at generating samples from out-of-domain distributions.
arXiv Detail & Related papers (2022-08-23T03:56:30Z)
- Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization gives neural image compression (NIC) superior lossy compression performance, but distinct models must be trained to reach different points in the rate-distortion (R-D) space.
We formulate the essential mathematical functions that describe the R-D behavior of NIC using deep networks and statistical modeling.
arXiv Detail & Related papers (2021-06-24T12:23:05Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations, as well as the consequences of fixing it by introducing a notion of self-consistency.
We show that encoders trained with our self-consistency approach yield representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- End-to-End Synthetic Data Generation for Domain Adaptation of Question Answering Systems [34.927828428293864]
Our model comprises a single transformer-based encoder-decoder network trained end-to-end to generate both answers and questions.
In a nutshell, we feed a passage to the encoder and ask the decoder to generate a question and an answer token-by-token.
arXiv Detail & Related papers (2020-10-12T21:10:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.