Related papers: Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners

Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners

URL: http://arxiv.org/abs/2307.14856v2
Date: Tue, 27 Aug 2024 04:30:29 GMT
Title: Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners
Authors: Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On,
Abstract summary: We show that seq2seq models can be highly effective few-shot learners for a wide spectrum of applications. We propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach.
Score: 8.43854206194162
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates. Recently, a few studies have demonstrated the feasibility of few-shot learning with seq2seq models; however, this has been limited to tasks that align well with the seq2seq architecture, such as summarization and translation. Inspired by these initial studies, we provide a first-ever extensive experiment comparing the in-context few-shot learning capabilities of decoder-only and encoder-decoder models on a broad range of tasks. Furthermore, we propose two methods to more effectively elicit in-context learning ability in seq2seq models: objective-aligned prompting and a fusion-based approach. Remarkably, our approach outperforms a decoder-only model that is six times larger and exhibits significant performance improvements compared to conventional seq2seq models across a variety of settings. We posit that, with the right configuration and prompt design, seq2seq models can be highly effective few-shot learners for a wide spectrum of applications.

Related papers

Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs [27.331524018411926]
We compare encoder-only and decoder-only models on cross-modal adaptation for time-dependent simulation tasks.<n>We find that decoder-only models are far worse than encoder-only models, when existing approaches are applied unmodified.<n>We introduce two novel approaches, Parallel Flipping and Sequence Doubling, attempting to mimic bidirectionality in autoregressive models.
arXiv Detail & Related papers (2025-10-06T18:46:50Z)
A2R: An Asymmetric Two-Stage Reasoning Framework for Parallel Reasoning [57.727084580884075]
Asymmetric Two-Stage Reasoning framework designed to bridge gap between a model's potential and its actual performance.<n>A2R-Efficient is a "small-to-big" variant that combines a Qwen3-4B explorer with a Qwen3-8B synthesizer.<n>Results show A2R is not only a performance-boosting framework but also an efficient and practical solution for real-world applications.
arXiv Detail & Related papers (2025-09-26T08:27:03Z)
Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging [103.98582374569789]
Model merging aims to combine multiple expert models into a single model, thereby reducing storage and serving costs.<n>Previous studies have primarily focused on merging visual classification models or Large Language Models (LLMs) for code and math tasks.<n>We introduce the model merging benchmark for MLLMs, which includes multiple tasks such as VQA, Geometry, Chart, OCR, and Grounding, providing both LoRA and full fine-tuning models.
arXiv Detail & Related papers (2025-05-26T12:23:14Z)
RSQ: Learning from Important Tokens Leads to Better Quantized LLMs [65.5558181902098]
Layer-wise quantization is a key technique for efficiently compressing large models without expensive retraining. We propose RSQ (Rotate, Scale, then Quantize), which applies rotations to the model to mitigate outliers. We demonstrate that RSQ consistently outperforms baseline methods across multiple downstream tasks and three model families.
arXiv Detail & Related papers (2025-03-03T18:46:33Z)
EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models. We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness. Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z)
Self-Supervised Open-Ended Classification with Small Visual Language Models [60.23212389067007]
We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks few-shot abilities for open-ended classification with small visual language models. By using models with approximately 1B parameters we outperform the few-shot abilities of much larger models, such as Frozen and FROMAGe.
arXiv Detail & Related papers (2023-09-30T21:41:21Z)
RAVEN: In-Context Learning with Retrieval-Augmented Encoder-Decoder Language Models [57.12888828853409]
RAVEN is a model that combines retrieval-augmented masked language modeling and prefix language modeling. Fusion-in-Context Learning enables the model to leverage more in-context examples without requiring additional training. Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning.
arXiv Detail & Related papers (2023-08-15T17:59:18Z)
Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models [16.49601740473416]
We explore recipes to improve training efficiency by initializing one model from the other. Using an encoder to warm-start seq2seq training, we show that we can match task performance of a from-scratch seq2seq model.
arXiv Detail & Related papers (2023-06-14T21:41:52Z)
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks [59.09343552273045]
We propose a decoder-only model for multimodal tasks, which is surprisingly effective in jointly learning of these disparate vision-language tasks. We demonstrate that joint learning of these diverse objectives is simple, effective, and maximizes the weight-sharing of the model across these tasks. Our model achieves the state of the art on image-text and text-image retrieval, video question answering and open-vocabulary detection tasks, outperforming much larger and more extensively trained foundational models.
arXiv Detail & Related papers (2023-03-29T16:42:30Z)
Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models. A collection of pretrained encoders perceive diverse modalities (such as vision, and language) We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z)
Tiny Neural Models for Seq2Seq [0.0]
We propose a projection based encoder-decoder model referred to as pQRNN-MAtt. The resulting quantized models are less than 3.5MB in size and are well suited for on-device latency critical applications. We show that on MTOP, a challenging multilingual semantic parsing dataset, the average model performance surpasses LSTM based seq2seq model that uses pre-trained embeddings despite being 85x smaller.
arXiv Detail & Related papers (2021-08-07T00:39:42Z)
When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model [87.25037167380522]
We propose a model that is accurate, robust, efficient, generalizable, and end-to-end trainable. In order to achieve a better accuracy, we propose two lightweight modules. DQInit dynamically initializes the queries of decoder from the inputs, enabling the model to achieve as good accuracy as the ones with multiple decoder layers. QAMem is designed to enhance the discriminative ability of queries on low-resolution feature maps by assigning separate memory values to each query rather than a shared one.
arXiv Detail & Related papers (2021-05-27T13:51:42Z)
Randomized Ensembled Double Q-Learning: Learning Fast Without a Model [8.04816643418952]
We introduce a simple model-free algorithm, Randomized Ensembled Double Q-Learning (REDQ) We show that REDQ's performance is just as good as, if not better than, a state-of-the-art model-based algorithm for the MuJoCo benchmark.
arXiv Detail & Related papers (2021-01-15T06:25:58Z)
Multiscale Deep Equilibrium Models [162.15362280927476]
We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ) An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously. We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset.
arXiv Detail & Related papers (2020-06-15T18:07:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.