Symbol tuning improves in-context learning in language models
- URL: http://arxiv.org/abs/2305.08298v2
- Date: Sat, 30 Dec 2023 21:23:17 GMT
- Title: Symbol tuning improves in-context learning in language models
- Authors: Jerry Wei and Le Hou and Andrew Lampinen and Xiangning Chen and Da
Huang and Yi Tay and Xinyun Chen and Yifeng Lu and Denny Zhou and Tengyu Ma
and Quoc V. Le
- Abstract summary: We present symbol tuning - finetuning language models on in-context input-label pairs.
Symbol tuning leverages the intuition that when a model cannot use instructions or natural language labels to figure out a task, it must instead do so by learning the input-label mappings.
We show that symbol tuning boosts performance on unseen in-context learning tasks and is much more robust to underspecified prompts.
- Score: 144.58397538701803
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present symbol tuning - finetuning language models on in-context
input-label pairs where natural language labels (e.g., "positive/negative
sentiment") are replaced with arbitrary symbols (e.g., "foo/bar"). Symbol
tuning leverages the intuition that when a model cannot use instructions or
natural language labels to figure out a task, it must instead do so by learning
the input-label mappings.
We experiment with symbol tuning across Flan-PaLM models up to 540B
parameters and observe benefits across various settings. First, symbol tuning
boosts performance on unseen in-context learning tasks and is much more robust
to underspecified prompts, such as those without instructions or without
natural language labels. Second, symbol-tuned models are much stronger at
algorithmic reasoning tasks, with up to 18.2% better performance on the List
Functions benchmark and up to 15.3% better performance on the Simple Turing
Concepts benchmark. Finally, symbol-tuned models show large improvements in
following flipped labels presented in-context, meaning that they are more
capable of using in-context information to override prior semantic knowledge.
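To make the setup concrete, here is a minimal, hypothetical sketch of how in-context examples could be rewritten for symbol tuning: natural language labels are mapped to arbitrary symbols and the instruction is dropped, so the task can only be solved from the input-label mappings. The toy dataset, symbol pool, and formatting are illustrative assumptions, not the authors' exact pipeline.

```python
import random

# Toy labeled examples; in practice these would come from many NLP datasets.
EXAMPLES = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regret buying this product.", "negative"),
    ("An instant classic.", "positive"),
    ("Terrible pacing and a predictable plot.", "negative"),
]

# Arbitrary symbols that carry no semantic hint about the task (assumed pool).
SYMBOL_POOL = ["foo", "bar", "zap", "qux"]

def make_symbol_tuning_prompt(examples, query, flip=False, seed=0):
    """Format few-shot examples with natural language labels replaced by symbols.

    With flip=True the symbol assignment is reversed at evaluation time, so a model
    must follow the in-context mapping rather than prior semantic knowledge.
    """
    rng = random.Random(seed)
    labels = sorted({label for _, label in examples})
    symbols = rng.sample(SYMBOL_POOL, k=len(labels))
    if flip:
        symbols = symbols[::-1]
    label_to_symbol = dict(zip(labels, symbols))

    lines = []
    for text, label in examples:
        lines.append(f"Input: {text}\nLabel: {label_to_symbol[label]}")
    lines.append(f"Input: {query}\nLabel:")  # no instruction, no natural language labels
    return "\n\n".join(lines), label_to_symbol

prompt, mapping = make_symbol_tuning_prompt(EXAMPLES, "A waste of two hours.")
print(mapping)
print(prompt)
```

Calling the same helper with flip=True mirrors the flipped-label evaluation described above: the correct completion for the query becomes the symbol assigned to the opposite label, so only in-context information can give the right answer.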
Related papers
- Semantic Graph Representation Learning for Handwritten Mathematical Expression Recognition [57.60390958736775]
We propose a simple but efficient method to enhance semantic interaction learning (SIL).
We first construct a semantic graph based on statistical symbol co-occurrence probabilities.
Then we design a semantic-aware module (SAM), which projects the visual and classification features into a semantic space.
Our method achieves better recognition performance than prior art on both the CROHME and HME100K datasets.
arXiv Detail & Related papers (2023-08-21T06:23:41Z)
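The entry above builds its semantic graph from how often symbols appear together in labeled expressions. Below is a minimal, assumption-laden sketch of such a construction, with toy symbol sequences and row-normalized co-occurrence counts as edge weights; the SAM module and visual features are the paper's own and are not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Toy ground-truth symbol sequences (stand-ins for a real training corpus).
SEQUENCES = [
    ["x", "^", "2", "+", "y", "^", "2"],
    ["\\frac", "x", "y"],
    ["\\sum", "x", "^", "2"],
]

def cooccurrence_graph(sequences):
    """Row-normalized symbol co-occurrence probabilities as a weighted adjacency dict."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in combinations(set(seq), 2):  # symbols co-occurring in one expression
            counts[a][b] += 1
            counts[b][a] += 1
    graph = {}
    for a, neighbors in counts.items():
        total = sum(neighbors.values())
        graph[a] = {b: c / total for b, c in neighbors.items()}
    return graph

graph = cooccurrence_graph(SEQUENCES)
print(graph["x"])  # probabilities of symbols seen alongside "x"
```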
- Larger language models do in-context learning differently [93.90674531127559]
In-context learning (ICL) in language models is affected by semantic priors versus input-label mappings.
We investigate two setups: ICL with flipped labels and ICL with semantically unrelated labels.
arXiv Detail & Related papers (2023-03-07T12:24:17Z)
- Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers [93.9369467909176]
We explain language models as meta-optimizers and understand in-context learning as implicit finetuning.
We show that in-context learning behaves similarly to explicit finetuning from multiple perspectives.
The improved performance of a momentum-based attention variant over vanilla attention further supports this understanding from another perspective.
arXiv Detail & Related papers (2022-12-20T18:58:48Z)
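The dual form that this line of work relies on can be stated compactly for simplified (linear, unnormalized) attention over demonstration tokens. The schematic below uses illustrative notation rather than the paper's exact derivation.

```latex
% Simplified linear attention over demonstration keys/values (k_i, v_i), query q:
\[
\mathrm{LinAttn}(q) \;=\; \sum_i v_i \bigl(k_i^{\top} q\bigr)
\;=\; \Bigl(\sum_i v_i\, k_i^{\top}\Bigr) q \;=\; \Delta W_{\mathrm{ICL}}\, q .
\]
% One gradient step on a linear layer W_0, with per-example error signals e_i and inputs x_i:
\[
F(q) \;=\; (W_0 + \Delta W)\, q, \qquad \Delta W \;=\; \sum_i e_i\, x_i^{\top} .
\]
% Both contributions are sums of outer products applied to the query, which is the
% sense in which in-context learning can be read as implicit finetuning.
```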
- Bidirectional Representations for Low Resource Spoken Language Understanding [39.208462511430554]
We propose a representation model to encode speech in bidirectional rich encodings.
The approach uses a masked language modelling objective to learn the representations.
We show that the performance of the resulting encodings is better than that of comparable models on multiple datasets.
arXiv Detail & Related papers (2022-11-24T17:05:16Z)
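The entry above learns bidirectional speech encodings with a masked language modelling objective. The sketch below shows the generic masked-prediction recipe on a sequence of acoustic feature frames, using a small Transformer encoder in PyTorch; it illustrates the objective only and is not the authors' model.

```python
import torch
import torch.nn as nn

class MaskedFrameModel(nn.Module):
    """Bidirectional encoder trained to reconstruct masked feature frames."""
    def __init__(self, feat_dim=80, d_model=256, nhead=4, nlayers=4):
        super().__init__()
        self.proj_in = nn.Linear(feat_dim, d_model)
        self.mask_emb = nn.Parameter(torch.zeros(d_model))  # learned [MASK] vector
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.proj_out = nn.Linear(d_model, feat_dim)

    def forward(self, feats, mask_prob=0.15):
        # feats: (batch, time, feat_dim), e.g. log-mel filterbank frames
        x = self.proj_in(feats)
        mask = torch.rand(x.shape[:2], device=x.device) < mask_prob
        x = torch.where(mask.unsqueeze(-1), self.mask_emb.expand_as(x), x)
        hidden = self.encoder(x)          # bidirectional: full self-attention
        recon = self.proj_out(hidden)
        # Loss is computed only on masked positions, as in masked language modelling.
        loss = ((recon - feats) ** 2)[mask].mean()
        return loss, hidden

model = MaskedFrameModel()
loss, encodings = model(torch.randn(2, 100, 80))
loss.backward()
```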
- Improving Model Training via Self-learned Label Representations [5.969349640156469]
We show that more sophisticated label representations are better for classification than the usual one-hot encoding.
We propose the Learning with Adaptive Labels (LwAL) algorithm, which simultaneously learns the label representation while training for the classification task.
Our algorithm introduces negligible additional parameters and has a minimal computational overhead.
arXiv Detail & Related papers (2022-09-09T21:10:43Z)
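For a concrete picture of label representations that are learned rather than fixed one-hot targets, here is a minimal sketch in which class embeddings are trainable parameters and classification is done by distance to those embeddings. It follows the spirit of the summary above, not the exact LwAL algorithm.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveLabelClassifier(nn.Module):
    """Classifier whose label representations are learned jointly with the encoder."""
    def __init__(self, in_dim, num_classes, label_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                     nn.Linear(64, label_dim))
        # Trainable label embeddings replace fixed one-hot targets; the extra
        # parameter count is only num_classes * label_dim.
        self.label_emb = nn.Parameter(torch.randn(num_classes, label_dim))

    def forward(self, x):
        z = self.encoder(x)                            # (batch, label_dim)
        # Negative squared distance to each label embedding acts as a logit.
        logits = -torch.cdist(z, self.label_emb) ** 2  # (batch, num_classes)
        return logits

model = AdaptiveLabelClassifier(in_dim=20, num_classes=5)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)    # updates encoder AND labels
x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
loss = F.cross_entropy(model(x), y)
loss.backward()
opt.step()
```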
- Prefix-Tuning: Optimizing Continuous Prompts for Generation [85.6357778621526]
Fine-tuning is the de facto way to leverage large pretrained language models to perform downstream tasks.
We propose prefix-tuning, a lightweight alternative to fine-tuning for natural language generation tasks.
We find that by learning only 0.1% of the parameters, prefix-tuning obtains comparable performance in the full data setting.
arXiv Detail & Related papers (2021-01-01T08:00:36Z)
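To illustrate the "train only a tiny fraction of the parameters" idea, the sketch below freezes a small Transformer encoder and trains only a block of continuous prefix embeddings prepended to the input. It is a simplified, hypothetical stand-in for prefix-tuning rather than the paper's implementation, which injects prefix activations into every layer's keys and values.

```python
import torch
import torch.nn as nn

class PrefixTunedEncoder(nn.Module):
    """Frozen encoder; only the continuous prefix embeddings are trained."""
    def __init__(self, vocab_size=1000, d_model=128, prefix_len=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Freeze all pretrained weights; in practice this would be a large pretrained LM.
        for p in self.parameters():
            p.requires_grad = False
        # The only trainable parameters: a short sequence of "virtual token" embeddings.
        self.prefix = nn.Parameter(torch.randn(prefix_len, d_model) * 0.02)

    def forward(self, token_ids):
        x = self.embed(token_ids)                                # (batch, seq, d_model)
        prefix = self.prefix.unsqueeze(0).expand(x.size(0), -1, -1)
        return self.encoder(torch.cat([prefix, x], dim=1))       # prefix conditions the task

model = PrefixTunedEncoder()
trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable parameters (the prefix only)")
out = model(torch.randint(0, 1000, (2, 12)))
```

Only the prefix tensor receives gradients, which is the mechanism behind tuning on the order of 0.1% of the model's parameters.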
- Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models.
We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)
- Learning Soft Labels via Meta Learning [3.4852307714135375]
One-hot labels do not represent soft decision boundaries among concepts, and hence, models trained on them are prone to overfitting.
We propose a framework where we treat the labels as learnable parameters and optimize them along with the model parameters.
We show that the learned labels capture semantic relationships between classes and thereby improve teacher models for the downstream task of distillation.
arXiv Detail & Related papers (2020-09-20T18:42:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.