Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages
- URL: http://arxiv.org/abs/2001.11453v3
- Date: Sun, 22 Nov 2020 19:06:18 GMT
- Title: Parameter Space Factorization for Zero-Shot Learning across Tasks and Languages
- Authors: Edoardo M. Ponti, Ivan Vulić, Ryan Cotterell, Marinela Parovic, Roi Reichart and Anna Korhonen
- Abstract summary: We propose a Bayesian generative model for the space of neural parameters.
We infer the posteriors over such latent variables based on data from seen task-language combinations.
Our model yields comparable or better results than state-of-the-art, zero-shot cross-lingual transfer methods.
- Score: 112.65994041398481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most combinations of NLP tasks and language varieties lack in-domain examples
for supervised training because of the paucity of annotated data. How can
neural models make sample-efficient generalizations from task-language
combinations with available data to low-resource ones? In this work, we propose
a Bayesian generative model for the space of neural parameters. We assume that
this space can be factorized into latent variables for each language and each
task. We infer the posteriors over such latent variables based on data from
seen task-language combinations through variational inference. This enables
zero-shot classification on unseen combinations at prediction time. For
instance, given training data for named entity recognition (NER) in Vietnamese
and for part-of-speech (POS) tagging in Wolof, our model can perform accurate
predictions for NER in Wolof. In particular, we experiment with a typologically
diverse sample of 33 languages from 4 continents and 11 families, and show that
our model yields comparable or better results than state-of-the-art, zero-shot
cross-lingual transfer methods. Moreover, we demonstrate that approximate
Bayesian model averaging results in smoother predictive distributions, whose
entropy inversely correlates with accuracy. Hence, the proposed framework also
offers robust estimates of prediction uncertainty. Our code is available at
github.com/cambridgeltl/parameter-factorization.
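As a rough, non-authoritative illustration of the abstract's idea, the Python sketch below composes an unseen task-language pair from separately inferred latents and approximates Bayesian model averaging by Monte Carlo sampling. It assumes mean-field Gaussian posteriors over one task latent and one language latent, and a hypothetical linear `generator` that maps their concatenation to the weights of a softmax classifier. The names `GaussianPosterior`, `generate_weights`, and `bma_predict`, the toy sizes, the input features, and all posterior values are invented for illustration; this is not the authors' model or code (see the repository above for that).

```python
import math
import random
from dataclasses import dataclass


@dataclass
class GaussianPosterior:
    """Mean-field Gaussian q(z) = N(mean, diag(std**2)) over one latent vector."""
    mean: list[float]
    std: list[float]

    def sample(self, rng: random.Random) -> list[float]:
        # Reparameterized draw: z = mean + std * eps with eps ~ N(0, 1).
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]


def generate_weights(z_task, z_lang, generator, n_classes, n_features):
    """Hypothetical generator: a fixed linear map from the concatenation
    [z_task; z_lang] to an n_classes x n_features weight matrix."""
    z = z_task + z_lang  # list concatenation, length 2 * latent_dim
    flat = [sum(g * zi for g, zi in zip(row, z)) for row in generator]
    return [flat[c * n_features:(c + 1) * n_features] for c in range(n_classes)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy in nats; zero-probability classes contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bma_predict(x, q_task, q_lang, generator, n_classes, n_samples=100, seed=0):
    """Monte Carlo approximation of Bayesian model averaging: average the
    predictive distributions of classifiers whose weights are generated from
    posterior samples of the task and language latents."""
    rng = random.Random(seed)
    avg = [0.0] * n_classes
    for _ in range(n_samples):
        w = generate_weights(q_task.sample(rng), q_lang.sample(rng),
                             generator, n_classes, len(x))
        probs = softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in w])
        avg = [a + p / n_samples for a, p in zip(avg, probs)]
    return avg


# Toy, made-up sizes: 2-d latents, 3 input features, 2 output classes.
n_classes, n_features, latent_dim = 2, 3, 2
init_rng = random.Random(42)
generator = [[init_rng.gauss(0.0, 0.5) for _ in range(2 * latent_dim)]
             for _ in range(n_classes * n_features)]

# Invented posteriors standing in for ones inferred from seen pairs: the task
# latent from e.g. NER data, the language latent from e.g. Wolof POS data.
q_ner = GaussianPosterior(mean=[0.8, -0.3], std=[0.1, 0.1])
q_wolof = GaussianPosterior(mean=[-0.2, 0.5], std=[0.1, 0.1])

x = [1.0, 0.5, -1.0]  # stand-in for an encoder representation of one token
probs = bma_predict(x, q_ner, q_wolof, generator, n_classes)
print("predictive distribution:", probs)
print("entropy (uncertainty):", entropy(probs))
```

Because entropy is concave, the entropy of an averaged distribution is at least the average entropy of the individual distributions, so averaging over posterior samples tends to smooth predictions. This is consistent with the smoother predictive distributions the abstract reports for approximate Bayesian model averaging.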
Related papers
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
Date: 2023-01-22T18:22:55Z
- Neural Spline Search for Quantile Probabilistic Modeling [35.914279831992964]
We propose a non-parametric and data-driven approach, Neural Spline Search (NSS), to represent the observed data distribution without parametric assumptions.
We demonstrate that NSS outperforms previous methods on synthetic, real-world regression and time-series forecasting tasks.
Date: 2023-01-12T07:45:28Z
- Eeny, meeny, miny, moe. How to choose data for morphological inflection [8.914777617216862]
This paper explores four sampling strategies for the task of morphological inflection using a Transformer model.
We investigate the robustness of each strategy across 30 typologically diverse languages.
Our results show a clear benefit to selecting data based on model confidence and entropy.
Date: 2022-10-26T04:33:18Z
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks as a sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state of the art (based on BERT) in average performance by a large margin in both few-shot and full-shot settings.
Date: 2022-04-11T18:31:53Z
- Dependency-based Mixture Language Models [53.152011258252315]
We introduce Dependency-based Mixture Language Models.
Specifically, we first train neural language models with a novel dependency modeling objective.
We then formulate the next-token probability by mixing the previous dependency modeling probability distributions with self-attention.
Date: 2022-03-19T06:28:30Z
- Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning.
Date: 2022-03-14T20:13:21Z
- Efficient Nearest Neighbor Language Models [114.40866461741795]
Non-parametric neural language models (NLMs) learn predictive distributions of text by utilizing an external datastore.
We show how to achieve up to a 6x inference speed-up while retaining comparable performance.
Date: 2021-09-09T12:32:28Z
- Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity [16.893758238773263]
When primed with only a handful of training samples, very large pretrained language models such as GPT-3 have shown competitive results.
We demonstrate that the order in which the samples are provided can make the difference between near state-of-the-art and random-guess performance.
We use the generative nature of the language models to construct an artificial development set, and we identify performant prompts based on entropy statistics of the candidate permutations on this set.
Date: 2021-04-18T09:29:16Z
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
Date: 2020-10-20T17:31:07Z