Few-shot Mining of Naturally Occurring Inputs and Outputs
- URL: http://arxiv.org/abs/2205.04050v1
- Date: Mon, 9 May 2022 05:40:52 GMT
- Title: Few-shot Mining of Naturally Occurring Inputs and Outputs
- Authors: Mandar Joshi and Terra Blevins and Mike Lewis and Daniel S. Weld and
Luke Zettlemoyer
- Abstract summary: We mine input-output examples from large corpora using a supervised mining function trained on a small seed set of only 100 examples.
Unlike model-generated data augmentation, our method mines naturally occurring, high-quality input-output pairs that mimic the style of the seed set across multiple tasks.
On SQuAD-style reading comprehension, augmenting the seed set with the mined data yields an improvement of 13 F1 over a BART-large baseline fine-tuned only on the seed set.
- Score: 83.3871936721431
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating labeled natural language training data is expensive and requires
significant human effort. We mine input-output examples from large corpora
using a supervised mining function trained on a small seed set of only 100
examples. The mining consists of two stages: (1) a biencoder-based,
recall-oriented dense search that pairs inputs with potential outputs, and (2)
a crossencoder-based filter that re-ranks the output of the biencoder stage
for better precision. Unlike model-generated data augmentation, our method
mines naturally occurring, high-quality input-output pairs that mimic the style
of the seed set across multiple tasks. On SQuAD-style reading comprehension,
augmenting the seed set with the mined data yields an improvement of 13 F1
over a BART-large baseline fine-tuned only on the seed set. Likewise, we see
an improvement of 1.46 ROUGE-L on XSum abstractive summarization.
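As a rough illustration of the two-stage pipeline described in the abstract, the sketch below pairs inputs with candidate outputs via a bi-encoder dense search (recall stage) and then re-ranks the candidates with a cross-encoder (precision stage). This is a minimal sketch, not the authors' implementation: the sentence-transformers checkpoints, the example texts, and the score threshold are all assumptions standing in for models trained on the 100-example seed set.
```python
# Minimal sketch of a two-stage mine-and-filter pipeline (illustrative only).
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Stage 1: recall-oriented dense search with a bi-encoder (assumed checkpoint).
biencoder = SentenceTransformer("multi-qa-mpnet-base-dot-v1")
inputs = ["Which team won the 2018 World Cup?"]          # e.g. mined questions
candidate_outputs = [                                     # e.g. corpus passages
    "France won the 2018 FIFA World Cup, beating Croatia 4-2.",
    "The 2018 World Cup was hosted by Russia.",
    "Croatia reached its first World Cup final in 2018.",
]

input_emb = biencoder.encode(inputs, convert_to_tensor=True)
output_emb = biencoder.encode(candidate_outputs, convert_to_tensor=True)
hits = util.semantic_search(input_emb, output_emb, top_k=2)  # keep top-k for recall

# Stage 2: precision-oriented re-ranking and filtering with a cross-encoder
# (assumed checkpoint; scores are raw logits, so the threshold must be tuned).
crossencoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
THRESHOLD = 0.0  # assumed cutoff; tune on held-out seed examples
mined_pairs = []
for i, input_text in enumerate(inputs):
    pairs = [(input_text, candidate_outputs[h["corpus_id"]]) for h in hits[i]]
    scores = crossencoder.predict(pairs)
    for (inp, out), score in zip(pairs, scores):
        if score >= THRESHOLD:
            mined_pairs.append((inp, out))

print(mined_pairs)  # pairs that would augment the seed set for fine-tuning
```
In practice the mined pairs would simply be concatenated with the seed set before fine-tuning the downstream model (e.g. BART-large).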
Related papers
- On Training a Neural Network to Explain Binaries [43.27448128029069]
In this work, we investigate the possibility of training a deep neural network on the task of binary code understanding.
We build our own dataset derived from a capture of Stack Overflow containing 1.1M entries.
arXiv Detail & Related papers (2024-04-30T15:34:51Z)
- Gradient-based Wang-Landau Algorithm: A Novel Sampler for Output Distribution of Neural Networks over the Input Space [20.60516313062773]
In this paper, we propose a novel Gradient-based Wang-Landau (GWL) sampler.
We first draw the connection between the output distribution of a NN and the density of states (DOS) of a physical system.
Then, we renovate the classic sampler for the DOS problem, the Wang-Landau algorithm, by replacing its random proposals with gradient-based Monte Carlo proposals.
arXiv Detail & Related papers (2023-02-19T05:42:30Z)
- Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification [114.56752624945142]
We argue that the most popular random sampling method, the well-known PK sampler, is neither informative nor efficient for deep metric learning.
We propose an efficient mini-batch sampling method called Graph Sampling (GS) for large-scale metric learning.
arXiv Detail & Related papers (2021-04-04T06:44:15Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization [93.95299500688286]
We focus on prediction problems with structured outputs subject to output validity constraints.
We propose composed fine-tuning, which fine-tunes a predictor composed with the pre-trained denoiser; a minimal sketch of this composition appears after this list.
For two-layer ReLU networks, we prove that composed fine-tuning significantly reduces the complexity of the predictor.
arXiv Detail & Related papers (2020-06-29T17:14:35Z)
- Revisiting Regex Generation for Modeling Industrial Applications by Incorporating Byte Pair Encoder [14.42244606935982]
This work focuses on automatically generating regular expressions and proposes a novel genetic algorithm to deal with this problem.
We first utilize a byte pair encoder (BPE) to extract frequent items, which are then used to construct regular expressions.
With exponential decay, training is approximately 100 times faster than with methods that do not use it.
arXiv Detail & Related papers (2020-05-06T02:09:10Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take the Transformer as the testbed and introduce a layer of gates between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
- Imputer: Sequence Modelling via Imputation and Dynamic Programming [101.5705527605346]
Imputer is an iterative generative model, requiring only a constant number of generation steps independent of the number of input or output tokens.
We present a tractable dynamic programming training algorithm, which yields a lower bound on the log marginal likelihood.
arXiv Detail & Related papers (2020-02-20T18:21:30Z)
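As referenced in the Composed Fine-Tuning entry above, the sketch below illustrates the basic idea of that related paper: a trainable predictor whose output is passed through a frozen, pre-trained denoiser, so gradients flow only into the predictor. It is a minimal sketch under assumptions, not the paper's implementation; the module shapes, the PyTorch framing, and the toy loss are placeholders.
```python
# Minimal sketch of composed fine-tuning (illustrative only): the predictor is
# trained through a frozen pre-trained denoiser that nudges raw predictions
# toward valid structured outputs.
import torch
import torch.nn as nn

dim = 32  # assumed output dimensionality

predictor = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))

# Stand-in for a pre-trained denoising autoencoder; in practice it would be
# trained beforehand on (possibly perturbed) valid outputs.
denoiser = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
for p in denoiser.parameters():
    p.requires_grad_(False)  # freeze the denoiser: only the predictor is fine-tuned

optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x, y = torch.randn(8, dim), torch.randn(8, dim)  # toy batch of (input, valid output)

for _ in range(100):
    optimizer.zero_grad()
    y_hat = denoiser(predictor(x))  # composed prediction: denoiser applied after predictor
    loss = loss_fn(y_hat, y)
    loss.backward()                 # gradients reach the predictor through the frozen denoiser
    optimizer.step()
```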