Few-shot Mining of Naturally Occurring Inputs and Outputs
- URL: http://arxiv.org/abs/2205.04050v1
- Date: Mon, 9 May 2022 05:40:52 GMT
- Title: Few-shot Mining of Naturally Occurring Inputs and Outputs
- Authors: Mandar Joshi and Terra Blevins and Mike Lewis and Daniel S. Weld and
Luke Zettlemoyer
- Abstract summary: We mine input-output examples from large corpora using a supervised mining function trained on a small seed set of only 100 examples.
Unlike model-generated data augmentation, our method mines naturally occurring, high-quality input-output pairs that mimic the style of the seed set across multiple tasks.
On SQuAD-style reading comprehension, augmenting the seed set with the mined data yields an improvement of 13 F1 over a BART-large baseline fine-tuned only on the seed set.
- Score: 83.3871936721431
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating labeled natural language training data is expensive and requires
significant human effort. We mine input-output examples from large corpora
using a supervised mining function trained on a small seed set of only 100
examples. The mining consists of two stages: (1) a biencoder-based,
recall-oriented dense search that pairs inputs with potential outputs, and (2)
a crossencoder-based filter that re-ranks the output of the biencoder stage
for better precision. Unlike model-generated data augmentation, our method
mines naturally occurring, high-quality input-output pairs that mimic the style
of the seed set across multiple tasks. On SQuAD-style reading comprehension,
augmenting the seed set with the mined data yields an improvement of 13 F1
over a BART-large baseline fine-tuned only on the seed set. Likewise, we see
an improvement of 1.46 ROUGE-L on XSum abstractive summarization.
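As a rough illustration of the two-stage pipeline described in the abstract, the sketch below pairs inputs with candidate outputs via a bi-encoder dense search (recall stage) and then re-ranks the candidates with a cross-encoder (precision stage). This is a minimal sketch, not the authors' implementation: the sentence-transformers checkpoints, the example texts, and the score threshold are all assumptions standing in for models trained on the 100-example seed set.
```python
# Minimal sketch of a two-stage mine-and-filter pipeline (illustrative only).
from sentence_transformers import SentenceTransformer, CrossEncoder, util

# Stage 1: recall-oriented dense search with a bi-encoder (assumed checkpoint).
biencoder = SentenceTransformer("multi-qa-mpnet-base-dot-v1")
inputs = ["Which team won the 2018 World Cup?"]          # e.g. mined questions
candidate_outputs = [                                     # e.g. corpus passages
    "France won the 2018 FIFA World Cup, beating Croatia 4-2.",
    "The 2018 World Cup was hosted by Russia.",
    "Croatia reached its first World Cup final in 2018.",
]

input_emb = biencoder.encode(inputs, convert_to_tensor=True)
output_emb = biencoder.encode(candidate_outputs, convert_to_tensor=True)
hits = util.semantic_search(input_emb, output_emb, top_k=2)  # keep top-k for recall

# Stage 2: precision-oriented re-ranking and filtering with a cross-encoder
# (assumed checkpoint; scores are raw logits, so the threshold must be tuned).
crossencoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
THRESHOLD = 0.0  # assumed cutoff; tune on held-out seed examples
mined_pairs = []
for i, input_text in enumerate(inputs):
    pairs = [(input_text, candidate_outputs[h["corpus_id"]]) for h in hits[i]]
    scores = crossencoder.predict(pairs)
    for (inp, out), score in zip(pairs, scores):
        if score >= THRESHOLD:
            mined_pairs.append((inp, out))

print(mined_pairs)  # pairs that would augment the seed set for fine-tuning
```
In practice the mined pairs would simply be concatenated with the seed set before fine-tuning the downstream model (e.g. BART-large).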
Related papers
- On Training a Neural Network to Explain Binaries [43.27448128029069]
In this work, we investigate the possibility of training a deep neural network on the task of binary code understanding.
We build our own dataset derived from a capture of Stack Overflow containing 1.1M entries.
arXiv Detail & Related papers (2024-04-30T15:34:51Z)
- Gradient-based Wang-Landau Algorithm: A Novel Sampler for Output Distribution of Neural Networks over the Input Space [20.60516313062773]
In this paper, we propose a novel Gradient-based Wang-Landau (GWL) sampler.
We first draw the connection between the output distribution of a NN and the density of states (DOS) of a physical system.
Then, we renovate the classic sampler for the DOS problem, the Wang-Landau algorithm, by replacing its random proposals with gradient-based Monte Carlo proposals.
arXiv Detail & Related papers (2023-02-19T05:42:30Z)
- Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification [114.56752624945142]
We argue that the most popular random sampling method, the well-known PK sampler, is neither informative nor efficient for deep metric learning.
We propose an efficient mini-batch sampling method called Graph Sampling (GS) for large-scale metric learning.
arXiv Detail & Related papers (2021-04-04T06:44:15Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization [93.95299500688286]
We focus on prediction problems with structured outputs subject to output validity constraints.
We propose composed fine-tuning, which fine-tunes a predictor composed with the pre-trained denoiser; a minimal sketch of this composition appears after this list.
For two-layer ReLU networks, we prove that composed fine-tuning significantly reduces the complexity of the predictor.
arXiv Detail & Related papers (2020-06-29T17:14:35Z)
- Revisiting Regex Generation for Modeling Industrial Applications by Incorporating Byte Pair Encoder [14.42244606935982]
This work focuses on automatically generating regular expressions and proposes a novel genetic algorithm to deal with this problem.
We first utilize a byte pair encoder (BPE) to extract frequent items, which are then used to construct regular expressions.
With exponential decay, training is approximately 100 times faster than with methods that do not use it.
arXiv Detail & Related papers (2020-05-06T02:09:10Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take the Transformer as the testbed and introduce a layer of gates between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
- Imputer: Sequence Modelling via Imputation and Dynamic Programming [101.5705527605346]
Imputer is an iterative generative model, requiring only a constant number of generation steps independent of the number of input or output tokens.
We present a tractable dynamic programming training algorithm, which yields a lower bound on the log marginal likelihood.
arXiv Detail & Related papers (2020-02-20T18:21:30Z)
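As referenced in the Composed Fine-Tuning entry above, the sketch below illustrates the basic idea of that related paper: a trainable predictor whose output is passed through a frozen, pre-trained denoiser, so gradients flow only into the predictor. It is a minimal sketch under assumptions, not the paper's implementation; the module shapes, the PyTorch framing, and the toy loss are placeholders.
```python
# Minimal sketch of composed fine-tuning (illustrative only): the predictor is
# trained through a frozen pre-trained denoiser that nudges raw predictions
# toward valid structured outputs.
import torch
import torch.nn as nn

dim = 32  # assumed output dimensionality

predictor = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))

# Stand-in for a pre-trained denoising autoencoder; in practice it would be
# trained beforehand on (possibly perturbed) valid outputs.
denoiser = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
for p in denoiser.parameters():
    p.requires_grad_(False)  # freeze the denoiser: only the predictor is fine-tuned

optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x, y = torch.randn(8, dim), torch.randn(8, dim)  # toy batch of (input, valid output)

for _ in range(100):
    optimizer.zero_grad()
    y_hat = denoiser(predictor(x))  # composed prediction: denoiser applied after predictor
    loss = loss_fn(y_hat, y)
    loss.backward()                 # gradients reach the predictor through the frozen denoiser
    optimizer.step()
```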