Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for
Improved Generalization
- URL: http://arxiv.org/abs/2006.16205v4
- Date: Tue, 24 Oct 2023 23:44:37 GMT
- Title: Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for
Improved Generalization
- Authors: Sang Michael Xie, Tengyu Ma, Percy Liang
- Abstract summary: We focus on prediction problems with structured outputs subject to output validity constraints.
We propose composed fine-tuning, which fine-tunes a predictor composed with the pre-trained denoiser.
For two-layer ReLU networks, we prove that composed fine-tuning significantly reduces the complexity of the predictor.
- Score: 93.95299500688286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We focus on prediction problems with structured outputs that are subject to
output validity constraints, e.g. pseudocode-to-code translation where the code
must compile. While labeled input-output pairs are expensive to obtain,
"unlabeled" outputs, i.e. outputs without corresponding inputs, are freely
available (e.g. code on GitHub) and provide information about output validity.
We can capture the output structure by pre-training a denoiser to denoise
corrupted versions of unlabeled outputs. We first show that standard
fine-tuning after pre-training destroys some of this structure. We then propose
composed fine-tuning, which fine-tunes a predictor composed with the
pre-trained denoiser, which is frozen to preserve output structure. For
two-layer ReLU networks, we prove that composed fine-tuning significantly
reduces the complexity of the predictor, thus improving generalization.
Empirically, we show that composed fine-tuning improves over standard
fine-tuning on two pseudocode-to-code translation datasets (3% and 6%
relative). The improvement from composed fine-tuning is magnified on
out-of-distribution (OOD) examples (4% and 25% relative).
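To make the recipe concrete, here is a minimal PyTorch sketch of composed fine-tuning as described above: the pre-trained denoiser is frozen and the base predictor is fine-tuned through the composition. The toy two-layer modules, dimensions, and regression loss are illustrative assumptions, not the paper's pseudocode-to-code models.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the paper's models; the real setting uses
# sequence models for pseudocode-to-code, which are not reproduced here.
predictor = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))
denoiser = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))

# Assume `denoiser` was pre-trained to map corrupted unlabeled outputs back to
# valid outputs. Freeze it so fine-tuning cannot destroy that output structure.
for p in denoiser.parameters():
    p.requires_grad = False
denoiser.eval()

def composed_model(x):
    # Composed prediction: the frozen denoiser post-processes the predictor.
    return denoiser(predictor(x))

# Composed fine-tuning updates only the predictor's parameters.
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def fine_tune_step(x, y):
    optimizer.zero_grad()
    loss = loss_fn(composed_model(x), y)
    loss.backward()  # gradients flow through the frozen denoiser into the predictor
    optimizer.step()
    return loss.item()

# Toy labeled pair, just to show the call signature.
x, y = torch.randn(8, 128), torch.randn(8, 128)
print(fine_tune_step(x, y))
```

Because the frozen denoiser maps approximately valid outputs onto valid ones, the predictor only needs to land near a valid output, which is the intuition behind the reduced predictor complexity shown for two-layer ReLU networks.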
Related papers
- Training Language Models on Synthetic Edit Sequences Improves Code Synthesis [33.13471417703669]
Large language models (LLMs) autoregressively synthesize programs in a single pass.
We develop a synthetic data generation algorithm called LintSeq to generate high-quality code edit data.
We show that models fine-tuned on edit sequences produce more diverse programs than baseline synthesis models.
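As a toy illustration of the edit-sequence representation (not LintSeq itself), the following Python sketch turns a program into a sequence of line-level diffs; naive line prefixes stand in for the linter-verified intermediate programs that LintSeq samples.

```python
import difflib

def to_edit_sequence(program: str):
    # Toy stand-in: use cumulative line prefixes as intermediate program states.
    # LintSeq instead samples intermediate states that pass a linter.
    lines = program.splitlines()
    states = ["\n".join(lines[:i]) for i in range(len(lines) + 1)]
    edits = []
    for prev, curr in zip(states, states[1:]):
        diff = difflib.unified_diff(prev.splitlines(), curr.splitlines(), lineterm="")
        edits.append("\n".join(diff))
    return edits

program = "def add(a, b):\n    return a + b"
for i, edit in enumerate(to_edit_sequence(program)):
    print(f"--- edit {i} ---\n{edit}\n")
```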
arXiv Detail & Related papers (2024-10-03T17:57:22Z) - $\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding [64.00025564372095]
Large language models (LLMs) have shown remarkable capabilities in code generation.
The effects of hallucinations (e.g., output noise) make it challenging for LLMs to generate high-quality code in one pass.
We propose a simple and effective uncertainty-aware selective contrastive decoding mechanism.
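For intuition, here is a generic numpy sketch of selective contrastive decoding rather than USCD's exact formulation: the entropy threshold, the weight alpha, and the source of the "noise" logits are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def selective_contrastive_step(main_logits, noise_logits,
                               alpha=0.5, entropy_threshold=2.0):
    # main_logits:  next-token logits from the model's normal prompt.
    # noise_logits: logits from a deliberately weak / noisy prompt.
    # Decode greedily when the model is confident; otherwise subtract the
    # noise logits to suppress tokens favored by the noise distribution.
    p = softmax(main_logits)
    entropy = -(p * np.log(p + 1e-12)).sum()
    if entropy < entropy_threshold:
        adjusted = main_logits
    else:
        adjusted = main_logits - alpha * noise_logits
    return int(np.argmax(adjusted))

rng = np.random.default_rng(0)
vocab_size = 50
print(selective_contrastive_step(rng.normal(size=vocab_size),
                                 rng.normal(size=vocab_size)))
```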
arXiv Detail & Related papers (2024-09-09T02:07:41Z) - Bit-flipping Decoder Failure Rate Estimation for (v,w)-regular Codes [84.0257274213152]
We propose a new technique to provide accurate estimates of the DFR of a two-iteration (parallel) bit-flipping decoder.
We validate our results, providing comparisons of the modeled and simulated weight of the syndrome, the distribution of incorrectly guessed error bits at the end of the first iteration, and the two-iteration Decoding Failure Rate (DFR).
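For context, here is a minimal numpy sketch of the kind of two-iteration parallel bit-flipping decoder whose failure rate the paper models; the flipping rule (flip the bits with the most unsatisfied checks) and the toy parity-check matrix are illustrative choices, not the paper's (v,w)-regular setup or its DFR estimator.

```python
import numpy as np

def parallel_bit_flip_decode(H, received, iterations=2):
    # H: binary parity-check matrix (m x n); received: hard-decision word (n,).
    # Each iteration flips, in parallel, all bits tied for the largest number
    # of unsatisfied parity checks. Stops early if the syndrome is zero.
    x = received.copy()
    for _ in range(iterations):
        syndrome = H @ x % 2
        if not syndrome.any():
            break
        upc = H.T @ syndrome                  # unsatisfied-check count per bit
        x = (x + (upc == upc.max())) % 2
    return x, not (H @ x % 2).any()           # (candidate, decoding success)

# Toy parity-check matrix and a single-bit error on the all-zero codeword.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 1, 0, 0, 1]])
received = np.zeros(6, dtype=int)
received[2] ^= 1
decoded, ok = parallel_bit_flip_decode(H, received)
print(decoded, ok)                            # expect the all-zero word, True
```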
arXiv Detail & Related papers (2024-01-30T11:40:24Z) - Diffusion-Based Speech Enhancement with Joint Generative and Predictive
Decoders [38.78712921188612]
We propose a unified system that jointly uses generative and predictive decoders across two levels.
Experiments conducted on the Voice-Bank dataset demonstrate that incorporating predictive information leads to faster decoding and higher PESQ scores.
arXiv Detail & Related papers (2023-05-18T06:10:49Z) - Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side.
By gradient-based optimization, DecT can be trained within several seconds and requires only one query to the pre-trained model per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
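A minimal PyTorch sketch of the output-side pattern behind DecT: query a frozen pre-trained model once per example, cache the outputs, and train only a small decoder on top. DecT's actual decoder (prompt scores with a prototypical network) is replaced here by a plain linear head, and all modules and sizes are toy assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a frozen pre-trained model; in practice this is the large,
# expensive model that is queried exactly once per sample.
frozen_ptm = nn.Linear(128, 64)
for p in frozen_ptm.parameters():
    p.requires_grad = False

inputs = torch.randn(200, 128)           # toy "samples"
labels = torch.randint(0, 4, (200,))     # toy 4-class task

with torch.no_grad():
    cached = frozen_ptm(inputs)          # one query per sample, cached once

# Only this small output-side decoder is trained; on a toy problem this takes
# well under a second on CPU.
decoder = nn.Linear(64, 4)
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):
    optimizer.zero_grad()
    loss_fn(decoder(cached), labels).backward()
    optimizer.step()

acc = (decoder(cached).argmax(dim=1) == labels).float().mean().item()
print(f"train accuracy on the toy task: {acc:.2f}")
```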
arXiv Detail & Related papers (2022-12-16T11:15:39Z) - Few-shot Mining of Naturally Occurring Inputs and Outputs [83.3871936721431]
We mine input-output examples from large corpora using a supervised mining function trained on a small seed set of only 100 examples.
Unlike model-generated data augmentation, our method mines naturally occurring high-quality input-output pairs to mimic the style of the seed set for multiple tasks.
On SQuAD-style reading comprehension, augmenting the seed set with the mined data results in an improvement of 13 F1 over a BART-large baseline fine-tuned only on the seed set.
arXiv Detail & Related papers (2022-05-09T05:40:52Z) - On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take Transformer as the testbed and introduce a layer of gates in-between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing $L_0$ penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
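As a sketch of how such gates can be built, below is a simplified PyTorch gate layer using the standard hard-concrete relaxation for the expected-L0 penalty; the paper computes its gates from the encoder states themselves, and the hyperparameters here are common defaults rather than the paper's values.

```python
import torch
import torch.nn as nn

class HardConcreteGate(nn.Module):
    # Stochastic gates with a differentiable expected-L0 penalty
    # (hard-concrete relaxation, Louizos et al., 2018). Simplified: one
    # learned gate per source position, independent of the encoder states.
    def __init__(self, num_positions, beta=2 / 3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(num_positions))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self, enc_out):               # enc_out: (batch, positions, dim)
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        z = (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)
        return enc_out * z.unsqueeze(-1)      # gated-off positions are zeroed

    def expected_l0(self):
        # Expected number of open gates; added to the loss as the penalty term.
        return torch.sigmoid(
            self.log_alpha
            - self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        ).sum()

gate = HardConcreteGate(num_positions=16)
enc_out = torch.randn(4, 16, 32)
sparsified = gate(enc_out)                    # what the decoder would attend to
# total_loss = task_loss + l0_weight * gate.expected_l0()
print(sparsified.shape, gate.expected_l0().item())
```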
arXiv Detail & Related papers (2020-04-24T16:57:52Z) - Learning the Relation between Code Features and Code Transforms with
Structured Prediction [13.62633524166298]
We present the first approach for structurally predicting code transforms at the level of AST nodes using conditional random fields (CRFs).
Our approach first learns offline a probabilistic model that captures how certain code transforms are applied to certain AST nodes, and then uses the learned model to predict transforms for arbitrary new, unseen code snippets.
arXiv Detail & Related papers (2019-07-22T12:42:32Z)