Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for
Improved Generalization
- URL: http://arxiv.org/abs/2006.16205v4
- Date: Tue, 24 Oct 2023 23:44:37 GMT
- Title: Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for
Improved Generalization
- Authors: Sang Michael Xie, Tengyu Ma, Percy Liang
- Abstract summary: We focus on prediction problems with structured outputs subject to output validity constraints.
We propose composed fine-tuning, which fine-tunes a predictor composed with the pre-trained denoiser.
For two-layer ReLU networks, we prove that composed fine-tuning significantly reduces the complexity of the predictor.
- Score: 93.95299500688286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We focus on prediction problems with structured outputs that are subject to
output validity constraints, e.g. pseudocode-to-code translation where the code
must compile. While labeled input-output pairs are expensive to obtain,
"unlabeled" outputs, i.e. outputs without corresponding inputs, are freely
available (e.g. code on GitHub) and provide information about output validity.
We can capture the output structure by pre-training a denoiser to denoise
corrupted versions of unlabeled outputs. We first show that standard
fine-tuning after pre-training destroys some of this structure. We then propose
composed fine-tuning, which fine-tunes a predictor composed with the
pre-trained denoiser, which is frozen to preserve output structure. For
two-layer ReLU networks, we prove that composed fine-tuning significantly
reduces the complexity of the predictor, thus improving generalization.
Empirically, we show that composed fine-tuning improves over standard
fine-tuning on two pseudocode-to-code translation datasets (3% and 6%
relative). The improvement from composed fine-tuning is magnified on
out-of-distribution (OOD) examples (4% and 25% relative).
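As a concrete illustration of the two stages described in the abstract, here is a minimal PyTorch sketch assuming continuous vector outputs and small MLPs as stand-ins for the paper's models (the pseudocode-to-code experiments use seq2seq models); all data below is random placeholder data.

```python
# Minimal sketch of composed fine-tuning: pre-train a denoiser on corrupted
# unlabeled outputs, freeze it, then fine-tune a predictor composed with it.
import torch
import torch.nn as nn

d_in, d_out = 32, 16

# Stage 1: pre-train a denoiser on corrupted versions of unlabeled outputs.
denoiser = nn.Sequential(nn.Linear(d_out, 64), nn.ReLU(), nn.Linear(64, d_out))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)
for _ in range(100):
    y = torch.randn(128, d_out)                 # stand-in for unlabeled valid outputs
    y_corrupt = y + 0.1 * torch.randn_like(y)   # random corruption
    loss = nn.functional.mse_loss(denoiser(y_corrupt), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: composed fine-tuning on labeled (x, y) pairs.
# The denoiser is frozen; gradients flow through it into the predictor only.
for p in denoiser.parameters():
    p.requires_grad_(False)

predictor = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_out))
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
for _ in range(100):
    x = torch.randn(128, d_in)
    y = torch.randn(128, d_out)                 # stand-in for labeled targets
    y_hat = denoiser(predictor(x))              # composed model: denoiser o predictor
    loss = nn.functional.mse_loss(y_hat, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the denoiser's parameters stay frozen, the gradient updates only reshape the base predictor, which is the mechanism behind the reduced predictor complexity claimed above.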
Related papers
- CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction [47.17755403213469]
We propose CodeI/O, a novel approach that condenses diverse reasoning patterns embedded in contextually-grounded codes.
By training models to predict inputs/outputs given code and test cases entirely in natural language, we expose them to universal reasoning primitives.
Experimental results demonstrate CodeI/O leads to consistent improvements across symbolic, scientific, logic, math & numerical, and commonsense reasoning tasks.
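As a rough illustration of the input/output-prediction format described above, the sketch below assembles one natural-language prediction prompt from a code snippet and a test input; the prompt wording and the example function are assumptions, not CodeI/O's exact pipeline.

```python
# Illustrative sketch: turn a code snippet and a test input into a prompt that
# asks the model to predict the output, with the answer in natural language.
code = """
def median(xs):
    xs = sorted(xs)
    n = len(xs)
    return xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2
""".strip()

test_input = [7, 1, 5, 3]

prompt = (
    "Given the following function and an input, reason step by step and "
    "predict the output in natural language.\n\n"
    f"Function:\n{code}\n\nInput: median({test_input!r})\n"
)
# The reference answer would pair a natural-language rationale with the result,
# e.g. "Sorting gives [1, 3, 5, 7]; with an even length the median is (3+5)/2 = 4.0."
print(prompt)
```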
arXiv Detail & Related papers (2025-02-11T07:26:50Z)
- Enhanced Min-Sum Decoding of Quantum Codes Using Previous Iteration Dynamics [3.6048794343841766]
We propose a novel message-passing decoding approach that leverages the degeneracy of quantum low-density parity-check codes.
Our focus is on two-block Calderbank-Shor-Steane (CSS) codes, which are composed of symmetric stabilizers.
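For context, the sketch below implements only the standard normalized min-sum message-passing loop over a binary parity-check matrix; the paper's enhancement (reusing previous-iteration dynamics and exploiting the degeneracy of CSS codes) is not reproduced, and the toy Hamming code is just a placeholder.

```python
# Generic normalized min-sum decoder over a binary parity-check matrix H.
import numpy as np

def min_sum_decode(H, llr, iters=20, alpha=0.75):
    m, n = H.shape
    checks = [np.flatnonzero(H[c]) for c in range(m)]
    msg_cv = np.zeros((m, n))                   # check-to-variable messages
    for _ in range(iters):
        total = llr + msg_cv.sum(axis=0)
        for c in range(m):
            vs = checks[c]
            m_vc = total[vs] - msg_cv[c, vs]    # variable-to-check messages
            sgn = np.sign(m_vc)
            sgn[sgn == 0] = 1.0
            mag = np.abs(m_vc)
            for i, v in enumerate(vs):
                others = np.delete(np.arange(len(vs)), i)
                msg_cv[c, v] = alpha * np.prod(sgn[others]) * np.min(mag[others])
        hard = ((llr + msg_cv.sum(axis=0)) < 0).astype(int)
        if not np.any((H @ hard) % 2):          # all parity checks satisfied
            return hard, True
    return hard, False

# Toy example: [7,4] Hamming code, one flipped bit in the all-zero codeword.
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])
received = np.zeros(7, dtype=int)
received[2] ^= 1
llr = np.where(received == 0, 2.0, -2.0)        # channel LLRs (positive favours 0)
print(min_sum_decode(H, llr))
```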
arXiv Detail & Related papers (2025-01-09T07:28:26Z)
- $\mathbb{USCD}$: Improving Code Generation of LLMs by Uncertainty-Aware Selective Contrastive Decoding [64.00025564372095]
Large language models (LLMs) have shown remarkable capabilities in code generation.
The effects of hallucinations (e.g., output noise) make it challenging for LLMs to generate high-quality code in one pass.
We propose a simple and effective uncertainty-aware selective contrastive decoding method.
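The sketch below shows a generic form of selective contrastive decoding on next-token logits, gated by predictive entropy; the gating rule, hyperparameters, and toy logits are illustrative assumptions rather than the exact USCD formulation.

```python
# Hedged sketch of selective contrastive decoding on next-token logits.
# `expert_logits` come from the full prompt and `noisy_logits` from a degraded
# prompt; the entropy gate and thresholds below are illustrative toy choices.
import torch

def selective_contrastive_step(expert_logits, noisy_logits,
                               alpha=0.5, entropy_threshold=1.2):
    probs = torch.softmax(expert_logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)
    if entropy < entropy_threshold:
        adjusted = expert_logits                    # confident: keep standard decoding
    else:
        adjusted = expert_logits - alpha * noisy_logits  # subtract noise-prone logits
    return torch.argmax(adjusted, dim=-1)

# Example with a toy vocabulary of 5 tokens.
expert = torch.tensor([2.0, 1.9, 1.8, 0.1, 0.0])    # high-entropy step
noisy = torch.tensor([2.5, 0.2, 0.1, 0.0, 0.0])     # noise concentrates on token 0
print(selective_contrastive_step(expert, noisy))    # token 0 is down-weighted; token 1 wins
```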
arXiv Detail & Related papers (2024-09-09T02:07:41Z)
- Estimating the Decoding Failure Rate of Binary Regular Codes Using Iterative Decoding [84.0257274213152]
We propose a new technique to provide accurate estimates of the decoding failure rate (DFR) of a two-iteration (parallel) bit-flipping decoder.
We validate our results by comparing the modeled and simulated syndrome weight, the distribution of incorrectly guessed error bits at the end of the first iteration, and the two-iteration DFR.
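For comparison, a plain Monte Carlo estimate of the DFR of a two-iteration parallel bit-flipping decoder can be computed as in the sketch below; the parity-check matrix, flipping threshold, and error model are toy stand-ins, and the paper's closed-form estimates are not reproduced.

```python
# Monte Carlo baseline for the decoding failure rate (DFR) of a two-iteration
# parallel bit-flipping decoder over a toy random sparse parity-check matrix.
import numpy as np

rng = np.random.default_rng(0)

def bit_flip_decode(H, error, iters=2):
    e_hat = np.zeros(H.shape[1], dtype=int)
    for _ in range(iters):
        syndrome = (H @ ((error + e_hat) % 2)) % 2
        upc = H.T @ syndrome                    # unsatisfied parity checks per bit
        threshold = max(upc.max(), 1)           # flip bits hitting the max count
        e_hat ^= (upc >= threshold).astype(int)
    return np.array_equal((error + e_hat) % 2, np.zeros(H.shape[1], dtype=int))

def estimate_dfr(H, error_weight, trials=2000):
    n = H.shape[1]
    failures = 0
    for _ in range(trials):
        error = np.zeros(n, dtype=int)
        error[rng.choice(n, size=error_weight, replace=False)] = 1
        failures += not bit_flip_decode(H, error)
    return failures / trials

H = (rng.random((60, 120)) < 0.05).astype(int)  # placeholder sparse code
print("estimated DFR:", estimate_dfr(H, error_weight=3))
```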
arXiv Detail & Related papers (2024-01-30T11:40:24Z)
- Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders [38.78712921188612]
We propose a unified system that uses generative and predictive decoders jointly across two levels.
Experiments conducted on the Voice-Bank dataset demonstrate that incorporating predictive information leads to faster decoding and higher PESQ scores.
arXiv Detail & Related papers (2023-05-18T06:10:49Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which in contrast optimizes task-specific decoder networks on the output side.
With gradient-based optimization, DecT can be trained within several seconds and requires only one query to the pre-trained model per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
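A minimal sketch of this output-side setup, assuming a stand-in frozen model and a linear decoder head (not DecT's exact decoder architecture): the pre-trained model is queried once per sample, the representations are cached, and only the small decoder is optimized.

```python
# Hedged sketch of output-side decoder tuning: query the frozen pre-trained
# model once per sample, cache the representations, train only a small decoder.
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes, hidden = 4, 64

ptm = nn.Sequential(nn.Linear(128, hidden), nn.Tanh())   # stand-in frozen model
for p in ptm.parameters():
    p.requires_grad_(False)

inputs = torch.randn(256, 128)                            # placeholder samples
labels = torch.randint(0, num_classes, (256,))
with torch.no_grad():
    features = ptm(inputs)                                # one query per sample, cached

decoder = nn.Linear(hidden, num_classes)                  # only this part is trained
opt = torch.optim.Adam(decoder.parameters(), lr=1e-2)
for _ in range(200):
    loss = nn.functional.cross_entropy(decoder(features), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```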
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- Few-shot Mining of Naturally Occurring Inputs and Outputs [83.3871936721431]
We mine input-output examples from large corpora using a supervised mining function trained on a small seed set of only 100 examples.
Unlike model-generated data augmentation, our method mines naturally occurring, high-quality input-output pairs to mimic the style of the seed set for multiple tasks.
On SQuAD-style reading comprehension, augmenting the seed set with the mined data results in an improvement of 13 F1 over a BART-large baseline fine-tuned only on the seed set.
arXiv Detail & Related papers (2022-05-09T05:40:52Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take the Transformer as the testbed and introduce a layer of gates between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing $L_0$ penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
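A minimal sketch of such a gate layer, using the hard-concrete relaxation (Louizos et al., 2018) to obtain a differentiable expected $L_0$ penalty; the per-position gate parameterization, shapes, and penalty weight are illustrative assumptions rather than the paper's exact setup.

```python
# Hedged sketch: per-position gates on encoder outputs with an expected-L0 penalty.
import math
import torch
import torch.nn as nn

class L0Gate(nn.Module):
    def __init__(self, dim, beta=2 / 3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)            # per-position gate logits
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self, h):                          # h: (batch, length, dim)
        log_alpha = self.scorer(h).squeeze(-1)
        if self.training:
            u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + log_alpha) / self.beta)
        else:
            s = torch.sigmoid(log_alpha)
        z = (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)
        # Expected L0: probability that each position's gate is open.
        l0 = torch.sigmoid(
            log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()
        return h * z.unsqueeze(-1), l0

# Usage: gate encoder outputs before the decoder and add the penalty to the loss.
encoder_out = torch.randn(8, 50, 512)              # (batch, source length, model dim)
gate = L0Gate(512).train()
gated, expected_l0 = gate(encoder_out)
task_loss = gated.pow(2).mean()                    # stand-in for the real task loss
loss = task_loss + 1e-3 * expected_l0              # L0 weight is illustrative
loss.backward()
```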
arXiv Detail & Related papers (2020-04-24T16:57:52Z)