Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for
Improved Generalization
- URL: http://arxiv.org/abs/2006.16205v4
- Date: Tue, 24 Oct 2023 23:44:37 GMT
- Title: Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for
Improved Generalization
- Authors: Sang Michael Xie, Tengyu Ma, Percy Liang
- Abstract summary: We focus on prediction problems with structured outputs subject to output validity constraints.
We propose composed fine-tuning, which fine-tunes a predictor composed with the pre-trained denoiser.
For two-layer ReLU networks, we prove that composed fine-tuning significantly reduces the complexity of the predictor.
- Score: 93.95299500688286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We focus on prediction problems with structured outputs that are subject to
output validity constraints, e.g. pseudocode-to-code translation where the code
must compile. While labeled input-output pairs are expensive to obtain,
"unlabeled" outputs, i.e. outputs without corresponding inputs, are freely
available (e.g. code on GitHub) and provide information about output validity.
We can capture the output structure by pre-training a denoiser to denoise
corrupted versions of unlabeled outputs. We first show that standard
fine-tuning after pre-training destroys some of this structure. We then propose
composed fine-tuning, which fine-tunes a predictor composed with the
pre-trained denoiser, which is frozen to preserve output structure. For
two-layer ReLU networks, we prove that composed fine-tuning significantly
reduces the complexity of the predictor, thus improving generalization.
Empirically, we show that composed fine-tuning improves over standard
fine-tuning on two pseudocode-to-code translation datasets (3% and 6%
relative). The improvement from composed fine-tuning is magnified on
out-of-distribution (OOD) examples (4% and 25% relative).
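The composition described in the abstract can be sketched in a few lines. Below is a minimal numpy illustration, not the authors' implementation: the weights, shapes, and function names are placeholders, and the "pre-trained" denoiser weights are random stand-ins. The point is the wiring: the final prediction is denoiser(predictor(x)), gradients flow through the denoiser, but only the base predictor's weights would be placed in the optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Frozen denoiser: stands in for a two-layer ReLU network pre-trained to
# denoise corrupted unlabeled outputs (weights here are random placeholders).
W1_d = rng.normal(size=(8, 16))
W2_d = rng.normal(size=(16, 8))

def denoiser(y):
    # Maps (possibly invalid) raw predictions toward the valid-output manifold.
    return relu(y @ W1_d) @ W2_d

# Trainable base predictor (also a two-layer ReLU network, matching the
# setting analyzed in the paper).
W1_p = 0.1 * rng.normal(size=(5, 16))
W2_p = 0.1 * rng.normal(size=(16, 8))

def predictor(x):
    return relu(x @ W1_p) @ W2_p

def composed_predict(x):
    # Composed fine-tuning trains W1_p and W2_p through the frozen denoiser;
    # W1_d and W2_d never receive updates.
    return denoiser(predictor(x))

x = rng.normal(size=(4, 5))
y_hat = composed_predict(x)
```

In a framework like PyTorch the same effect would come from freezing the denoiser's parameters (e.g. excluding them from the optimizer) while backpropagating through it.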
Related papers
- Bit-flipping Decoder Failure Rate Estimation for (v,w)-regular Codes [84.0257274213152]
We propose a new technique to provide accurate estimates of the DFR of a two-iteration (parallel) bit-flipping decoder.
We validate our results, providing comparisons of the modeled and simulated syndrome weight, the distribution of incorrectly guessed error bits at the end of the first iteration, and the two-iteration Decoding Failure Rate (DFR).
arXiv Detail & Related papers (2024-01-30T11:40:24Z)
- Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders [38.78712921188612]
We propose a unified system that jointly uses generative and predictive decoders across two levels.
Experiments conducted on the Voice-Bank dataset demonstrate that incorporating predictive information leads to faster decoding and higher PESQ scores.
arXiv Detail & Related papers (2023-05-18T06:10:49Z)
- Decoder Tuning: Efficient Language Understanding as Decoding [84.68266271483022]
We present Decoder Tuning (DecT), which instead optimizes task-specific decoder networks on the output side.
With gradient-based optimization, DecT can be trained within several seconds and requires only one PLM query per sample.
We conduct extensive natural language understanding experiments and show that DecT significantly outperforms state-of-the-art algorithms with a $200\times$ speed-up.
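The "one query per sample" property above can be pictured as caching a single frozen forward pass and training only a small head on the cache. This is a hypothetical sketch, not DecT's actual architecture; `frozen_plm` and the shapes are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_plm(x):
    # Stand-in for one frozen PLM forward pass per sample (assumption).
    return np.tanh(x)

# Query the frozen model once per sample and cache the representations.
features = frozen_plm(rng.normal(size=(32, 16)))

# Lightweight task-specific decoder head: the only trainable parameters.
W_dec = np.zeros((16, 3))

def decode(feats):
    # Training touches only W_dec, which is why tuning takes seconds.
    return feats @ W_dec

logits = decode(features)
```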
arXiv Detail & Related papers (2022-12-16T11:15:39Z)
- Few-shot Mining of Naturally Occurring Inputs and Outputs [83.3871936721431]
We mine input-output examples from large corpora using a supervised mining function trained on a small seed set of only 100 examples.
Unlike model-generated data augmentation, our method mines naturally occurring, high-quality input-output pairs that mimic the style of the seed set for multiple tasks.
On SQuAD-style reading comprehension, augmenting the seed set with the mined data results in an improvement of 13 F1 over a BART-large baseline fine-tuned only on the seed set.
arXiv Detail & Related papers (2022-05-09T05:40:52Z)
- Recursive Decoding: A Situated Cognition Approach to Compositional Generation in Grounded Language Understanding [0.0]
We present Recursive Decoding (RD), a novel procedure for training and using seq2seq models.
Rather than generating an entire output sequence in one pass, models are trained to predict one token at a time.
RD yields dramatic improvement on two previously neglected generalization tasks in gSCAN.
arXiv Detail & Related papers (2022-01-27T19:13:42Z)
- Sparse Coding with Multi-Layer Decoders using Variance Regularization [19.8572592390623]
We propose a novel sparse coding protocol which prevents a collapse in the codes without the need to regularize the decoder.
Our method regularizes the codes directly so that each latent code component has variance greater than a fixed threshold.
We show that sparse autoencoders with multi-layer decoders trained using our variance regularization method produce higher quality reconstructions with sparser representations.
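The variance constraint described above can be written as a simple hinge penalty. This is an illustrative sketch under assumed conventions (penalizing per-dimension standard deviation below a threshold), not the paper's exact regularizer.

```python
import numpy as np

def variance_regularizer(codes, threshold=1.0):
    # codes: (batch, latent_dim) array of latent codes.
    # Penalize each latent component whose spread across the batch falls
    # below the threshold; components above it incur zero penalty, which
    # discourages collapsed (constant) codes without touching the decoder.
    per_dim_std = codes.std(axis=0)
    return np.maximum(0.0, threshold - per_dim_std).sum()
```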
arXiv Detail & Related papers (2021-12-16T21:46:23Z)
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing [112.2208052057002]
We propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one.
With comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks.
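The compression step can be illustrated as pooling adjacent hidden states between blocks. A minimal sketch, assuming mean pooling with stride 2 (the function name and the divisibility assumption are illustrative, not the paper's exact scheme):

```python
import numpy as np

def pool_hidden_states(h, stride=2):
    # h: (seq_len, d_model); seq_len assumed divisible by stride.
    # Averages each window of `stride` adjacent states, shortening the
    # sequence (and hence attention cost) for subsequent blocks.
    seq_len, d_model = h.shape
    return h.reshape(seq_len // stride, stride, d_model).mean(axis=1)
```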
arXiv Detail & Related papers (2020-06-05T05:16:23Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take Transformer as the testbed and introduce a layer of gates in-between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
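The gating idea can be sketched as a per-position multiplicative gate between encoder and decoder. The paper computes the expected L0 penalty via a stochastic relaxation; the simplified version below uses plain sigmoid gates with an expected-openness penalty purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate_encoder_outputs(enc_states, gate_logits):
    # enc_states: (seq_len, d_model); gate_logits: (seq_len,) learnable.
    gates = sigmoid(gate_logits)        # per-position probability of staying open
    sparsity_penalty = gates.sum()      # expected number of open gates
    # Positions with near-zero gates are effectively dropped before decoding.
    return enc_states * gates[:, None], sparsity_penalty
```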
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
- Learning the Relation between Code Features and Code Transforms with Structured Prediction [13.62633524166298]
We present the first approach for structurally predicting code transforms at the level of AST nodes using conditional random fields (CRFs).
Our approach first learns offline a probabilistic model that captures how certain code transforms are applied to certain AST nodes, and then uses the learned model to predict transforms for arbitrary new, unseen code snippets.
arXiv Detail & Related papers (2019-07-22T12:42:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.