Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation
- URL: http://arxiv.org/abs/2502.12304v1
- Date: Mon, 17 Feb 2025 20:23:42 GMT
- Title: Warmup Generations: A Task-Agnostic Approach for Guiding Sequence-to-Sequence Learning with Unsupervised Initial State Generation
- Authors: Senyu Li, Zipeng Sun, Jiayi Wang, Xue Liu, Pontus Stenetorp, Siva Reddy, David Ifeoluwa Adelani,
- Abstract summary: Traditional supervised fine-tuning (SFT) strategies for sequence-to-sequence tasks often train models to directly generate the target output.
We introduce a task-agnostic framework that enables models to generate intermediate "upwarm" sequences.
We show that our approach outperforms traditional SFT methods, and offers a scalable and flexible solution for sequence-to-sequence tasks.
- Score: 34.55224347308013
- License:
- Abstract: Traditional supervised fine-tuning (SFT) strategies for sequence-to-sequence tasks often train models to directly generate the target output. Recent work has shown that guiding models with intermediate steps, such as keywords, outlines, or reasoning chains, can significantly improve performance, coherence, and interpretability. However, these methods often depend on predefined intermediate formats and annotated data, limiting their scalability and generalizability. In this work, we introduce a task-agnostic framework that enables models to generate intermediate "warmup" sequences. These warmup sequences, serving as an initial state for subsequent generation, are optimized to enhance the probability of generating the target sequence without relying on external supervision or human-designed structures. Drawing inspiration from reinforcement learning principles, our method iteratively refines these intermediate steps to maximize their contribution to the final output, similar to reward-driven optimization in reinforcement learning with human feedback. Experimental results across tasks such as translation, summarization, and multi-choice question answering for logical reasoning show that our approach outperforms traditional SFT methods, and offers a scalable and flexible solution for sequence-to-sequence tasks.
Related papers
- PGSO: Prompt-based Generative Sequence Optimization Network for Aspect-based Sentiment Analysis [9.617652261815671]
We introduce two sequence optimization strategies: the rule-based static optimization and the score-based dynamic optimization.
Based on the dynamic optimization structure, we propose a unified Prompt-based Generative Sequence Optimization network (named PGSO)
Experiments conducted on four ABSA tasks across multiple benchmarks indicate that PGSO outperforms state-of-the-art methods, with an average improvement of 3.52% in F1 score.
arXiv Detail & Related papers (2024-12-01T10:49:55Z) - Causality-Enhanced Behavior Sequence Modeling in LLMs for Personalized Recommendation [47.29682938439268]
We propose a novel Counterfactual Fine-Tuning (CFT) method to improve user preference modeling.
We employ counterfactual reasoning to identify the causal effects of behavior sequences on model output.
Experiments on real-world datasets demonstrate that CFT effectively improves behavior sequence modeling.
arXiv Detail & Related papers (2024-10-30T08:41:13Z) - Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation [34.042565099565934]
We propose a plan-based training and reasoning method that guides models to generate arranging steps through abstract plans.
Results show that compared to fine-tuning directly with CoT data, our approach achieves a better performance on alleviating arranging bottleneck.
arXiv Detail & Related papers (2024-10-22T08:38:50Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - Finding the DeepDream for Time Series: Activation Maximization for Univariate Time Series [10.388704631887496]
We introduce Sequence Dreaming, a technique that adapts Maxim Activationization to analyze sequential information.
We visualize the temporal dynamics and patterns most influential in model decision-making processes.
arXiv Detail & Related papers (2024-08-20T08:09:44Z) - An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing [8.892147201091726]
We propose an end-to-end reinforcement learning based order-dispatching approach in Didi.
We employ a two-layer Decision Process framework to model this problem, and present underlineDeep underlineDouble underlineScalable underlineNetwork (DSN2), an encoder-decoder structure network to generate order assignments.
By leveraging contextual dynamics, our approach can adapt to the behavioral patterns for better performance.
arXiv Detail & Related papers (2024-08-20T01:30:53Z) - Skeleton2vec: A Self-supervised Learning Framework with Contextualized
Target Representations for Skeleton Sequence [56.092059713922744]
We show that using high-level contextualized features as prediction targets can achieve superior performance.
Specifically, we propose Skeleton2vec, a simple and efficient self-supervised 3D action representation learning framework.
Our proposed Skeleton2vec outperforms previous methods and achieves state-of-the-art results.
arXiv Detail & Related papers (2024-01-01T12:08:35Z) - Instruction Position Matters in Sequence Generation with Large Language
Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv Detail & Related papers (2023-08-23T12:36:57Z) - Enhancing Few-shot NER with Prompt Ordering based Data Augmentation [59.69108119752584]
We propose a Prompt Ordering based Data Augmentation (PODA) method to improve the training of unified autoregressive generation frameworks.
Experimental results on three public NER datasets and further analyses demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-05-19T16:25:43Z) - Diffusion Action Segmentation [63.061058214427085]
We propose a novel framework via denoising diffusion models, which shares the same inherent spirit of such iterative refinement.
In this framework, action predictions are iteratively generated from random noise with input video features as conditions.
arXiv Detail & Related papers (2023-03-31T10:53:24Z) - CoopInit: Initializing Generative Adversarial Networks via Cooperative
Learning [50.90384817689249]
CoopInit is a cooperative learning-based strategy that can quickly learn a good starting point for GANs.
We demonstrate the effectiveness of the proposed approach on image generation and one-sided unpaired image-to-image translation tasks.
arXiv Detail & Related papers (2023-03-21T07:49:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.