Tag-Evol: Achieving Efficient Instruction Evolving via Tag Injection
- URL: http://arxiv.org/abs/2505.24165v1
- Date: Fri, 30 May 2025 03:14:17 GMT
- Title: Tag-Evol: Achieving Efficient Instruction Evolving via Tag Injection
- Authors: Yixuan Wang, Shiqi Zhou, Chuanzhe Guo, Qingfu Zhu
- Abstract summary: We propose the Tag-Evol framework, a more diverse and efficient instruction evolving method. Specifically, Tag-Evol uses diverse and specific knowledge tags as strategies to achieve controlled evolution. Experiments with multiple backbones in diverse domain benchmarks show that the proposed method generates significantly better evolved data than other methods.
- Score: 10.121053770426757
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Evol-Instruct has made significant improvements as a data synthesis method in several areas. Existing methods typically rely on a fixed set of evolving strategies, which require manual design and are monolithic in form. In addition, iterative evolution makes the acquisition of hard samples expensive. In view of this, we propose the Tag-Evol framework, a more diverse and efficient instruction evolving method. Specifically, Tag-Evol uses diverse and specific knowledge tags as strategies to achieve controlled evolution by injecting different combinations of tags into the original instructions. Experiments with multiple backbones in diverse domain benchmarks show that the proposed method generates significantly better evolved data than other methods. Furthermore, we conduct a thorough analysis of the evolved data, demonstrating that Tag-Evol is not only efficient but also generates more diverse and challenging data.
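The core idea described in the abstract, injecting combinations of knowledge tags into an instruction to drive its evolution, can be sketched as follows. This is a minimal illustration based only on the abstract: the tag pool, prompt wording, and function names are assumptions, not the authors' actual implementation.

```python
import random

# Hypothetical pool of knowledge tags; the paper's tags are more
# diverse and domain-specific than this illustrative list.
TAG_POOL = [
    "multi-step reasoning",
    "edge-case handling",
    "numerical constraints",
    "domain terminology",
    "counterfactual condition",
]

def build_evolve_prompt(instruction: str, num_tags: int = 2, seed: int = 0) -> str:
    """Sample a combination of knowledge tags and inject them into a
    rewriting prompt, so the evolved instruction must incorporate them."""
    rng = random.Random(seed)
    tags = rng.sample(TAG_POOL, num_tags)
    return (
        f"Rewrite the instruction below so that it additionally requires: "
        f"{'; '.join(tags)}.\n"
        f"Keep the original intent.\n\n"
        f"Instruction: {instruction}"
    )

# Different seeds (or larger num_tags) yield different tag combinations,
# which is what allows controlled, non-iterative difficulty scaling.
print(build_evolve_prompt("Write a function that reverses a string."))
```

Because difficulty is controlled by the number and choice of injected tags rather than by repeated evolution rounds, hard samples can in principle be generated in a single pass.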
Related papers
- HiDe-LLaVA: Hierarchical Decoupling for Continual Instruction Tuning of Multimodal Large Language Model [37.85614317331844]
In this paper, we propose a task-specific expansion and task-general fusion framework. We analyze the information leakage present in the existing benchmark and propose a new and more challenging benchmark to rationally evaluate the performance of different methods.
arXiv Detail & Related papers (2025-03-17T08:56:03Z)
- Automatic Instruction Evolving for Large Language Models [93.52437926313621]
Auto Evol-Instruct is an end-to-end framework that evolves instruction datasets using large language models without any human effort.
Our experiments demonstrate that the best method optimized by Auto Evol-Instruct outperforms human-designed methods on various benchmarks.
arXiv Detail & Related papers (2024-06-02T15:09:00Z)
- Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning [47.02160072880698]
We introduce a self-evolving mechanism that allows the model itself to actively sample subsets that are equally or even more effective.
The key to our data sampling technique lies in the enhancement of diversity in the chosen subsets.
Extensive experiments across three datasets and benchmarks demonstrate the effectiveness of DiverseEvol.
arXiv Detail & Related papers (2023-11-14T14:10:40Z)
- Distributionally Robust Cross Subject EEG Decoding [15.211091130230589]
We propose a principled approach to perform dynamic evolution on the data for improvement of decoding robustness.
We derive a general data evolution framework based on Wasserstein gradient flow (WGF) and provide two different forms of evolution within the framework.
The proposed approach can be readily integrated with other data augmentation approaches for further improvements.
arXiv Detail & Related papers (2023-08-19T11:31:33Z)
- Few-Shot Data-to-Text Generation via Unified Representation and Multi-Source Learning [114.54944761345594]
We present a novel approach for structured data-to-text generation that addresses the limitations of existing methods.
Our proposed method aims to improve performance in multi-task training, zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2023-08-10T03:09:12Z)
- Domain Generalization for Mammographic Image Analysis with Contrastive Learning [62.25104935889111]
The training of an efficacious deep learning model requires large data with diverse styles and qualities.
A novel contrastive learning is developed to equip the deep learning models with better style generalization capability.
The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets.
arXiv Detail & Related papers (2023-04-20T11:40:21Z)
- SDA: Improving Text Generation with Self Data Augmentation [88.24594090105899]
We propose to improve the standard maximum likelihood estimation (MLE) paradigm by incorporating a self-imitation-learning phase for automatic data augmentation.
Unlike most existing sentence-level augmentation strategies, our method is more general and could be easily adapted to any MLE-based training procedure.
arXiv Detail & Related papers (2021-01-02T01:15:57Z)
- Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.