Related papers: MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training

MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training

URL: http://arxiv.org/abs/2502.11541v2
Date: Sun, 23 Feb 2025 05:56:44 GMT
Title: MuSC: Improving Complex Instruction Following with Multi-granularity Self-Contrastive Training
Authors: Hui Huang, Jiaheng Liu, Yancheng He, Shilong Li, Bing Xu, Conghui Zhu, Muyun Yang, Tiejun Zhao,
Abstract summary: We propose a Multi-granularity Self-Contrastive Training (MuSC) framework to improve the complex instruction alignment without relying on a stronger model.<n>Our method is evaluated on open-sourced models, and experiment results show our method achieves significant improvement on both complex and general instruction-following benchmarks.
Score: 36.483136685734735
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Complex instruction-following with elaborate constraints is imperative for Large Language Models (LLMs). While existing methods have constructed data for complex instruction alignment, they all rely on a more advanced model, especially GPT-4, limiting their application. In this paper, we propose a Multi-granularity Self-Contrastive Training (MuSC) framework, to improve the complex instruction alignment without relying on a stronger model. Our method is conducted on both coarse and fine granularity. On coarse-granularity, we construct constraint-aware preference data based on instruction decomposition and recombination. On fine-granularity, we perform token-aware preference optimization with dynamic token-level supervision. Our method is evaluated on open-sourced models, and experiment results show our method achieves significant improvement on both complex and general instruction-following benchmarks, surpassing previous self-alignment methods.

Related papers

SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling [58.05959902776133]
We introduce Single-Pass.<n>with Reference-Guided Evaluation (SPARE), a novel structured framework that enables efficient per-step annotation.<n>We demonstrate SPARE's effectiveness across four diverse datasets spanning mathematical reasoning (GSM8K, MATH), multi-hop question answering (MuSiQue-Ans), and spatial reasoning (SpaRP)<n>On ProcessBench, SPARE demonstrates data-efficient out-of-distribution generalization, using only $sim$16% of training samples compared to human-labeled and other synthetically trained baselines.
arXiv Detail & Related papers (2025-06-18T14:37:59Z)
Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks [81.44256822500257]
RLHF has emerged as a predominant approach for aligning artificial intelligence systems with human preferences.<n> RLHF exhibits insufficient compliance capabilities when confronted with complex multi-instruction tasks.<n>We propose a novel Multi-level Aware Preference Learning (MAPL) framework, capable of enhancing multi-instruction capabilities.
arXiv Detail & Related papers (2025-05-19T08:33:11Z)
AIR: Complex Instruction Generation via Automatic Iterative Refinement [29.639832268719363]
Current approaches to generating complex instructions are often irrelevant to the current instruction requirements. We propose a novel automatic iterative refinement framework to generate complex instructions with constraints. We construct the AIR-10K dataset with 10K complex instructions and demonstrate that instructions generated with our approach significantly improve the model's ability to follow complex instructions.
arXiv Detail & Related papers (2025-02-25T02:39:57Z)
A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 instruction unique-following prompts. With our synthetic prompts, we use two preference dataset curation methods - rejection sampling (RS) and Monte Carlo Tree Search (MCTS) Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z)
Constraint Back-translation Improves Complex Instruction Following of Large Language Models [55.60192044049083]
Large language models (LLMs) struggle to follow instructions with complex constraints in format, length, etc. Previous works conduct post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs. We propose a novel data generation technique, constraint back-translation.
arXiv Detail & Related papers (2024-10-31T17:42:26Z)
Align$^2$LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation [56.75665429851673]
This paper introduces a novel instruction curation algorithm, derived from two unique perspectives, human and LLM preference alignment.<n>Experiments demonstrate that we can maintain or even improve model performance by compressing synthetic multimodal instructions by up to 90%.
arXiv Detail & Related papers (2024-09-27T08:20:59Z)
Benchmarking Complex Instruction-Following with Multiple Constraints Composition [72.82640456309821]
How to evaluate the ability of complex instruction-following of large language models (LLMs) has become a critical research problem. Existing benchmarks mainly focus on modeling different types of constraints in human instructions while neglecting the composition of different constraints. We propose ComplexBench, a benchmark for comprehensively evaluating the ability of LLMs to follow complex instructions composed of multiple constraints.
arXiv Detail & Related papers (2024-07-04T14:50:45Z)
Enhancing and Assessing Instruction-Following with Fine-Grained Instruction Variants [28.691691883519542]
We introduce a technique that decomposes complex instructions into simpler sub-components, modifies these, and reconstructs them into new variants. Based on DeMoRecon, we developed the FGIV dataset which contains fine-grained instruction variants of 1,773 seed instructions. Our findings show that LLMs fine-tuned with FGIV will gain significant performance boost on both ours and commonly used instructions-following benchmarks.
arXiv Detail & Related papers (2024-06-17T08:08:11Z)
From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models [43.869374263102934]
We study what training data is effective in enhancing complex constraints following abilities. We find that training LLMs with instructions containing multiple constraints enhances their understanding of complex instructions. Our methods improve models' ability to follow instructions generally and generalize effectively across out-of-domain, in-domain, and adversarial settings.
arXiv Detail & Related papers (2024-04-24T12:51:14Z)
Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models [23.17547206140014]
We introduce Conifer, an instruction tuning dataset for large language models. We train models with Conifer to follow instructions with complex constraints. On several instruction-following benchmarks, our 7B model outperforms the state-of-the-art open-source 7B models.
arXiv Detail & Related papers (2024-04-03T15:55:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.