Related papers: Optimizing Diversity and Quality through Base-Aligned Model Collaboration

URL: http://arxiv.org/abs/2511.05650v1
Date: Fri, 07 Nov 2025 19:00:01 GMT
Title: Optimizing Diversity and Quality through Base-Aligned Model Collaboration
Authors: Yichen Wang, Chenghao Yang, Tenghao Huang, Muhao Chen, Jonathan May, Mina Lee
Abstract summary: We propose Base-Aligned Model Collaboration (BACo) to optimize diversity and quality. BACo employs routing strategies that determine, at each token, from which model to decode. BACo achieves both high diversity and quality post hoc within a single pass, while offering strong controllability.
Score: 49.59542918674004
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Alignment has greatly improved large language models (LLMs)' output quality at the cost of diversity, yielding highly similar outputs across generations. We propose Base-Aligned Model Collaboration (BACo), an inference-time token-level model collaboration framework that dynamically combines a base LLM with its aligned counterpart to optimize diversity and quality. Inspired by prior work (Fei et al., 2025), BACo employs routing strategies that determine, at each token, from which model to decode based on next-token prediction uncertainty and predicted contents' semantic role. Prior diversity-promoting methods, such as retraining, prompt engineering, and multi-sampling methods, improve diversity but often degrade quality or require costly decoding or post-training. In contrast, BACo achieves both high diversity and quality post hoc within a single pass, while offering strong controllability. We explore a family of routing strategies. Across three open-ended generation tasks and 13 metrics covering diversity and quality, BACo consistently surpasses state-of-the-art inference-time baselines. With our best router, BACo achieves a 21.3% joint improvement in diversity and quality. Human evaluations also mirror these improvements. The results suggest that collaboration between base and aligned models can optimize and control diversity and quality.
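
The token-level routing idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the entropy criterion, the routing direction (switch to the base model when the aligned model is uncertain), the threshold value, and all function names and toy distributions below are assumptions chosen only for illustration. The abstract also mentions routing on the predicted content's semantic role, which this sketch does not cover.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution given as {token: prob}."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def route(aligned_probs, base_probs, threshold):
    """Choose which model's distribution to decode from for the current token.

    Assumption (not taken from the paper): when the aligned model's entropy
    exceeds the threshold, decode from the base model; otherwise decode from
    the aligned model.
    """
    if entropy(aligned_probs) > threshold:
        return "base", base_probs
    return "aligned", aligned_probs


def sample(probs, rng):
    """Draw one token from a {token: prob} distribution."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distributions standing in for real model outputs.
confident = {"the": 0.97, "a": 0.02, "an": 0.01}
uncertain = {"dragon": 0.4, "robot": 0.3, "sailor": 0.3}
base = {"dragon": 0.25, "robot": 0.25, "sailor": 0.25, "baker": 0.25}

rng = random.Random(0)
for aligned in (confident, uncertain):
    source, dist = route(aligned, base, threshold=0.5)
    print(source, sample(dist, rng))
```

In this sketch the threshold acts as the control knob: lowering it sends more tokens to the base model, raising it keeps more tokens with the aligned model.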

Related papers

DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing [67.77471070868852]
DeepGen 1.0 is a lightweight 5B unified model for image generation and editing. It is trained on only 50M samples, surpassing the 80B HunyuanImage by 28% on WISE and the 27B Qwen-Image-Edit by 37% on UniREditBench. By open-sourcing our training code, weights, and datasets, we provide an efficient, high-performance alternative to democratize unified multimodal research.
arXiv Detail & Related papers (2026-02-12T17:44:24Z)
Harnessing Consistency for Robust Test-Time LLM Ensemble [88.55393815158608]
CoRE is a plug-and-play technique that harnesses model consistency for robust LLM ensemble. Token-level consistency captures fine-grained disagreements by applying a low-pass filter to downweight uncertain tokens. Model-level consistency models global agreement by promoting model outputs with high self-confidence.
arXiv Detail & Related papers (2025-10-12T04:18:45Z)
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources [113.33902847941941]
Variance-Aware Sampling (VAS) is a data selection strategy guided by Variance Promotion Score (VPS). We release large-scale, carefully curated resources containing 1.6M long CoT cold-start data and 15k RL QA pairs. Experiments across mathematical reasoning benchmarks demonstrate the effectiveness of both the curated data and the proposed VAS.
arXiv Detail & Related papers (2025-09-25T14:58:29Z)
Jointly Reinforcing Diversity and Quality in Language Model Generations [64.72289248044514]
Post-training of Large Language Models (LMs) often prioritizes accuracy and helpfulness at the expense of diversity. We address this challenge with Diversity-Aware Reinforcement Learning (DARLING), a framework that jointly optimizes for response quality and semantic diversity.
arXiv Detail & Related papers (2025-09-02T17:38:47Z)
Reinforcement Learning for Multi-Objective Multi-Echelon Supply Chain Optimisation [3.1194372040101928]
The model is evaluated using a multi-objective reinforcement learning (RL) method, benchmarked against an originally single-objective RL algorithm modified with weighted sum. We conduct experiments on varying network complexities, mimicking typical real-world challenges using a customisable simulator. The model determines production and delivery quantities across supply chain routes to achieve near-optimal trade-offs between competing objectives.
arXiv Detail & Related papers (2025-07-26T04:30:11Z)
Latent Preference Coding: Aligning Large Language Models via Discrete Latent Codes [54.93980123979578]
We introduce Latent Preference Coding (LPC), a novel framework that models the implicit factors as well as their combinations behind holistic preferences. LPC seamlessly integrates with various offline alignment algorithms, automatically inferring the underlying factors and their importance from data.
arXiv Detail & Related papers (2025-05-08T06:59:06Z)
Phasic Diversity Optimization for Population-Based Reinforcement Learning [10.15130620537703]
Phasic Diversity Optimization (PDO) algorithm separates reward and diversity training into distinct phases. In the auxiliary phase, agents with poor performance diversified via determinants will not replace the better agents in the archive. We introduce two implementations of PDO archive and conduct tests in the newly proposed adversarial dogfight and MuJoCo simulations.
arXiv Detail & Related papers (2024-03-17T06:41:09Z)
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization [13.436983663467938]
This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions. Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery. In open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model.
arXiv Detail & Related papers (2023-10-18T16:46:16Z)
An Empirical Study of Multimodal Model Merging [148.48412442848795]
Model merging is a technique that fuses multiple models trained on different tasks to generate a multi-task solution. We conduct our study for a novel goal where we can merge vision, language, and cross-modal transformers of a modality-specific architecture. We propose two metrics that assess the distance between weights to be merged and can serve as an indicator of the merging outcomes.
arXiv Detail & Related papers (2023-04-28T15:43:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.