Multi-Metric Preference Alignment for Generative Speech Restoration
- URL: http://arxiv.org/abs/2508.17229v1
- Date: Sun, 24 Aug 2025 07:05:10 GMT
- Title: Multi-Metric Preference Alignment for Generative Speech Restoration
- Authors: Junan Zhang, Xueyao Zhang, Jing Yang, Yuancheng Wang, Fan Fan, Zhizheng Wu
- Abstract summary: We propose a multi-metric preference alignment strategy for generative models. We observe consistent and significant performance gains across three diverse generative paradigms. Our aligned models can serve as powerful "data annotators", generating high-quality pseudo-labels.
- Score: 15.696247605348383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent generative models have significantly advanced speech restoration tasks, yet their training objectives often misalign with human perceptual preferences, resulting in suboptimal quality. While post-training alignment has proven effective in other generative domains like text and image generation, its application to generative speech restoration remains largely under-explored. This work investigates the challenges of applying preference-based post-training to this task, focusing on how to define a robust preference signal and curate high-quality data to avoid reward hacking. To address these challenges, we propose a multi-metric preference alignment strategy. We construct a new dataset, GenSR-Pref, comprising 80K preference pairs, where each chosen sample is unanimously favored by a complementary suite of metrics covering perceptual quality, signal fidelity, content consistency, and timbre preservation. This principled approach ensures a holistic preference signal. Applying Direct Preference Optimization (DPO) with our dataset, we observe consistent and significant performance gains across three diverse generative paradigms: autoregressive models (AR), masked generative models (MGM), and flow-matching models (FM) on various restoration benchmarks, in both objective and subjective evaluations. Ablation studies confirm the superiority of our multi-metric strategy over single-metric approaches in mitigating reward hacking. Furthermore, we demonstrate that our aligned models can serve as powerful "data annotators", generating high-quality pseudo-labels to serve as a supervision signal for traditional discriminative models in data-scarce scenarios like singing voice restoration. Demo Page: https://gensr-pref.github.io
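The abstract's two core ingredients, the unanimous multi-metric selection rule and the DPO objective applied to the resulting pairs, lend themselves to a compact illustration. Below is a minimal Python sketch, not the authors' released code: the metric callables are hypothetical stand-ins for the paper's suite (perceptual quality, signal fidelity, content consistency, timbre preservation), while `dpo_loss` is the standard DPO formulation the paper builds on.

```python
# Illustrative sketch only (not the authors' code). Metric callables are
# hypothetical stand-ins for the paper's complementary metric suite.
from typing import Callable, Optional, Sequence, Tuple

import torch
import torch.nn.functional as F

Metric = Callable[[object], float]  # higher score assumed better

def pick_preference_pair(
    candidates: Sequence[object], metrics: Sequence[Metric]
) -> Optional[Tuple[object, object]]:
    """Return (chosen, rejected) only if one candidate beats another on
    EVERY metric; unanimity is the paper's guard against reward hacking."""
    scores = [[m(c) for m in metrics] for c in candidates]
    for i, s_i in enumerate(scores):
        for j, s_j in enumerate(scores):
            if i != j and all(a > b for a, b in zip(s_i, s_j)):
                return candidates[i], candidates[j]
    return None  # no unanimous winner: drop it, keep the signal clean

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective (Rafailov et al., 2023) on sequence log-probs."""
    chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen - rejected).mean()
```

Requiring a strict win on every metric means a sample that games one metric while degrading another can never be labeled "chosen", which is the mechanism the ablations credit for mitigating reward hacking.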
Related papers
- SoliReward: Mitigating Susceptibility to Reward Hacking and Annotation Noise in Video Generation Reward Models [53.19726629537694]
Post-training alignment of video generation models with human preferences is a critical goal. Current data collection paradigms, reliant on in-prompt pairwise annotations, suffer from labeling noise. We propose SoliReward, a systematic framework for video reward model (RM) training.
arXiv Detail & Related papers (2025-12-17T14:28:23Z)
- Robust Preference Alignment via Directional Neighborhood Consensus [13.313830197011983]
We introduce Robust Preference Selection (RPS), a post-hoc, training-free method that leverages directional neighborhood consensus. RPS samples multiple responses from a local neighborhood of related preferences to create a superior candidate pool. Our work presents a practical, theoretically grounded solution for enhancing the reliability of preference-aligned models.
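The summary describes RPS operationally, so a hedged sketch may help. Everything below is an assumption about the mechanics: `generate` and `score_under` are hypothetical helpers, and averaging scores across the neighborhood is one plausible consensus rule.

```python
# Hedged sketch of neighborhood-consensus selection. Assumed helpers:
# generate(prompt, pref) samples a response under a preference direction;
# score_under(resp, pref) rates a response under a direction.
import numpy as np

def rps_select(prompt, preference, neighbor_prefs, generate, score_under):
    """Pool responses from related preferences; return the consensus pick."""
    directions = [preference] + list(neighbor_prefs)   # local neighborhood
    pool = [generate(prompt, d) for d in directions]   # candidate pool
    # Consensus: average each candidate's score across the neighborhood,
    # so a response that only one direction favors cannot win.
    consensus = [np.mean([score_under(r, d) for d in directions])
                 for r in pool]
    return pool[int(np.argmax(consensus))]
```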
arXiv Detail & Related papers (2025-10-23T12:39:20Z)
- Temporal Self-Rewarding Language Models: Decoupling Chosen-Rejected via Past-Future [38.1810626252963]
Self-Rewarding Language Models propose an architecture in which a Large Language Model (LLM) both generates responses and evaluates its own outputs via LLM-as-a-Judge prompting. We propose Temporal Self-Rewarding Language Models, which strategically coordinate past, present, and future model generations to sustain learning signals.
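A minimal sketch of the loop implied by the title and summary; all callables are hypothetical, and taking the rejected side from a past checkpoint is an assumption read off the "decoupling chosen-rejected via past-future" phrasing.

```python
# Hedged sketch: the model family generates and judges its own outputs
# (LLM-as-a-Judge), while chosen/rejected are decoupled across time by
# anchoring rejections to a past checkpoint. All helpers are hypothetical.
def temporal_self_reward_pairs(prompts, current_model, past_model, judge, k=4):
    pairs = []
    for p in prompts:
        candidates = [current_model(p) for _ in range(k)]
        chosen = max(candidates, key=lambda r: judge(p, r))  # judge score
        rejected = past_model(p)  # rejected side comes from the past model
        pairs.append({"prompt": p, "chosen": chosen, "rejected": rejected})
    return pairs
```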
arXiv Detail & Related papers (2025-08-08T05:25:54Z)
- Fake it till You Make it: Reward Modeling as Discriminative Prediction [49.31309674007382]
GAN-RM is an efficient reward modeling framework that eliminates manual preference annotation and explicit quality dimension engineering. Our method trains the reward model through discrimination between model outputs and a small set of representative, unpaired target samples. Experiments demonstrate GAN-RM's effectiveness across multiple key applications.
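The mechanism, as summarized, is a discriminator repurposed as a reward model. The sketch below assumes a precomputed feature representation and a simple MLP head, neither of which is specified by the paper.

```python
# Hedged sketch of reward-modeling-as-discrimination: a classifier separates
# model outputs from a small unpaired target set, and its logit is the
# reward. Features and architecture here are illustrative assumptions.
import torch
import torch.nn as nn

class DiscriminativeRewardModel(nn.Module):
    def __init__(self, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Logit > 0 reads as "looks like a target sample"; used as reward.
        return self.net(feats).squeeze(-1)

def discriminator_step(rm, opt, target_feats, generated_feats):
    """One BCE step: target samples are positives, model outputs negatives."""
    logits = torch.cat([rm(target_feats), rm(generated_feats)])
    labels = torch.cat([torch.ones(len(target_feats)),
                        torch.zeros(len(generated_feats))])
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```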
arXiv Detail & Related papers (2025-06-16T17:59:40Z)
- Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models [7.61977883644433]
We propose four dimensions to evaluate data quality: professionalism, readability, reasoning, and cleanliness. We introduce Meta-rater, a multi-dimensional data selection method that integrates these dimensions with existing quality metrics through learned optimal weightings. Experiments demonstrate that Meta-rater doubles convergence speed for 1.3B parameter models and improves downstream task performance by 3.23, with advantages that scale to models as large as 7.2B parameters.
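The core operation, blending per-dimension ratings through learned weights, can be sketched in a few lines. The linear blend and top-k selection below are assumptions; the fitting step that learns the weightings against downstream quality is omitted.

```python
# Illustrative only: a linear blend of quality dimensions with learned
# weights, followed by top-k document selection.
import numpy as np

def meta_rate(doc_scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """doc_scores: (n_docs, n_dims) ratings, e.g. professionalism,
    readability, reasoning, cleanliness plus existing quality metrics.
    Returns one blended selection score per document."""
    return doc_scores @ weights

def select_topk(doc_scores: np.ndarray, weights: np.ndarray, k: int):
    """Indices of the k documents with the highest blended score."""
    return np.argsort(meta_rate(doc_scores, weights))[::-1][:k]
```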
arXiv Detail & Related papers (2025-04-19T06:12:33Z)
- Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications. Ensuring their alignment with the diverse preferences of individual users has become a critical challenge. We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z)
- Acoustic Model Optimization over Multiple Data Sources: Merging and Valuation [13.009945735929445]
We propose a novel paradigm to solve salient problems plaguing the Automatic Speech Recognition field.
In the first stage, multiple acoustic models are trained based upon different subsets of the complete speech data.
In the second stage, two novel algorithms are utilized to generate a high-quality acoustic model.
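As a concrete (if simplified) instance of the second stage, the sketch below merges per-subset models by weighted parameter averaging. The summary does not detail the paper's two merging/valuation algorithms, so this stands in for them.

```python
# Hedged sketch of stage two: merge acoustic models trained on different
# data subsets by (optionally data-size-weighted) parameter averaging.
import torch

def merge_models(state_dicts, weights=None):
    """Weighted average of per-subset model parameters."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float()
                          for w, sd in zip(weights, state_dicts))
    return merged
```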
arXiv Detail & Related papers (2024-10-21T03:48:23Z)
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained masked discrete diffusion models (MDMs) as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
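For orientation, steering problems of this kind typically target a reward-tilted version of the pre-trained distribution; the display below is that generic target, presented as background rather than DDPP's exact formulation.

```latex
% Generic reward-tilted steering target (background, not DDPP-specific):
\[
  p^{*}(x) \;\propto\; p_{\text{pre}}(x)\,\exp\!\bigl(r(x)/\beta\bigr),
\]
% where $p_{\text{pre}}$ is the pre-trained MDM, $r$ is a reward, and
% $\beta$ is a temperature. DDPP's objectives fit such a posterior without
% simulating the sampler, which is what makes them simulation-free.
```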
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation [62.9933120822879]
RMBoost is a novel synthetic preference data generation paradigm. It reduces labeling noise since preference pairs are constructed intentionally. It significantly boosts the performance of four distinct reward models.
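"Constructed intentionally" suggests generating the second response conditioned on the first and a target preference label, so the ordering is known by construction. The helper and prompt wording below are hypothetical.

```python
# Hedged sketch of preference-conditional pair synthesis: the rejected
# response is generated *conditioned on* the chosen one and a target aspect,
# so the pair's ordering is known up front (hence less labeling noise).
# llm_generate and the prompt phrasing are hypothetical.
def make_preference_pair(prompt, aspect, llm_generate):
    chosen = llm_generate(f"Answer the following.\n{prompt}")
    rejected = llm_generate(
        f"Rewrite the answer to be clearly worse on {aspect}, keeping it "
        f"otherwise similar.\nPrompt: {prompt}\nAnswer: {chosen}"
    )
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```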
arXiv Detail & Related papers (2024-07-22T19:21:55Z)
- RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation [24.374185140811115]
Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values.
In this paper, we focus on addressing the issues due to the inherent heterogeneity in human preferences, as well as their potential strategic behavior in providing feedback.
We propose two frameworks to address heterogeneous human feedback in principled ways: a personalization-based one and an aggregation-based one.
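One way to make the aggregation-based route concrete is shown below: fit one reward model per annotator group, then align against a social-welfare aggregate. The utilitarian and egalitarian rules are standard choices and stand in for whatever aggregation the paper actually derives.

```python
# Illustrative aggregation of per-group reward models; the specific
# social-welfare rules here are standard assumptions, not the paper's.
import torch

def aggregate_reward(rewards: torch.Tensor, rule: str = "utilitarian"):
    """rewards: (n_groups, batch) outputs of per-group reward models."""
    if rule == "utilitarian":
        return rewards.mean(dim=0)        # maximize average welfare
    if rule == "egalitarian":
        return rewards.min(dim=0).values  # protect the worst-off group
    raise ValueError(f"unknown rule: {rule}")
```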
arXiv Detail & Related papers (2024-04-30T23:57:23Z)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the PR-divergences.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
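For reference, the $f$-divergence family invoked above is defined as follows; the specific generator defining each PR-divergence is given in the paper and not reproduced here.

```latex
% Background: the general f-divergence family, of which the PR-divergences
% are members for particular convex generators f.
\[
  D_{f}(P \,\|\, Q)
  \;=\;
  \mathbb{E}_{x \sim Q}\!\left[ f\!\left( \frac{dP}{dQ}(x) \right) \right],
  \qquad f \text{ convex},\; f(1) = 0 .
\]
```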
arXiv Detail & Related papers (2023-05-30T10:07:17Z)