Related papers: Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

URL: http://arxiv.org/abs/2510.00526v1
Date: Wed, 01 Oct 2025 05:17:47 GMT
Title: Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum
Authors: Gaotang Li, Ruizhong Qiu, Xiusi Chen, Heng Ji, Hanghang Tong
Abstract summary: We study a family of probability-based objectives and characterize their effectiveness under different conditions. We uncover a critical dimension that governs objective behavior: the model-capability continuum. Our theoretical analysis further elucidates how objectives trade places across the continuum, providing a principled foundation for adapting objectives to model capability.
Score: 88.90314335542281
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Supervised fine-tuning (SFT) is the standard approach for post-training large language models (LLMs), yet it often shows limited generalization. We trace this limitation to its default training objective: negative log likelihood (NLL). While NLL is classically optimal when training from scratch, post-training operates in a different paradigm and could violate its optimality assumptions, where models already encode task-relevant priors and supervision can be long and noisy. To this end, we study a general family of probability-based objectives and characterize their effectiveness under different conditions. Through comprehensive experiments and extensive ablation studies across 7 model backbones, 14 benchmarks, and 3 domains, we uncover a critical dimension that governs objective behavior: the model-capability continuum. Near the model-strong end, prior-leaning objectives that downweight low-probability tokens (e.g., $-p$, $-p^{10}$, thresholded variants) consistently outperform NLL; toward the model-weak end, NLL dominates; in between, no single objective prevails. Our theoretical analysis further elucidates how objectives trade places across the continuum, providing a principled foundation for adapting objectives to model capability. Our code is available at https://github.com/GaotangLi/Beyond-Log-Likelihood.
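To make the phrase "downweight low-probability tokens" concrete, here is a minimal sketch comparing NLL with the $-p$ and $-p^{10}$ objectives named in the abstract. It is an illustration only, not code from the authors' repository: the function names, the finite-difference check, and the example probabilities are all assumptions. It relies on a calculus identity: the derivative of $-\log p$ with respect to $\log p$ is $-1$ for every token, while the derivative of $-p^k$ with respect to $\log p$ is $-k\,p^k$, which shrinks toward zero as the token probability $p$ falls.

```python
import math


def nll(p):
    """Negative log likelihood of a single target token with probability p."""
    return -math.log(p)


def neg_prob_power(p, k=1):
    """The -p^k objective; k=1 gives -p and k=10 gives -p^10."""
    return -(p ** k)


def grad_wrt_log_p(objective, p, eps=1e-6):
    """Central finite-difference estimate of d objective / d log p.

    This is the per-token weight the objective places on increasing the
    log-probability of the target token (hypothetical helper for illustration).
    """
    log_p = math.log(p)
    upper = objective(math.exp(log_p + eps))
    lower = objective(math.exp(log_p - eps))
    return (upper - lower) / (2 * eps)


# Example token probabilities (assumed values): unlikely, uncertain, confident.
TOKEN_PROBS = [0.01, 0.5, 0.99]

if __name__ == "__main__":
    for p in TOKEN_PROBS:
        w_nll = grad_wrt_log_p(nll, p)
        w_p = grad_wrt_log_p(neg_prob_power, p)
        w_p10 = grad_wrt_log_p(lambda q: neg_prob_power(q, 10), p)
        print(f"p={p:<5} NLL={w_nll:+.4f}  -p={w_p:+.4f}  -p^10={w_p10:+.4f}")
```

Under these assumptions, NLL pushes equally hard on every token regardless of how unlikely the model finds it, whereas $-p$ and $-p^{10}$ concentrate the update on tokens the model already considers plausible. That matches the abstract's description of prior-leaning objectives; the thresholded variants it mentions are not sketched here because the abstract does not specify their form.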

Related papers

Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective [85.06838178922791]
Reinforcement Learning (RL) has proven highly effective for autoregressive language models. But adapting these methods to diffusion large language models (dLLMs) presents fundamental challenges. We propose a principled RL framework that treats entire sequence generation as a single action and uses the ELBO as a tractable sequence-level likelihood proxy.
arXiv Detail & Related papers (2025-12-03T13:05:32Z)
Alignment as Distribution Learning: Your Preference Model is Explicitly a Language Model [12.063078727764045]
We argue that alignment via reinforcement learning from human feedback lacks theoretical justification and incentivizes deterministic solutions. We propose three principled learning objectives: preference maximum likelihood estimation, preference distillation, and reverse KL minimization. We empirically demonstrate that our distribution learning framework, especially preference distillation, consistently outperforms or matches the performance of RLHF and DPO.
arXiv Detail & Related papers (2025-06-02T10:36:31Z)
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models [86.88657425848547]
Large reasoning models (LRMs) already possess a latent capacity for long chain-of-thought reasoning. We explicitly align models with three meta-abilities: deduction, induction, and abduction, using automatically generated, self-verifiable tasks. Our three-stage pipeline (individual alignment, parameter-space merging, and domain-specific reinforcement learning) boosts performance by over 10% relative to instruction-tuned baselines.
arXiv Detail & Related papers (2025-05-15T17:58:33Z)
Continuous Visual Autoregressive Generation via Score Maximization [69.67438563485887]
We introduce a Continuous VAR framework that enables direct visual autoregressive generation without vector quantization.<n>Within this framework, all we need is to select a strictly proper score and set it as the training objective to optimize.
arXiv Detail & Related papers (2025-05-12T17:58:14Z)
DeAL: Decoding-time Alignment for Large Language Models [59.63643988872571]
Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. We propose DeAL, a framework that allows the user to customize reward functions and enables Decoding-time Alignment of LLMs. Our experiments show that we can DeAL with fine-grained trade-offs, improve adherence to alignment objectives, and address residual gaps in LLMs.
arXiv Detail & Related papers (2024-02-05T06:12:29Z)
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent. We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups. We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective. Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
Value Gradient weighted Model-Based Reinforcement Learning [28.366157882991565]
Model-based reinforcement learning (MBRL) is a sample efficient technique to obtain control policies. VaGraM is a novel method for value-aware model learning.
arXiv Detail & Related papers (2022-04-04T13:28:31Z)
Contextual Classification Using Self-Supervised Auxiliary Models for Deep Neural Networks [6.585049648605185]
We introduce the notion of Self-Supervised Autogenous Learning (SSAL) models. An SSAL objective is realized through one or more additional targets that are derived from the original supervised classification task. We show that SSAL models consistently outperform the state-of-the-art while also providing structured predictions that are more interpretable.
arXiv Detail & Related papers (2021-01-07T18:41:16Z)
Objective Mismatch in Model-based Reinforcement Learning [14.92062504466269]
Model-based reinforcement learning (MBRL) has been shown to be a powerful framework for data-efficiently learning control of continuous tasks. We identify a fundamental issue of the standard MBRL framework -- what we call the objective mismatch issue. We propose an initial method to mitigate the mismatch issue by re-weighting dynamics model training.
arXiv Detail & Related papers (2020-02-11T16:26:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.