Radiology Report Generation via Multi-objective Preference Optimization
- URL: http://arxiv.org/abs/2412.08901v2
- Date: Fri, 13 Dec 2024 02:55:30 GMT
- Title: Radiology Report Generation via Multi-objective Preference Optimization
- Authors: Ting Xiao, Lei Shi, Peng Liu, Zhe Wang, Chenjia Bai,
- Abstract summary: We propose a new RRG method via Multi-objective Preference Optimization (MPO) to align the pre-trained RRG model with radiologists' preferences. The proposed method can generate reports that cater to different preferences in a single model and achieves state-of-the-art performance.
- Score: 9.158978491482276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic Radiology Report Generation (RRG) is an important topic for alleviating the substantial workload of radiologists. Existing RRG approaches rely on supervised regression based on different architectures or additional knowledge injection, yet the generated report may not align optimally with radiologists' preferences, which are inherently heterogeneous and multi-dimensional: some radiologists may prioritize report fluency, while others emphasize clinical accuracy. To address this problem, we propose a new RRG method via Multi-objective Preference Optimization (MPO) that aligns the pre-trained RRG model with multiple human preferences, formulated by multi-dimensional reward functions and optimized by multi-objective reinforcement learning (RL). Specifically, we use a preference vector to represent the weight of each preference and condition the RRG model on it. A linearly weighted reward is then obtained via a dot product between the preference vector and the multi-dimensional reward, and the RRG model is optimized to align with the preference vector by maximizing this reward via RL. During training, we randomly sample diverse preference vectors from the preference space and align the model by optimizing the weighted multi-objective rewards, which leads to an optimal policy over the entire preference space. At inference, our model can generate reports aligned with specific preferences without further fine-tuning. Extensive experiments on two public datasets show that the proposed method can generate reports catering to different preferences in a single model and achieves state-of-the-art performance.
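The scalarization step the abstract describes can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the objective names, scores, and sampling scheme below are hypothetical. A preference vector is sampled from the preference space (non-negative weights summing to 1), and the multi-dimensional reward is collapsed to a scalar via a dot product:

```python
import random

random.seed(0)

def sample_preference_vector(num_objectives):
    """Sample a random preference vector: non-negative weights that sum to 1."""
    draws = [random.random() for _ in range(num_objectives)]
    total = sum(draws)
    return [d / total for d in draws]

def weighted_reward(preference, rewards):
    """Scalarize a multi-dimensional reward via a dot product with the preference vector."""
    return sum(w * r for w, r in zip(preference, rewards))

# Hypothetical per-objective scores for one generated report,
# e.g. [fluency, clinical accuracy].
rewards = [0.8, 0.6]
w = sample_preference_vector(2)
score = weighted_reward(w, rewards)  # a convex combination: lies between min and max of rewards
```

Because the weights are non-negative and sum to 1, the scalarized reward is always a convex combination of the per-objective rewards, so conditioning the model on different preference vectors smoothly trades one objective against another.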
Related papers
- Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes-optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Online Iterative Self-Alignment for Radiology Report Generation [10.287396040943575]
This paper proposes a novel Online Iterative Self-Alignment (OISA) method for Radiology Report Generation (RRG). The approach generates varied reports tailored to specific clinical objectives, iteratively enhancing the overall performance of the RRG model.
arXiv Detail & Related papers (2025-05-17T12:31:12Z) - AMPO: Active Multi-Preference Optimization [16.230186347702737]
Multi-preference optimization enriches language-model alignment beyond pairwise preferences by contrasting entire sets of helpful and undesired responses.
We propose Active Multi-Preference Optimization (AMPO), a novel approach that combines on-policy generation, a multi-preference group-contrastive loss, and active subset selection.
AMPO achieves state-of-the-art results on AlpacaEval using Llama 8B.
arXiv Detail & Related papers (2025-02-25T15:29:51Z) - Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization [66.67988187816185]
We aim to scale up the number of on-policy samples via repeated random sampling to improve alignment performance.
Our experiments reveal that this strategy leads to a decline in performance as the sample size increases.
We introduce a scalable preference data construction strategy that consistently enhances model performance as the sample scale increases.
arXiv Detail & Related papers (2025-02-24T04:22:57Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.
Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.
We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to model potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval [32.104911827710936]
We propose a new task-level Distributionally Robust Optimization (tDRO) algorithm for Large Language Model-based Dense Retrieval fine-tuning. tDRO parameterizes the domain weights and updates them with scaled domain gradients. Experiments show improvements on large-scale retrieval benchmarks while reducing dataset usage by up to 30%.
arXiv Detail & Related papers (2024-08-20T07:48:19Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO).
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z) - Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization [76.09576643028362]
We present Multi-Objective Direct Preference Optimization (MODPO) for multiple alignment objectives.
MODPO folds language modeling directly into reward modeling, training language models as implicit collective reward models.
It theoretically yields the same optimal solutions as MORLHF but is practically more stable and efficient.
arXiv Detail & Related papers (2023-10-05T17:35:26Z) - Optimally Weighted Ensembles of Regression Models: Exact Weight Optimization and Applications [0.0]
We show that combining different regression models can yield better results than selecting a single ('best') regression model.
We outline an efficient method that obtains optimally weighted linear combination from a heterogeneous set of regression models.
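The idea of an optimally weighted linear combination can be sketched with a toy least-squares formulation (illustrative only, with made-up validation data; not the paper's exact method). For two models, the weights minimizing the squared error of the blend on held-out predictions solve the 2x2 normal equations directly:

```python
def optimal_weights(p1, p2, y):
    """Solve the 2x2 normal equations (P^T P) w = P^T y for blend weights
    over two models' validation predictions p1, p2 against targets y."""
    a = sum(x * x for x in p1)
    b = sum(x * z for x, z in zip(p1, p2))
    c = sum(z * z for z in p2)
    d = sum(x * t for x, t in zip(p1, y))
    e = sum(z * t for z, t in zip(p2, y))
    det = a * c - b * b  # assumes the two prediction vectors are not collinear
    w1 = (d * c - b * e) / det
    w2 = (a * e - b * d) / det
    return w1, w2

# Hypothetical validation predictions from two regression models, and true targets.
p1 = [1.0, 2.0, 3.0, 4.0]
p2 = [1.2, 1.9, 3.3, 3.8]
y  = [1.1, 2.0, 3.1, 4.0]
w1, w2 = optimal_weights(p1, p2, y)
blend = [w1 * u + w2 * v for u, v in zip(p1, p2)]
```

Since each single model is itself a point in the span of the blend (weights (1, 0) or (0, 1)), the optimally weighted combination can never have a larger validation squared error than either model alone, which is the intuition behind the paper's claim.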
arXiv Detail & Related papers (2022-06-22T09:11:14Z) - i-Razor: A Differentiable Neural Input Razor for Feature Selection and Dimension Search in DNN-Based Recommender Systems [8.992480061695138]
Noisy features and inappropriate embedding dimension assignments can deteriorate the performance of recommender systems.
We propose a differentiable neural input razor (i-Razor) that enables joint optimization of feature selection and dimension search.
arXiv Detail & Related papers (2022-04-01T08:30:06Z) - Multi-Objective Hyperparameter Tuning and Feature Selection using Filter Ensembles [0.8029049649310213]
We treat feature selection as a multi-objective optimization task.
The first approach uses multi-objective model-based optimization.
The second is an evolutionary, NSGA-II-based wrapper approach to feature selection.
arXiv Detail & Related papers (2019-12-30T13:04:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.