Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
- URL: http://arxiv.org/abs/2510.23451v1
- Date: Mon, 27 Oct 2025 15:53:20 GMT
- Title: Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
- Authors: Zhuoran Jin, Hongbang Yuan, Kejian Zhu, Jiachun Li, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao,
- Abstract summary: We propose Omni-Reward, a step toward generalist omni-modal reward modeling with support for free-form preferences. We construct a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs. We propose Omni-RewardModel, which includes both discriminative and generative RMs, and achieves strong performance on Omni-RewardBench as well as other widely used reward modeling benchmarks.
- Score: 38.99630864553283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reward models (RMs) play a critical role in aligning AI behaviors with human preferences, yet they face two fundamental challenges: (1) Modality Imbalance, where most RMs are mainly focused on text and image modalities, offering limited support for video, audio, and other modalities; and (2) Preference Rigidity, where training on fixed binary preference pairs fails to capture the complexity and diversity of personalized preferences. To address the above challenges, we propose Omni-Reward, a step toward generalist omni-modal reward modeling with support for free-form preferences, consisting of: (1) Evaluation: We introduce Omni-RewardBench, the first omni-modal RM benchmark with free-form preferences, covering nine tasks across five modalities including text, image, video, audio, and 3D; (2) Data: We construct Omni-RewardData, a multimodal preference dataset comprising 248K general preference pairs and 69K instruction-tuning pairs for training generalist omni-modal RMs; (3) Model: We propose Omni-RewardModel, which includes both discriminative and generative RMs, and achieves strong performance on Omni-RewardBench as well as other widely used reward modeling benchmarks.
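The "free-form preferences" setting means each comparison is conditioned on an explicit, user-stated criterion rather than a fixed binary label. As a rough illustration only (the schema, field names, and prompt template below are hypothetical, not taken from Omni-RewardData), a single preference record for a generative omni-modal RM could look like this:

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical schema for a free-form preference pair; the field names are
# illustrative and not drawn from the Omni-Reward paper.
@dataclass
class PreferencePair:
    modality: Literal["text", "image", "video", "audio", "3d"]
    prompt: str       # the user instruction or query
    criterion: str    # free-form preference, e.g. "prefer concise answers"
    chosen: str       # preferred response (or a path to a media asset)
    rejected: str     # dispreferred response (or a path to a media asset)

def build_judge_prompt(pair: PreferencePair) -> str:
    """Format a pairwise judgment prompt for a generative reward model."""
    return (
        f"Modality: {pair.modality}\n"
        f"Task: {pair.prompt}\n"
        f"Preference criterion: {pair.criterion}\n"
        f"Response A: {pair.chosen}\n"
        f"Response B: {pair.rejected}\n"
        "Which response better satisfies the criterion? Answer 'A' or 'B'."
    )

example = PreferencePair(
    modality="image",
    prompt="Generate a logo for a coffee shop.",
    criterion="Prefer minimalist designs with at most two colors.",
    chosen="candidate_1.png",
    rejected="candidate_2.png",
)
print(build_judge_prompt(example))
```

In practice the order of the two candidates would be randomized before judging to reduce position bias, and for non-text modalities the chosen/rejected fields would reference media files rather than inline strings.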
Related papers
- OmniRet: Efficient and High-Fidelity Omni Modality Retrieval [51.80205678389465]
We present OmniRet, the first retrieval model capable of handling complex, composed queries spanning three key modalities: text, vision, and audio. Our model demonstrates significant improvements on composed-query, audio, and video retrieval tasks, while achieving on-par performance with state-of-the-art models on others.
arXiv Detail & Related papers (2026-03-02T17:19:55Z) - OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention [31.594799790151345]
We propose OmniVideo-R1, a novel reinforced framework that improves mixed-modality reasoning. Experiments on multiple benchmarks demonstrate that OmniVideo-R1 consistently outperforms strong baselines.
arXiv Detail & Related papers (2026-02-05T16:35:19Z) - Omni-RRM: Advancing Omni Reward Modeling via Automatic Rubric-Grounded Preference Synthesis [22.55861092515539]
A critical bottleneck remains the lack of effective reward models (RMs). We introduce Omni-RRM, the first open-source rubric-grounded reward model. It produces structured, multi-dimensional preference judgments with dimension-wise justifications across text, image, video, and audio.
arXiv Detail & Related papers (2026-01-31T18:20:45Z) - UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in OmniModels [12.233067923710635]
Multimodal Large Language Models have been progressing from uni-modal understanding toward unifying visual, audio, and language modalities, collectively termed omni models. We propose a novel, high-quality, and UNified Omni model benchmark, UNO-Bench, which effectively assesses both UNi-modal and Omni-modal capabilities. The benchmark consists of curated human samples with 98% cross-modality solvability, across 44 task types, and an innovative multi-step open-ended question type for assessing complex reasoning.
arXiv Detail & Related papers (2025-10-21T06:14:40Z) - Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs [28.41899655478021]
We propose Omni-DPO, a dual-perspective optimization framework that accounts for the inherent quality of each preference pair and the model's evolving performance on those pairs (a generic sketch of such pair weighting is given after this list). Experimental results on various models and benchmarks demonstrate the superiority and generalization capabilities of Omni-DPO.
arXiv Detail & Related papers (2025-06-11T17:58:05Z) - RoboEgo System Card: An Omnimodal Model with Native Full Duplexity [48.52383812141669]
RoboEgo (alias: FLM-Ego) is a unified model system designed to address both challenges. FLM-Ego incorporates a backbone and algorithms that natively support full duplexity, achieving a theoretical duplex latency of 80 ms.
arXiv Detail & Related papers (2025-06-02T17:53:10Z) - Ola: Pushing the Frontiers of Omni-Modal Language Model [88.72389428177942]
We present Ola, an omni-modal language model that achieves competitive performance across image, video, and audio understanding. Ola incorporates advanced visual understanding and audio recognition capabilities through several critical and effective improvements. We aim to make Ola a fully open omni-modal understanding solution to advance future research in this emerging field.
arXiv Detail & Related papers (2025-02-06T18:59:55Z) - Baichuan-Omni-1.5 Technical Report [78.49101296394218]
Baichuan-Omni-1.5 is an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. First, we establish a comprehensive data cleaning and synthesis pipeline for multimodal data, obtaining about 500B high-quality data. Second, an audio tokenizer has been designed to capture both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with the MLLM.
arXiv Detail & Related papers (2025-01-26T02:19:03Z) - OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities [124.05360767047539]
We introduce OmnixR, an evaluation suite designed to benchmark SoTA Omni-modality Language Models.
Evaluating OLMs, which integrate multiple modalities such as text, vision, and audio, presents unique challenges.
Our experiments find that all state-of-the-art OLMs struggle with OmnixR questions that require integrating information from multiple modalities to answer.
arXiv Detail & Related papers (2024-10-16T04:29:46Z) - OmniBench: Towards The Future of Universal Omni-Language Models [63.16606414452612]
We introduce OmniBench, a novel benchmark designed to evaluate models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously. Our evaluation reveals that open-source OLMs show significant limitations in instruction-following and reasoning in tri-modal contexts. We advocate for developing more robust tri-modal integration techniques and training strategies to enhance OLM performance.
arXiv Detail & Related papers (2024-09-23T17:59:05Z)
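Returning to the Omni-DPO entry above: its abstract describes weighting each preference pair by the pair's inherent quality and by the model's evolving performance on that pair, but gives no formula. The sketch below is a generic weighted DPO-style loss written to match that description; the quality signal, the difficulty factor, and all names are assumptions rather than the paper's actual method.

```python
import torch
import torch.nn.functional as F

def weighted_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                      ref_chosen_logps, ref_rejected_logps,
                      pair_quality, beta=0.1):
    """Sketch of a DPO-style loss with per-pair weighting.

    `pair_quality` in [0, 1] stands in for the "inherent quality" signal;
    the difficulty factor down-weights pairs the policy already separates
    well, a stand-in for "the model's evolving performance on those pairs".
    The exact Omni-DPO formulation is not specified in the abstract.
    """
    # Standard DPO margin: implicit reward gap relative to the reference model.
    margin = beta * ((policy_chosen_logps - policy_rejected_logps)
                     - (ref_chosen_logps - ref_rejected_logps))
    # Pairs the model already gets right contribute less gradient.
    difficulty = 1.0 - torch.sigmoid(margin).detach()
    weight = pair_quality * difficulty
    return (weight * (-F.logsigmoid(margin))).mean()

# Toy usage with random log-probabilities for a batch of 4 pairs.
lp = lambda: torch.randn(4)
loss = weighted_dpo_loss(lp(), lp(), lp(), lp(),
                         pair_quality=torch.tensor([1.0, 0.8, 0.5, 0.9]))
print(loss.item())
```

The detach() on the difficulty factor keeps the reweighting from feeding gradients back through the margin, a common choice for focal-style loss weighting; whether Omni-DPO does this is not stated in the abstract.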