RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
- URL: http://arxiv.org/abs/2511.07070v1
- Date: Mon, 10 Nov 2025 13:04:34 GMT
- Title: RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
- Authors: Fei Zhao, Chonggang Lu, Haofu Qian, Fangcheng Shi, Zijie Meng, Jianzhao Huang, Xu Tang, Zheyong Xie, Zheyu Ye, Zhe Xu, Yao Hu, Shaosheng Cao
- Abstract summary: Social networking services (SNS) pose unique challenges for large language models (LLMs). We introduce RedOne 2.0, an SNS-oriented LLM trained with a progressive, RL-prioritized post-training paradigm designed for rapid and stable adaptation.
- Score: 25.000215454344126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a key medium for human interaction and information exchange, social networking services (SNS) pose unique challenges for large language models (LLMs): heterogeneous workloads, fast-shifting norms and slang, and multilingual, culturally diverse corpora that induce sharp distribution shift. Supervised fine-tuning (SFT) can specialize models but often triggers a "seesaw" between in-distribution gains and out-of-distribution robustness, especially for smaller models. To address these challenges, we introduce RedOne 2.0, an SNS-oriented LLM trained with a progressive, RL-prioritized post-training paradigm designed for rapid and stable adaptation. The pipeline consists of three stages: (1) Exploratory Learning on curated SNS corpora to establish initial alignment and identify systematic weaknesses; (2) Targeted Fine-Tuning that selectively applies SFT to the diagnosed gaps while mixing in a small fraction of general data to mitigate forgetting; and (3) Refinement Learning that re-applies RL with SNS-centric signals to consolidate improvements and harmonize trade-offs across tasks. Across tasks spanning three categories, our 4B-scale model delivers an average improvement of about 2.41 points over the sub-optimal 7B baseline. Additionally, RedOne 2.0 achieves an average performance lift of about 8.74 points over the base model with less than half the data required by the SFT-centric method RedOne, evidencing superior data efficiency and stability at compact scales. Overall, RedOne 2.0 establishes a competitive, cost-effective baseline for domain-specific LLMs in SNS scenarios, advancing capability without sacrificing robustness.
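The staged recipe reads naturally as a control-flow sketch. Below is a minimal, hypothetical outline in Python; every callable (`rl_step`, `sft_step`, `evaluate_by_task`, the batch sources) is a stand-in supplied by the caller, since the paper does not publish training code:
```python
# Control-flow sketch of the three-stage, RL-prioritized pipeline described
# in the abstract. All callables and hyperparameters here are hypothetical.

def post_train(model, sns_batches, general_batches, rl_step, sft_step,
               evaluate_by_task, target_score=0.8, general_every=10):
    # Stage 1: Exploratory Learning -- RL on curated SNS corpora to establish
    # initial alignment and surface systematic weaknesses per task.
    for batch in sns_batches(tasks=None):
        rl_step(model, batch)
    weak = [t for t, s in evaluate_by_task(model).items() if s < target_score]

    # Stage 2: Targeted Fine-Tuning -- SFT only on the diagnosed gaps, mixing
    # in a small fraction of general data to mitigate forgetting.
    general = iter(general_batches())
    for i, batch in enumerate(sns_batches(tasks=weak)):
        sft_step(model, batch)
        if i % general_every == 0:          # ~10% general-data replay
            sft_step(model, next(general))

    # Stage 3: Refinement Learning -- re-apply RL with SNS-centric reward
    # signals to consolidate gains and harmonize cross-task trade-offs.
    for batch in sns_batches(tasks=None):
        rl_step(model, batch)
    return model
```
The notable ordering choice is RL both before and after SFT: stage 1 doubles as diagnosis, and stage 3 repairs any in-distribution/out-of-distribution seesaw the targeted SFT introduces.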
Related papers
- QP-OneModel: A Unified Generative LLM for Multi-Task Query Understanding in Xiaohongshu Search [19.376785819604923]
QP-OneModel bridges user intent and content supply in SNS search engines. It generates intent descriptions as a novel high-fidelity semantic signal. It also exhibits superior generalization, surpassing a 32B model by 7.60% accuracy on unseen tasks.
arXiv Detail & Related papers (2026-02-10T15:38:17Z) - ELSA: Efficient LLM-Centric Split Aggregation for Privacy-Aware Hierarchical Federated Learning over Resource-Constrained Edge Networks [22.53431546014934]
Training large language models (LLMs) at the network edge faces fundamental challenges arising from device resource constraints, severe data heterogeneity, and heightened privacy risks. We propose ELSA, a novel framework that integrates split learning (SL) and hierarchical federated learning (HFL) for distributed LLM fine-tuning over resource-constrained edge networks. First, it employs a task-agnostic, behavior-aware client clustering mechanism that constructs semantic fingerprints using public probe inputs and symmetric KL divergence. Second, it splits the LLM into three parts across clients and edge servers, with the cloud used only for adapter aggregation. Third, it …
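The behavior-aware clustering step lends itself to a short illustration. A minimal sketch, assuming each client's semantic fingerprint is its model's averaged next-token distribution over a shared public probe set, and that clients are grouped by average-linkage hierarchical clustering (both assumptions; the entry does not spell out ELSA's exact recipe):
```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def symmetric_kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    # KL(p||q) + KL(q||p), with a small epsilon to avoid log(0).
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def cluster_clients(fingerprints: np.ndarray, n_clusters: int) -> np.ndarray:
    """fingerprints: (n_clients, vocab) averaged output distributions over
    the shared probe inputs, one row per client."""
    n = len(fingerprints)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = symmetric_kl(fingerprints[i], fingerprints[j])
    # Average-linkage hierarchical clustering on the pairwise divergences.
    return fcluster(linkage(squareform(dist), method="average"),
                    t=n_clusters, criterion="maxclust")
```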
arXiv Detail & Related papers (2026-01-20T10:33:19Z) - CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks [96.64597365827046]
We present the first unified framework that jointly handles three operationally heterogeneous saliency tasks. We introduce a Chain-of-Thought (CoT) reasoning process in a Vision-Language Model (VLM) to bridge task heterogeneity. We show our model matches or outperforms specialized SOTA methods and strong closed-source VLMs across all tasks.
arXiv Detail & Related papers (2025-11-01T04:37:01Z) - MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources [113.33902847941941]
Variance-Aware Sampling (VAS) is a data selection strategy guided by a Variance Promotion Score (VPS). We release large-scale, carefully curated resources containing 1.6M long-CoT cold-start examples and 15k RL QA pairs. Experiments across mathematical reasoning benchmarks demonstrate the effectiveness of both the curated data and the proposed VAS.
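The summary does not define VPS precisely; one plausible reading is that prompts whose rollouts show high reward variance carry the most learning signal for policy-gradient RL. The sketch below weights sampling by per-prompt reward variance as a VPS proxy (an assumption, not the paper's exact score):
```python
import numpy as np


def vas_weights(rollout_rewards, eps=1e-6):
    """rollout_rewards[i] holds the rewards of several sampled completions
    for prompt i; higher variance suggests a more informative prompt."""
    var = np.array([np.var(r) for r in rollout_rewards], dtype=float)
    w = var + eps                     # keep zero-variance prompts reachable
    return w / w.sum()


def sample_prompts(prompts, rollout_rewards, batch_size, rng=None):
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(prompts), size=batch_size, replace=False,
                     p=vas_weights(rollout_rewards))
    return [prompts[i] for i in idx]
```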
arXiv Detail & Related papers (2025-09-25T14:58:29Z) - RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services [37.76677833724781]
We introduce RedOne, a domain-specific language model for social networking services (SNS). RedOne was developed through a three-stage training strategy consisting of continued pretraining, supervised fine-tuning, and preference optimization. It achieves an average improvement of up to 14.02% across 8 major SNS tasks and 7.56% on an SNS bilingual evaluation benchmark.
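The summary names a preference-optimization stage but not the algorithm. Direct Preference Optimization (DPO) is one standard choice; the sketch below is the textbook DPO loss, not necessarily RedOne's exact objective:
```python
import torch
import torch.nn.functional as F


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Inputs are summed token log-probs of chosen/rejected responses under
    the policy and a frozen reference model. Returns the mean DPO loss."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * (chosen policy/ref margin - rejected margin))
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()
```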
arXiv Detail & Related papers (2025-07-13T02:22:59Z) - How to Train Your LLM Web Agent: A Statistical Diagnosis [96.86317871461834]
We present the first statistically grounded study on compute allocation for LLM web-agent post-training. Our approach uses a two-stage pipeline, training a Llama 3.1 8B student to imitate a Llama 3.3 70B teacher via supervised fine-tuning (SFT) and on-policy reinforcement learning. Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++.
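The two-stage pipeline has a simple shape: supervised imitation of teacher rollouts, then on-policy RL on task rewards. A hypothetical sketch, with `generate`, `sft_step`, `rl_step`, and `reward_fn` as stand-ins and made-up hyperparameters:
```python
import random


def train_web_agent(student, teacher, tasks, generate, sft_step, rl_step,
                    reward_fn, sft_epochs=2, rl_steps=10_000):
    # Stage 1: supervised imitation -- fine-tune the student on trajectories
    # produced by the larger teacher model.
    demos = [(task, generate(teacher, task)) for task in tasks]
    for _ in range(sft_epochs):
        for task, trajectory in demos:
            sft_step(student, task, trajectory)

    # Stage 2: on-policy RL -- the student rolls out its own trajectories and
    # is updated against task rewards (e.g., benchmark success signals).
    for _ in range(rl_steps):
        task = random.choice(tasks)
        trajectory = generate(student, task)
        rl_step(student, task, trajectory, reward_fn(task, trajectory))
    return student
```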
arXiv Detail & Related papers (2025-07-05T17:12:33Z) - Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs [51.21041884010009]
Ring-lite is a Mixture-of-Experts (MoE)-based large language model optimized via reinforcement learning (RL). Our approach matches the performance of state-of-the-art (SOTA) small-scale reasoning models on challenging benchmarks.
arXiv Detail & Related papers (2025-06-17T17:12:34Z) - Analytic Personalized Federated Meta-Learning [15.1961498951975]
Analytic Federated Learning (AFL) is an enhanced gradient-free federated learning (FL) paradigm designed to accelerate training by updating the global model in a single step with closed-form least-squares (LS) solutions. We propose the FedACnn framework, in which a layerwise training method is designed by modeling each layer as an LS problem. It generates a personalized model for each client by analytically solving a local objective that bridges the gap between the global model and the individual data distribution.
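The closed-form update at the heart of analytic FL is ordinary ridge-regularized least squares, applied per layer. A minimal illustration (aggregation and personalization details of the proposed framework are omitted):
```python
import numpy as np


def analytic_layer_update(X: np.ndarray, Y: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Solve W = argmin ||X W - Y||^2 + lam * ||W||^2 in closed form:
    W = (X^T X + lam * I)^{-1} X^T Y, with no gradient steps.
    X: (n_samples, d_in) layer inputs; Y: (n_samples, d_out) targets."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
```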
arXiv Detail & Related papers (2025-02-10T11:27:54Z) - FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
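The entry mentions adaptive compression without naming the scheme. Top-k gradient sparsification is one representative choice for low-bandwidth, geo-distributed links; the sketch below is illustrative, not FusionLLM's actual codec:
```python
import numpy as np


def topk_compress(grad: np.ndarray, ratio: float = 0.01):
    """Keep only the largest-magnitude `ratio` fraction of gradient entries;
    ship (indices, values, shape) instead of the dense tensor."""
    flat = grad.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx], grad.shape


def topk_decompress(idx, values, shape):
    # Scatter the retained entries back into a dense zero tensor.
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)
```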
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - Inter-Cell Network Slicing With Transfer Learning Empowered Multi-Agent Deep Reinforcement Learning [6.523367518762879]
Network slicing enables operators to efficiently support diverse applications on a common physical infrastructure.
The ever-increasing densification of network deployment leads to complex and non-trivial inter-cell interference.
We develop a DIRP algorithm with multiple deep reinforcement learning (DRL) agents to cooperatively optimize resource partition in individual cells.
arXiv Detail & Related papers (2023-06-20T14:14:59Z) - Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed-up models and on model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.