Embedding-based Retrieval in Multimodal Content Moderation
- URL: http://arxiv.org/abs/2507.01066v1
- Date: Mon, 30 Jun 2025 19:11:25 GMT
- Title: Embedding-based Retrieval in Multimodal Content Moderation
- Authors: Hanzhong Liang, Jinghao Shi, Xiang Shen, Zixuan Wang, Vera Wen, Ardalan Mehrani, Zhiqian Chen, Yifan Wu, Zhixin Zhang,
- Abstract summary: We introduce an Embedding-Based Retrieval (EBR) method designed to complement traditional classification approaches.<n>EBR increases action rates by 10.32% and reduces operational costs by over 80%, while also enhancing interpretability and flexibility.
- Score: 20.899256623912933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video understanding plays a fundamental role for content moderation on short video platforms, enabling the detection of inappropriate content. While classification remains the dominant approach for content moderation, it often struggles in scenarios requiring rapid and cost-efficient responses, such as trend adaptation and urgent escalations. To address this issue, we introduce an Embedding-Based Retrieval (EBR) method designed to complement traditional classification approaches. We first leverage a Supervised Contrastive Learning (SCL) framework to train a suite of foundation embedding models, including both single-modal and multi-modal architectures. Our models demonstrate superior performance over established contrastive learning methods such as CLIP and MoCo. Building on these embedding models, we design and implement the embedding-based retrieval system that integrates embedding generation and video retrieval to enable efficient and effective trend handling. Comprehensive offline experiments on 25 diverse emerging trends show that EBR improves ROC-AUC from 0.85 to 0.99 and PR-AUC from 0.35 to 0.95. Further online experiments reveal that EBR increases action rates by 10.32% and reduces operational costs by over 80%, while also enhancing interpretability and flexibility compared to classification-based solutions.
Related papers
- Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-Identification [59.59359638389348]
We propose a Dual-level Modality Debiasing Learning framework that implements debiasing at both the model and optimization levels.<n>Experiments on benchmark datasets demonstrate that DMDL could enable modality-invariant feature learning and a more generalized model.
arXiv Detail & Related papers (2025-12-03T12:43:16Z) - Bridging VLMs and Embodied Intelligence with Deliberate Practice Policy Optimization [72.20212909644017]
Deliberate Practice Policy Optimization (DPPO) is a metacognitive Metaloop'' training framework.<n>DPPO alternates between supervised fine-tuning (competence expansion) and reinforcement learning (skill refinement)<n> Empirically, training a vision-language embodied model with DPPO, referred to as Pelican-VL 1.0, yields a 20.3% performance improvement over the base model.<n>We are open-sourcing both the models and code, providing the first systematic framework that alleviates the data and resource bottleneck.
arXiv Detail & Related papers (2025-11-20T17:58:04Z) - Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model [62.889356203346985]
We propose DUal-STream diffusion (DUST), a world-model augmented VLA framework that handles the modality conflict.<n>DUST achieves up to 6% gains over a standard VLA baseline and implicit world-modeling methods.<n>On real-world tasks with the Franka Research 3, DUST outperforms baselines in success rate by 13%.
arXiv Detail & Related papers (2025-10-31T16:32:12Z) - Joint Discriminative-Generative Modeling via Dual Adversarial Training [4.884832758265374]
We propose a novel training framework that integrates adversarial training principles for both discriminative robustness and stable generative learning.<n> Experiments on CIFAR-10, CIFAR-100, and ImageNet demonstrate that our method substantially improves adversarial robustness over existing hybrid models while maintaining competitive generative performance.
arXiv Detail & Related papers (2025-10-13T13:07:22Z) - LLM Routing with Dueling Feedback [49.67815163970033]
We study the problem of selecting the best model for each query while balancing user satisfaction, model expertise, and inference cost.<n>We formulate routing as contextual dueling bandits, learning from pairwise preference feedback rather than absolute scores.<n>We introduce Category-Calibrated Fine-Tuning (CCFT), a representation-learning method that derives model embeddings from offline data using contrastive fine-tuning with categorical weighting.
arXiv Detail & Related papers (2025-10-01T12:52:25Z) - Video-Based MPAA Rating Prediction: An Attention-Driven Hybrid Architecture Using Contrastive Learning [0.1749935196721634]
We employ contrastive learning for improved discrimination and adaptability, exploring three frameworks: Instance Discrimination, Contextual Contrastive Learning, and Multi-View Contrastive Learning.<n>Our hybrid architecture integrates an LRCN (CNN+LSTM) backbone with a Bahdanau attention mechanism, achieving state-of-the-art performance in the Contextual Contrastive Learning framework, with 88% accuracy and an F1 score of 0.8815.<n>We evaluate the model's performance across various contrastive loss functions, including NT-Xent, NT-logistic, and Triple Margint, demonstrating the robustness of our proposed architecture.
arXiv Detail & Related papers (2025-09-08T16:01:02Z) - STARec: An Efficient Agent Framework for Recommender Systems via Autonomous Deliberate Reasoning [54.28691219536054]
We introduce STARec, a slow-thinking augmented agent framework that endows recommender systems with autonomous deliberative reasoning capabilities.<n>We develop anchored reinforcement training - a two-stage paradigm combining structured knowledge distillation from advanced reasoning models with preference-aligned reward shaping.<n>Experiments on MovieLens 1M and Amazon CDs benchmarks demonstrate that STARec achieves substantial performance gains compared with state-of-the-art baselines.
arXiv Detail & Related papers (2025-08-26T08:47:58Z) - RecLLM-R1: A Two-Stage Training Paradigm with Reinforcement Learning and Chain-of-Thought v1 [20.92548890511589]
This paper introduces RecLLM-R1, a novel recommendation framework leveraging Large Language Models (LLMs)<n> RecLLM-R1 significantly surpasses existing baseline methods across a spectrum of evaluation metrics, including accuracy, diversity, and novelty.
arXiv Detail & Related papers (2025-06-24T01:39:34Z) - Enhancing CTR Prediction with De-correlated Expert Networks [53.05653547330796]
We propose a De-Correlated MoE (D-MoE) framework, which introduces a Cross-Expert De-Correlation loss to minimize expert correlations.<n>Extensive experiments have been conducted to validate the effectiveness of D-MoE and the de-correlation principle.
arXiv Detail & Related papers (2025-05-23T14:04:38Z) - Offline Learning for Combinatorial Multi-armed Bandits [56.96242764723241]
Off-CMAB is the first offline learning framework for CMAB.<n>Off-CMAB combines pessimistic reward estimations with solvers.<n>Experiments on synthetic and real-world datasets highlight the superior performance of CLCB.
arXiv Detail & Related papers (2025-01-31T16:56:18Z) - LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments [2.2021543101231167]
Modern media firms require automated and efficient methods to identify content that is most engaging and appealing to users.
We first investigate the ability of three pure-LLM approaches to identify the catchiest headline: prompt-based methods, embedding-based methods, and fine-tuned open-source LLMs.
We then introduce the LLM-Assisted Online Learning Algorithm (LOLA), a novel framework that integrates Large Language Models (LLMs) with adaptive experimentation to optimize content delivery.
arXiv Detail & Related papers (2024-06-03T07:56:58Z) - Relaxed Contrastive Learning for Federated Learning [48.96253206661268]
We propose a novel contrastive learning framework to address the challenges of data heterogeneity in federated learning.
Our framework outperforms all existing federated learning approaches by huge margins on the standard benchmarks.
arXiv Detail & Related papers (2024-01-10T04:55:24Z) - Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning [47.64252639582435]
We focus on semi-supervised continual learning (SSCL), where the model progressively learns from partially labeled data with unknown categories.<n>We propose a novel approach called Dynamic Sub-Graph Distillation (DSGD) for semi-supervised continual learning.
arXiv Detail & Related papers (2023-12-27T04:40:12Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language
Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - Sample Less, Learn More: Efficient Action Recognition via Frame Feature
Restoration [59.6021678234829]
We propose a novel method to restore the intermediate features for two sparsely sampled and adjacent video frames.
With the integration of our method, the efficiency of three commonly used baselines has been improved by over 50%, with a mere 0.5% reduction in recognition accuracy.
arXiv Detail & Related papers (2023-07-27T13:52:42Z) - REX: Rapid Exploration and eXploitation for AI Agents [103.68453326880456]
We propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.
REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance.
arXiv Detail & Related papers (2023-07-18T04:26:33Z) - Class-Incremental Mixture of Gaussians for Deep Continual Learning [15.49323098362628]
We propose end-to-end incorporation of the mixture of Gaussians model into the continual learning framework.
We show that our model can effectively learn in memory-free scenarios with fixed extractors.
arXiv Detail & Related papers (2023-07-09T04:33:19Z) - Cross-Stream Contrastive Learning for Self-Supervised Skeleton-Based
Action Recognition [22.067143671631303]
Self-supervised skeleton-based action recognition enjoys a rapid growth along with the development of contrastive learning.
We propose a Cross-Stream Contrastive Learning framework for skeleton-based action Representation learning (CSCLR)
Specifically, the proposed CSCLR not only utilizes intra-stream contrast pairs, but introduces inter-stream contrast pairs as hard samples to formulate a better representation learning.
arXiv Detail & Related papers (2023-05-03T10:31:35Z) - Mitigating Forgetting in Online Continual Learning via Contrasting
Semantically Distinct Augmentations [22.289830907729705]
Online continual learning (OCL) aims to enable model learning from a non-stationary data stream to continuously acquire new knowledge as well as retain the learnt one.
Main challenge comes from the "catastrophic forgetting" issue -- the inability to well remember the learnt knowledge while learning the new ones.
arXiv Detail & Related papers (2022-11-10T05:29:43Z) - FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.