Reliable Decision from Multiple Subtasks through Threshold Optimization:
Content Moderation in the Wild
- URL: http://arxiv.org/abs/2208.07522v5
- Date: Thu, 26 Jan 2023 02:25:19 GMT
- Title: Reliable Decision from Multiple Subtasks through Threshold Optimization:
Content Moderation in the Wild
- Authors: Donghyun Son, Byounggyu Lew, Kwanghee Choi, Yongsu Baek, Seungwoo
Choi, Beomjun Shin, Sungjoo Ha, Buru Chang
- Abstract summary: Social media platforms struggle to protect users from harmful content through content moderation.
These platforms have recently leveraged machine learning models to cope with the vast amount of user-generated content produced daily.
Third-party content moderation services provide prediction scores of multiple subtasks, such as predicting the existence of underage personnel, rude gestures, or weapons.
We introduce a simple yet effective threshold optimization method that searches the optimal thresholds of the multiple subtasks to make a reliable moderation decision in a cost-effective way.
- Score: 7.176020195419459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Social media platforms struggle to protect users from harmful content through
content moderation. These platforms have recently leveraged machine learning
models to cope with the vast amount of user-generated content produced daily. Since
moderation policies vary depending on countries and types of products, it is
common to train and deploy the models per policy. However, this approach is
highly inefficient, especially when the policies change, requiring dataset
re-labeling and model re-training on the shifted data distribution. To
alleviate this cost inefficiency, social media platforms often employ
third-party content moderation services that provide prediction scores of
multiple subtasks, such as predicting the existence of underage personnel, rude
gestures, or weapons, instead of directly providing final moderation decisions.
However, making a reliable automated moderation decision from the prediction
scores of the multiple subtasks for a specific target policy has not been
widely explored yet. In this study, we formulate real-world scenarios of
content moderation and introduce a simple yet effective threshold optimization
method that searches the optimal thresholds of the multiple subtasks to make a
reliable moderation decision in a cost-effective way. Extensive experiments
demonstrate that our approach shows better performance in content moderation
compared to existing threshold optimization methods and heuristics.
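The core idea of the method can be illustrated with a minimal sketch (all names and the grid-search strategy here are illustrative assumptions, not the paper's actual implementation): given per-subtask prediction scores from a third-party moderation service, search per-subtask thresholds, flag content when any subtask score exceeds its threshold, and keep the threshold combination that maximizes F1 on labeled validation data.

```python
import itertools
import numpy as np

def moderation_decision(scores, thresholds):
    # Flag content if ANY subtask score exceeds its threshold (OR rule).
    # scores: (n_samples, n_subtasks), thresholds: (n_subtasks,)
    return np.any(scores >= thresholds, axis=1)

def f1(pred, label):
    # Standard F1 over boolean predictions and labels.
    tp = np.sum(pred & label)
    fp = np.sum(pred & ~label)
    fn = np.sum(~pred & label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def grid_search_thresholds(scores, labels, grid=np.linspace(0.1, 0.9, 9)):
    # Exhaustive search over threshold combinations; feasible only for a
    # small number of subtasks (grid^n_subtasks candidates).
    best_t, best_f1 = None, -1.0
    n_subtasks = scores.shape[1]
    for combo in itertools.product(grid, repeat=n_subtasks):
        t = np.array(combo)
        score = f1(moderation_decision(scores, t), labels)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1
```

An exhaustive grid is only a baseline; the paper's contribution is a more cost-effective search, so a practical version would replace the brute-force loop with that optimizer.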
Related papers
- $\Delta\text{-}{\rm OPE}$: Off-Policy Estimation with Pairs of Policies [13.528097424046823]
We introduce $\Delta\text{-}{\rm OPE}$ methods based on the widely used Inverse Propensity Scoring estimator.
Simulated, offline, and online experiments show that our methods significantly improve performance for both evaluation and learning tasks.
arXiv Detail & Related papers (2024-05-16T12:04:55Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on their equivalence in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
Policy Convolution family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims at identifying and evaluating efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty.
arXiv Detail & Related papers (2023-04-05T16:45:11Z)
- Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design [18.326126953667842]
We propose novel methods that improve the data efficiency of online Monte Carlo estimators.
We first propose a tailored closed-form behavior policy that provably reduces the variance of an online Monte Carlo estimator.
We then design efficient algorithms to learn this closed-form behavior policy from previously collected offline data.
arXiv Detail & Related papers (2023-01-31T16:12:31Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Variance-Optimal Augmentation Logging for Counterfactual Evaluation in Contextual Bandits [25.153656462604268]
Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems.
The counterfactual estimators that are commonly used in these methods can have large bias and large variance when the logging policy is very different from the target policy being evaluated.
This paper introduces Minimum Variance Augmentation Logging (MVAL), a method for constructing logging policies that minimize the variance of the downstream evaluation or learning problem.
arXiv Detail & Related papers (2022-02-03T17:37:11Z)
- Sayer: Using Implicit Feedback to Optimize System Policies [63.992191765269396]
We develop a methodology that leverages implicit feedback to evaluate and train new system policies.
Sayer builds on two ideas from reinforcement learning to leverage data collected by an existing policy.
We show that Sayer can evaluate arbitrary policies accurately, and train new policies that outperform the production policies.
arXiv Detail & Related papers (2021-10-28T04:16:56Z)
- An Offline Risk-aware Policy Selection Method for Bayesian Markov Decision Processes [0.0]
Exploitation vs Caution (EvC) is a paradigm that elegantly incorporates model uncertainty abiding by the Bayesian formalism.
We validate EvC with state-of-the-art approaches in different discrete, yet simple, environments offering a fair variety of MDP classes.
In the tested scenarios EvC manages to select robust policies and hence stands out as a useful tool for practitioners.
arXiv Detail & Related papers (2021-05-27T20:12:20Z)
- MUSBO: Model-based Uncertainty Regularized and Sample Efficient Batch Optimization for Deployment Constrained Reinforcement Learning [108.79676336281211]
Continuous deployment of new policies for data collection and online learning is either cost ineffective or impractical.
We propose a new algorithmic learning framework called Model-based Uncertainty regularized and Sample Efficient Batch Optimization.
Our framework discovers novel and high quality samples for each deployment to enable efficient data collection.
arXiv Detail & Related papers (2021-02-23T01:30:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.