Cross-Policy Compliance Detection via Question Answering
- URL: http://arxiv.org/abs/2109.03731v1
- Date: Wed, 8 Sep 2021 15:47:41 GMT
- Title: Cross-Policy Compliance Detection via Question Answering
- Authors: Marzieh Saeidi, Majid Yazdani, Andreas Vlachos
- Abstract summary: We propose to address policy compliance detection by decomposing it into question answering.
We demonstrate that this approach results in better accuracy, especially in the cross-policy setup.
It explicitly identifies the information missing from a scenario when policy compliance cannot be determined.
- Score: 13.373804837863155
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Policy compliance detection is the task of ensuring that a scenario conforms
to a policy (e.g. a claim is valid according to government rules or a post in
an online platform conforms to community guidelines). This task has been
previously instantiated as a form of textual entailment, which results in poor
accuracy due to the complexity of the policies. In this paper we propose to
address policy compliance detection by decomposing it into question answering,
where questions check whether the conditions stated in the policy apply to the
scenario, and an expression tree combines the answers to obtain the label.
Despite the upfront annotation cost, we demonstrate that this approach
results in better accuracy, especially in the cross-policy setup where the
policies at test time are unseen during training. In addition, it allows us to
reuse question answering models pre-trained on existing large datasets.
Finally, it explicitly identifies the information missing from a scenario when
policy compliance cannot be determined. We conduct our experiments using a
recent dataset consisting of government policies, which we augment with expert
annotations, and find that the cost of annotating the question answering
decomposition is largely offset by improved inter-annotator agreement and
speed.
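As a rough illustration of the proposed decomposition, the sketch below combines per-condition question answering outputs with an expression tree under Kleene's three-valued logic, so an unanswered question yields "undetermined" rather than a guess. The policy, question ids, and answer dictionary are hypothetical stand-ins, not the paper's dataset, models, or exact tree format.

```python
from typing import Union

# Three-valued answer: True, False, or None ("unknown": not stated in the scenario)
Answer = Union[bool, None]

def qa_and(a: Answer, b: Answer) -> Answer:
    """Kleene AND: any False dominates; otherwise unknown propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def qa_or(a: Answer, b: Answer) -> Answer:
    """Kleene OR: any True dominates; otherwise unknown propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def evaluate(tree, answers: dict) -> Answer:
    """Evaluate an expression tree over per-question answers.
    A tree is either a question id (leaf) or a tuple (op, left, right)."""
    if isinstance(tree, str):  # leaf: the QA model's answer for this condition
        return answers[tree]
    op, left, right = tree
    combine = qa_and if op == "and" else qa_or
    return combine(evaluate(left, answers), evaluate(right, answers))

# Hypothetical policy: the claim is valid if the applicant is a resident
# AND (is unemployed OR earns below a threshold).
tree = ("and", "q_resident", ("or", "q_unemployed", "q_low_income"))

# Answers a QA model might extract from one scenario; None = not stated.
answers = {"q_resident": True, "q_unemployed": None, "q_low_income": None}

print(evaluate(tree, answers))  # None -> compliance cannot be determined
```

Because "unknown" propagates through the connectives, an undetermined result can be traced back to the unanswered leaf questions, mirroring the abstract's point about surfacing missing information.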
Related papers
- Statistical Analysis of Policy Space Compression Problem [54.1754937830779]
Policy search methods are crucial in reinforcement learning, offering a framework to address continuous state-action and partially observable problems.
Reducing the policy space through policy compression emerges as a powerful, reward-free approach to accelerate the learning process.
This technique condenses the policy space into a smaller, representative set while maintaining most of the original effectiveness.
arXiv Detail & Related papers (2024-11-15T02:46:55Z)
- Information Capacity Regret Bounds for Bandits with Mediator Feedback [55.269551124587224]
We introduce the policy set capacity as an information-theoretic measure for the complexity of the policy set.
Adopting the classical EXP4 algorithm, we provide new regret bounds depending on the policy set capacity.
For a selection of policy set families, we prove nearly-matching lower bounds, scaling similarly with the capacity.
arXiv Detail & Related papers (2024-02-15T19:18:47Z)
- Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
- Conformal Off-Policy Evaluation in Markov Decision Processes [53.786439742572995]
Reinforcement Learning aims to identify and evaluate efficient control policies from data.
Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees.
We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty.
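The guarantee described is the usual conformal-coverage one; as a generic illustration (a plain split-conformal interval around a reward model's prediction, assuming exchangeable calibration data, not the paper's MDP-specific construction):

```python
import numpy as np

def conformal_interval(pred_cal, y_cal, pred_target, alpha=0.1):
    """Split conformal prediction: calibrate absolute residuals on held-out
    data, then return an interval around a new prediction that covers the
    true value with probability >= 1 - alpha (under exchangeability)."""
    scores = np.abs(np.asarray(y_cal) - np.asarray(pred_cal))  # nonconformity
    n = len(scores)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)     # finite-sample correction
    q = np.quantile(scores, q_level)
    return pred_target - q, pred_target + q

# Toy usage with synthetic calibration data (illustrative only).
rng = np.random.default_rng(0)
y_cal = rng.normal(1.0, 0.2, size=200)            # held-out true rewards
pred_cal = y_cal + rng.normal(0, 0.1, size=200)   # a reward model's predictions
lo, hi = conformal_interval(pred_cal, y_cal, pred_target=1.05)
print(f"~90% interval for the target policy's reward: [{lo:.3f}, {hi:.3f}]")
```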
arXiv Detail & Related papers (2023-04-05T16:45:11Z)
- Counterfactual Learning with General Data-generating Policies [3.441021278275805]
We develop an OPE method for a class of full support and deficient support logging policies in contextual-bandit settings.
We prove that our method's prediction converges in probability to the true performance of a counterfactual policy as the sample size increases.
arXiv Detail & Related papers (2022-12-04T21:07:46Z)
- Distributionally Robust Batch Contextual Bandits [20.667213458836734]
Policy learning using historical observational data is an important problem that has found widespread applications.
Existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment.
In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data.
arXiv Detail & Related papers (2020-06-10T03:11:40Z)
- Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies [80.42316902296832]
We study the estimation of policy value and gradient of a deterministic policy from off-policy data when actions are continuous.
In this setting, standard importance sampling and doubly robust estimators for policy value and gradient fail because the density ratio does not exist.
We propose several new doubly robust estimators based on different kernelization approaches.
arXiv Detail & Related papers (2020-06-06T15:52:05Z)
- Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
- Fast Compliance Checking with General Vocabularies [0.0]
We introduce an OWL2 profile for representing data protection policies.
With this language, a company's data usage policy can be checked for compliance with data subjects' consent.
We exploit IBQ reasoning to integrate specialized reasoners for the policy language and the vocabulary's language.
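As a much-simplified analogue of this check (plain set containment over hypothetical (purpose, data_category) pairs, standing in for the OWL2 subsumption reasoning the paper actually uses):

```python
# Toy analogue of policy-vs-consent compliance: both the company's usage
# policy and the data subject's consent are sets of permitted
# (purpose, data_category) pairs; usage complies if every permitted usage
# is covered by the consent. The real system uses OWL2 reasoning instead.
consent = {("marketing", "email"), ("analytics", "usage_logs")}
company_policy = {("marketing", "email")}

def complies(policy: set, consent: set) -> bool:
    """Every usage the policy permits must be covered by the consent."""
    return policy <= consent

print(complies(company_policy, consent))                              # True
print(complies(company_policy | {("profiling", "email")}, consent))   # False
```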
arXiv Detail & Related papers (2020-01-16T09:08:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.