Sensitive Information Detection: Recursive Neural Networks for Encoding Context
- URL: http://arxiv.org/abs/2008.10863v1
- Date: Tue, 25 Aug 2020 07:49:46 GMT
- Title: Sensitive Information Detection: Recursive Neural Networks for Encoding Context
- Authors: Jan Neerbek
- Abstract summary: The leak of sensitive information can potentially be very costly.
We show that simplistic, brittle rule sets for detecting sensitive information only find a small fraction of the actual sensitive information.
We develop a novel family of sensitive information detection approaches which assume only access to labeled examples.
- Score: 0.20305676256390928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The amount of data for processing and categorization grows at an ever
increasing rate. At the same time the demand for collaboration and transparency
in organizations, government and businesses, drives the release of data from
internal repositories to the public or third-party domain. This in turn increases
the potential for sharing sensitive information. The leak of sensitive
information can potentially be very costly, not only financially for organizations
but also for individuals. In this work we address the important problem of
sensitive information detection. Specifically, we focus on detection in
unstructured text documents.
We show that simplistic, brittle rule sets for detecting sensitive
information only find a small fraction of the actual sensitive information.
Furthermore, we show that previous state-of-the-art approaches have been
implicitly tailored to such simplistic scenarios and thus fail to detect actual
sensitive content. We develop a novel family of sensitive information detection
approaches which assume only access to labeled examples, rather than
unrealistic assumptions such as access to a set of generating rules or
descriptive topical seed words. Our approaches are inspired by the current
state-of-the-art for paraphrase detection and we adapt deep learning approaches
over recursive neural networks to the problem of sensitive information
detection. We show that our context-based approaches significantly outperform
the family of previous state-of-the-art approaches for sensitive information
detection, so-called keyword-based approaches, on real-world data and with
human-labeled examples of sensitive and non-sensitive documents.
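As a rough illustration of why the keyword-based approaches mentioned in the abstract can be brittle, the following is a minimal sketch of a keyword detector. It is not the author's method or any published baseline: the keyword list, the function name `keyword_detector`, and the two example sentences are all hypothetical and chosen only to show how a paraphrase can slip past a fixed rule set.

```python
import re

# Hypothetical keyword list for illustration only; the paper does not publish one.
SENSITIVE_KEYWORDS = {"confidential", "password", "salary"}


def keyword_detector(text: str) -> bool:
    """Flag text as sensitive if it contains any listed keyword."""
    # Lowercase the text and split it into alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    # The text is flagged when at least one token is a listed keyword.
    return not tokens.isdisjoint(SENSITIVE_KEYWORDS)


# Two hypothetical sentences with a similar intent.
explicit = "This memo is confidential."
paraphrased = "Please keep this memo between the two of us."

print(keyword_detector(explicit))     # the listed keyword is present
print(keyword_detector(paraphrased))  # same intent, but no listed keyword
```

The second sentence carries a similar meaning without using any listed word, so a fixed keyword rule does not flag it. Closing that kind of gap is the motivation the abstract gives for context-based models trained on labeled examples.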
Related papers
- Protecting Activity Sensing Data Privacy Using Hierarchical Information Dissociation [8.584570228761503]
Smartphones and wearable devices have been integrated into our daily lives, offering personalized services.
Many apps become overprivileged as their collected sensing data contains unnecessary sensitive information.
Existing methods must obtain private labels and users need to specify privacy policies.
We present Hippo to dissociate hierarchical information including private metadata and multi-grained activity information.
arXiv Detail & Related papers (2024-09-04T15:38:00Z)
- KiNETGAN: Enabling Distributed Network Intrusion Detection through Knowledge-Infused Synthetic Data Generation [0.0]
We propose KiNETGAN, a knowledge-infused Generative Adversarial Network for generating synthetic network activity data.
Our approach enhances the resilience of distributed intrusion detection while addressing privacy concerns.
arXiv Detail & Related papers (2024-05-26T08:02:02Z)
- Decouple-and-Sample: Protecting sensitive information in task agnostic data release [17.398889291769986]
Sanitizer is a framework for secure and task-agnostic data release.
We show that a better privacy-utility trade-off is achieved if sensitive information can be synthesized privately.
arXiv Detail & Related papers (2022-03-17T19:15:33Z)
- Reinforcement Learning on Encrypted Data [58.39270571778521]
We present a preliminary, experimental study of how a DQN agent trained on encrypted states performs in environments with discrete and continuous state spaces.
Our results highlight that the agent is still capable of learning in small state spaces even in the presence of non-deterministic encryption, but performance collapses in more complex environments.
arXiv Detail & Related papers (2021-09-16T21:59:37Z)
- Unsupervised Domain Adaption of Object Detectors: A Survey [87.08473838767235]
Recent advances in deep learning have led to the development of accurate and efficient models for various computer vision applications.
Learning highly accurate models relies on the availability of datasets with a large number of annotated images.
Due to this, model performance drops drastically when evaluated on label-scarce datasets having visually distinct images.
arXiv Detail & Related papers (2021-05-27T23:34:06Z)
- DISCO: Dynamic and Invariant Sensitive Channel Obfuscation for deep neural networks [19.307753802569156]
We propose DISCO which learns a dynamic and data driven pruning filter to selectively obfuscate sensitive information in the feature space.
We also release an evaluation benchmark dataset of 1 million sensitive representations to encourage rigorous exploration of novel attack schemes.
arXiv Detail & Related papers (2020-12-20T21:15:13Z)
- Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports [66.39150945184683]
We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches.
Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
arXiv Detail & Related papers (2020-10-27T19:48:23Z)
- Weakly-supervised Salient Instance Detection [65.0408760733005]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2020-09-29T09:47:23Z)
- Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
- Survey of Network Intrusion Detection Methods from the Perspective of the Knowledge Discovery in Databases Process [63.75363908696257]
We review the methods that have been applied to network data with the purpose of developing an intrusion detector.
We discuss the techniques used for the capture, preparation and transformation of the data, as well as the data mining and evaluation methods.
As a result of this literature review, we investigate some open issues which will need to be considered for further research in the area of network security.
arXiv Detail & Related papers (2020-01-27T11:21:05Z)