Related papers: Sensitive Information Detection: Recursive Neural Networks for Encoding Context

URL: http://arxiv.org/abs/2008.10863v1
Date: Tue, 25 Aug 2020 07:49:46 GMT
Title: Sensitive Information Detection: Recursive Neural Networks for Encoding Context
Authors: Jan Neerbek
Abstract summary: Leak of sensitive information can potentially be very costly. We show that simplistic, brittle rule sets for detecting sensitive information only find a small fraction of the actual sensitive information. We develop a novel family of sensitive information detection approaches which only assume access to labeled examples.
Score: 0.20305676256390928
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The amount of data for processing and categorization grows at an ever-increasing rate. At the same time, the demand for collaboration and transparency in organizations, government and businesses drives the release of data from internal repositories to the public or third-party domain. This in turn increases the potential of sharing sensitive information. The leak of sensitive information can potentially be very costly, both financially for organizations and for individuals. In this work we address the important problem of sensitive information detection. Specifically, we focus on detection in unstructured text documents. We show that simplistic, brittle rule sets for detecting sensitive information only find a small fraction of the actual sensitive information. Furthermore, we show that previous state-of-the-art approaches have been implicitly tailored to such simplistic scenarios and thus fail to detect actual sensitive content. We develop a novel family of sensitive information detection approaches which only assume access to labeled examples, rather than unrealistic assumptions such as access to a set of generating rules or descriptive topical seed words. Our approaches are inspired by the current state-of-the-art for paraphrase detection, and we adapt deep learning approaches over recursive neural networks to the problem of sensitive information detection. We show that our context-based approaches significantly outperform the family of previous state-of-the-art approaches for sensitive information detection, so-called keyword-based approaches, on real-world data and with human-labeled examples of sensitive and non-sensitive documents.

Related papers

A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
Information Leakage of Sentence Embeddings via Generative Embedding Inversion Attacks [1.6427658855248815]
In this study, we reproduce GEIA's findings across various neural sentence embedding models. We propose a simple yet effective method without any modification to the attacker's architecture proposed in GEIA. Our findings indicate that following our approach, an adversary party can recover meaningful sensitive information related to the pre-training knowledge of the popular models used for creating sentence embeddings.
arXiv Detail & Related papers (2025-04-23T10:50:23Z)
Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation [15.355814393928707]
We put forward a unified dataset tailored for social media content moderation across six sensitive categories. These include conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam. Fine-tuning large language models on this novel dataset yields significant improvements in detection performance compared to open off-the-shelf models.
arXiv Detail & Related papers (2024-11-29T16:44:02Z)
Protecting Activity Sensing Data Privacy Using Hierarchical Information Dissociation [8.584570228761503]
Smartphones and wearable devices have been integrated into our daily lives, offering personalized services. Many apps become overprivileged as their collected sensing data contains unnecessary sensitive information. Existing methods must obtain private labels and users need to specify privacy policies. We present Hippo to dissociate hierarchical information including private metadata and multi-grained activity information.
arXiv Detail & Related papers (2024-09-04T15:38:00Z)
KiNETGAN: Enabling Distributed Network Intrusion Detection through Knowledge-Infused Synthetic Data Generation [0.0]
We propose a knowledge-infused Generative Adversarial Network for generating synthetic network activity data (KiNETGAN). Our approach enhances the resilience of distributed intrusion detection while addressing privacy concerns.
arXiv Detail & Related papers (2024-05-26T08:02:02Z)
Decouple-and-Sample: Protecting sensitive information in task agnostic data release [17.398889291769986]
Sanitizer is a framework for secure and task-agnostic data release. We show that a better privacy-utility trade-off is achieved if sensitive information can be synthesized privately.
arXiv Detail & Related papers (2022-03-17T19:15:33Z)
Reinforcement Learning on Encrypted Data [58.39270571778521]
We present a preliminary, experimental study of how a DQN agent trained on encrypted states performs in environments with discrete and continuous state spaces. Our results highlight that the agent is still capable of learning in small state spaces even in presence of non-deterministic encryption, but performance collapses in more complex environments.
arXiv Detail & Related papers (2021-09-16T21:59:37Z)
Unsupervised Domain Adaption of Object Detectors: A Survey [87.08473838767235]
Recent advances in deep learning have led to the development of accurate and efficient models for various computer vision applications. Learning highly accurate models relies on the availability of datasets with a large number of annotated images. Due to this, model performance drops drastically when evaluated on label-scarce datasets having visually distinct images.
arXiv Detail & Related papers (2021-05-27T23:34:06Z)
DISCO: Dynamic and Invariant Sensitive Channel Obfuscation for deep neural networks [19.307753802569156]
We propose DISCO which learns a dynamic and data driven pruning filter to selectively obfuscate sensitive information in the feature space. We also release an evaluation benchmark dataset of 1 million sensitive representations to encourage rigorous exploration of novel attack schemes.
arXiv Detail & Related papers (2020-12-20T21:15:13Z)
Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports [66.39150945184683]
We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches. Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
arXiv Detail & Related papers (2020-10-27T19:48:23Z)
Weakly-supervised Salient Instance Detection [65.0408760733005]
We present the first weakly-supervised approach to the salient instance detection problem. We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2020-09-29T09:47:23Z)
Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data. We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z)
Survey of Network Intrusion Detection Methods from the Perspective of the Knowledge Discovery in Databases Process [63.75363908696257]
We review the methods that have been applied to network data with the purpose of developing an intrusion detector. We discuss the techniques used for the capture, preparation and transformation of the data, as well as the data mining and evaluation methods. As a result of this literature review, we investigate some open issues which will need to be considered for further research in the area of network security.
arXiv Detail & Related papers (2020-01-27T11:21:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.