Comparing Feature-based and Context-aware Approaches to PII Generalization Level Prediction
- URL: http://arxiv.org/abs/2407.02837v1
- Date: Wed, 3 Jul 2024 06:32:03 GMT
- Title: Comparing Feature-based and Context-aware Approaches to PII Generalization Level Prediction
- Authors: Kailin Zhang, Xinying Qiu
- Abstract summary: Protecting PII in text data is crucial for privacy, but current PII generalization methods face challenges such as uneven data distributions and limited context awareness.
We propose two approaches: a feature-based method using machine learning to improve performance on structured inputs, and a novel context-aware framework that considers the broader context and semantic relationships between the original text and generalized candidates.
Experiments on the WikiReplace dataset demonstrate the effectiveness of both methods, with the context-aware approach outperforming the feature-based one across different scales.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Protecting Personally Identifiable Information (PII) in text data is crucial for privacy, but current PII generalization methods face challenges such as uneven data distributions and limited context awareness. To address these issues, we propose two approaches: a feature-based method using machine learning to improve performance on structured inputs, and a novel context-aware framework that considers the broader context and semantic relationships between the original text and generalized candidates. The context-aware approach employs Multilingual-BERT for text representation, functional transformations, and mean squared error scoring to evaluate candidates. Experiments on the WikiReplace dataset demonstrate the effectiveness of both methods, with the context-aware approach outperforming the feature-based one across different scales. This work contributes to advancing PII generalization techniques by highlighting the importance of feature selection, ensemble learning, and incorporating contextual information for better privacy protection in text anonymization.
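The abstract says the context-aware approach scores generalized candidates with mean squared error (MSE) after a functional transformation of text representations. The Python sketch below illustrates only that scoring step, under stated assumptions. It is not the authors' implementation. Plain float lists stand in for Multilingual-BERT embeddings. The transformation is a hypothetical element-wise affine map named `affine_transform`. Lower MSE is assumed to mean a better candidate. All function names and the toy vectors are illustrative.

```python
"""Sketch of MSE-based scoring of generalized PII candidates.

Assumptions (not from the paper): embeddings are plain float lists standing in
for Multilingual-BERT vectors, the "functional transformation" is a hypothetical
element-wise affine map, and a lower MSE is treated as a better candidate.
"""
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def affine_transform(scale: float, shift: float) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical stand-in for the paper's functional transformation."""
    def apply(v: Sequence[float]) -> Vector:
        return [scale * x + shift for x in v]
    return apply


def rank_candidates(
    context_vec: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    transform: Callable[[Sequence[float]], Vector],
) -> List[Tuple[str, float]]:
    """Score each candidate by MSE against the transformed context vector and
    return (candidate, score) pairs sorted from lowest to highest MSE."""
    target = transform(context_vec)
    scored = [(name, mse(target, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy vectors: the context embedding of a sentence mentioning a PII span,
    # and embeddings of generalized replacements at different levels.
    context = [0.2, 0.4, 0.6]
    candidates = {
        "a city in France": [0.25, 0.35, 0.6],
        "a European city": [0.3, 0.3, 0.5],
        "a place": [0.9, 0.1, 0.0],
    }
    ranking = rank_candidates(context, candidates, affine_transform(1.0, 0.0))
    print(ranking[0][0])  # a city in France
```

The identity transform (`scale=1.0, shift=0.0`) keeps the example easy to check by hand. In the framework the abstract describes, the transformation and the representations would come from the learned model rather than fixed constants.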