TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models
- URL: http://arxiv.org/abs/2411.07224v1
- Date: Mon, 11 Nov 2024 18:44:17 GMT
- Title: TempCharBERT: Keystroke Dynamics for Continuous Access Control Based on Pre-trained Language Models
- Authors: Matheus Simão, Fabiano Prado, Omar Abdul Wahab, Anderson Avila,
- Abstract summary: We propose the use of pre-trained language models (PLMs) to recognize keystroke dynamics.
To overcome this limitation, we propose TempCharBERT, an architecture that incorporates temporal-character information in the embedding layer of CharBERT.
- Score: 0.33748750222488655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the widespread of digital environments, reliable authentication and continuous access control has become crucial. It can minimize cyber attacks and prevent frauds, specially those associated with identity theft. A particular interest lies on keystroke dynamics (KD), which refers to the task of recognizing individuals' identity based on their unique typing style. In this work, we propose the use of pre-trained language models (PLMs) to recognize such patterns. Although PLMs have shown high performance on multiple NLP benchmarks, the use of these models on specific tasks requires customization. BERT and RoBERTa, for instance, rely on subword tokenization, and they cannot be directly applied to KD, which requires temporal-character information to recognize users. Recent character-aware PLMs are able to process both subwords and character-level information and can be an alternative solution. Notwithstanding, they are still not suitable to be directly fine-tuned for KD as they are not optimized to account for user's temporal typing information (e.g., hold time and flight time). To overcome this limitation, we propose TempCharBERT, an architecture that incorporates temporal-character information in the embedding layer of CharBERT. This allows modeling keystroke dynamics for the purpose of user identification and authentication. Our results show a significant improvement with this customization. We also showed the feasibility of training TempCharBERT on a federated learning settings in order to foster data privacy.
Related papers
- Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning [50.45435841411193]
Code Language Models (CLMs) exhibit unintended memorization of sensitive training data, enabling verbatim reproduction of confidential information when specifically prompted.<n>We introduce CodeEraser, an advanced variant that selectively unlearns sensitive memorized segments in code while preserving the structural integrity and functional correctness of the surrounding code.
arXiv Detail & Related papers (2025-09-17T07:12:35Z) - Evaluating Language Model Reasoning about Confidential Information [95.64687778185703]
We study whether language models exhibit contextual robustness, or the capability to adhere to context-dependent safety specifications.<n>We develop a benchmark (PasswordEval) that measures whether language models can correctly determine when a user request is authorized.<n>We find that current open- and closed-source models struggle with this seemingly simple task, and that, perhaps surprisingly, reasoning capabilities do not generally improve performance.
arXiv Detail & Related papers (2025-08-27T15:39:46Z) - Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences [80.63946798650653]
We explore how users can stay in control of their data by using privacy profiles.<n>We build a framework where a local model uses these instructions to rewrite queries.<n>To support this research, we introduce a multilingual dataset of real user queries to mark private content.
arXiv Detail & Related papers (2025-07-07T18:22:55Z) - POET: Prompt Offset Tuning for Continual Human Action Adaptation [61.63831623094721]
We aim to provide users and developers with the capability to personalize their experience by adding new action classes to their device models continually.
We formalize this as privacy-aware few-shot continual action recognition.
We propose a novel-temporal learnable prompt tuning approach, and are the first to apply such prompt tuning to Graph Neural Networks.
arXiv Detail & Related papers (2025-04-25T04:11:24Z) - Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words [23.466410814073825]
This paper introduces a novel mechanism called identity lock, which restricts the model's core functionality until it is activated by specific identity-based wake words.
We conduct extensive experiments to validate the effectiveness of IdentityLock across a diverse range of datasets spanning various domains.
arXiv Detail & Related papers (2025-03-10T08:59:07Z) - Personalized Language Model Learning on Text Data Without User Identifiers [79.36212347601223]
We propose to let each mobile device maintain a user-specific distribution to dynamically generate user embeddings.
To prevent the cloud from tracking users via uploaded embeddings, the local distributions of different users should either be derived from a linearly dependent space.
Evaluation on both public and industrial datasets reveals a remarkable improvement in accuracy from incorporating anonymous user embeddings.
arXiv Detail & Related papers (2025-01-10T15:46:19Z) - Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering [108.2131720470005]
Large language models (LLMs) have demonstrated remarkable performance across various real-world tasks.
They often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated.
We propose AutoPASTA, a method that automatically identifies key contextual information and explicitly highlights it by steering an LLM's attention scores.
arXiv Detail & Related papers (2024-09-16T23:52:41Z) - Prompt-Time Ontology-Driven Symbolic Knowledge Capture with Large Language Models [0.0]
This paper explores capturing personal information from user prompts using knowledge-graph approaches.
We use a subset of the KNOW ontology, which models personal information, to train the language model on these concepts.
We then evaluate the success of knowledge capture using a specially constructed dataset.
arXiv Detail & Related papers (2024-05-22T21:40:34Z) - SentinelLMs: Encrypted Input Adaptation and Fine-tuning of Language
Models for Private and Secure Inference [6.0189674528771]
This paper addresses the privacy and security concerns associated with deep neural language models.
Deep neural language models serve as crucial components in various modern AI-based applications.
We propose a novel method to adapt and fine-tune transformer-based language models on passkey-encrypted user-specific text.
arXiv Detail & Related papers (2023-12-28T19:55:11Z) - Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs [80.48606583629123]
PASTA is a method that allows large language models to read text with user-specified emphasis marks.
It can substantially enhance an LLM's ability to follow user instructions or integrate new knowledge from user inputs.
arXiv Detail & Related papers (2023-11-03T22:56:43Z) - Improving Input-label Mapping with Demonstration Replay for In-context
Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method called Sliding Causal Attention (RdSca)
We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z) - FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction plays as a core function module in personalized online services.
Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality.
Pretrained Language Models(PLMs) has given rise to another paradigm, which takes as inputs the sentences of textual modality.
We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models(FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z) - Context-Aware Differential Privacy for Language Modeling [41.54238543400462]
This paper introduces Context-Aware Differentially Private Language Model (CADP-LM)
CADP-LM relies on the notion of emphcontext to define and audit the potentially sensitive information.
A unique characteristic of CADP-LM is its ability to target the protection of sensitive sentences and contexts only.
arXiv Detail & Related papers (2023-01-28T20:06:16Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language
Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z) - UserBERT: Modeling Long- and Short-Term User Preferences via
Self-Supervision [6.8904125699168075]
This paper extends the BERT model to e-commerce user data for pre-training representations in a self-supervised manner.
By viewing user actions in sequences as analogous to words in sentences, we extend the existing BERT model to user behavior data.
We propose methods for the tokenization of different types of user behavior sequences, the generation of input representation, and a novel pretext task to enable the pre-trained model to learn from its own input.
arXiv Detail & Related papers (2022-02-14T08:31:36Z) - Attribute Inference Attack of Speech Emotion Recognition in Federated
Learning Settings [56.93025161787725]
Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing local data.
We propose an attribute inference attack framework that infers sensitive attribute information of the clients from shared gradients or model parameters.
We show that the attribute inference attack is achievable for SER systems trained using FL.
arXiv Detail & Related papers (2021-12-26T16:50:42Z) - Time-Aware Language Models as Temporal Knowledge Bases [39.00042720454899]
Language models (LMs) are trained on snapshots of data collected at a specific moment in time.
We introduce a diagnostic dataset aimed at probing LMs for factual knowledge that changes over time.
We propose a simple technique for jointly modeling text with its timestamp.
arXiv Detail & Related papers (2021-06-29T06:18:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.