Are LLM-based methods good enough for detecting unfair terms of service?
- URL: http://arxiv.org/abs/2409.00077v2
- Date: Fri, 6 Sep 2024 16:12:00 GMT
- Title: Are LLM-based methods good enough for detecting unfair terms of service?
- Authors: Mirgita Frasheri, Arian Bakhtiarnia, Lukas Esterle, Alexandros Iosifidis
- Abstract summary: Large language models (LLMs) are good at parsing long text-based documents.
We build a dataset consisting of 12 questions applied individually to a set of privacy policies.
Some open-source models achieve higher accuracy than some commercial models.
- Score: 67.49487557224415
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Countless terms of service (ToS) are signed every day by users all over the world while interacting with all kinds of apps and websites. More often than not, these online contracts spanning double-digit pages are signed blindly by users who simply want immediate access to the desired service. What would normally require a consultation with a legal team has now become a mundane activity consisting of a few clicks, in which users potentially sign away their rights, for instance over their data privacy, to countless online entities and companies. Large language models (LLMs) are good at parsing long text-based documents and could potentially be adopted to help users when dealing with dubious clauses in ToS and their underlying privacy policies. To investigate the utility of existing models for this task, we first build a dataset consisting of 12 questions applied individually to a set of privacy policies crawled from popular websites. Thereafter, a series of open-source as well as commercial chatbots, such as ChatGPT, are queried over each question, with the answers being compared to a given ground truth. Our results show that some open-source models achieve higher accuracy than some commercial models. However, the best performance is recorded from a commercial chatbot (ChatGPT4). Overall, all models perform only slightly better than random at this task. Consequently, their performance needs to be significantly improved before they can be adopted at large for this purpose.
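The evaluation protocol above is simple to reproduce in outline: pose each of the 12 questions against every policy and score the model's answer against a human label. Below is a minimal sketch in Python, assuming a hypothetical `ask_llm` helper wrapping whatever chatbot API is under test and a yes/no answer format; the actual questions, policies, and ground-truth labels come from the authors' dataset and are only stubbed here.

```python
# Minimal sketch of the paper's evaluation protocol: each of the 12
# questions is posed against every privacy policy, and the model's
# answer is scored against a human-annotated ground truth.
# `ask_llm` is a hypothetical stand-in for any chatbot API call, and
# the yes/no framing is an assumption for illustration.

QUESTIONS = [
    "Does the policy state that user data may be shared with third parties?",
    # ... the remaining 11 questions from the dataset would go here
]

def ask_llm(question: str, policy_text: str) -> str:
    """Hypothetical stub: send the policy plus one question to the model under test."""
    raise NotImplementedError("wire up the chatbot API being evaluated")

def evaluate(policies: list[dict]) -> float:
    """policies: [{"text": ..., "labels": {question: "yes" or "no"}}, ...]"""
    correct = total = 0
    for policy in policies:
        for question in QUESTIONS:
            answer = ask_llm(question, policy["text"]).strip().lower()
            prediction = "yes" if answer.startswith("yes") else "no"
            correct += prediction == policy["labels"][question]
            total += 1
    return correct / total  # accuracy; the paper compares this to a random baseline
```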
Related papers
- PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles [21.340456482528136]
We propose Privacy-Conscious Delegation, a novel task for chaining API-based and local models.
We utilize recent public collections of user-LLM interactions to construct a natural benchmark called PUPA.
Our best pipeline maintains high response quality for 85.5% of user queries while restricting privacy leakage to only 7.5%.
arXiv Detail & Related papers (2024-10-22T16:00:26Z)
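The PAPILLON summary above only states the high-level idea of chaining a trusted local model with an untrusted API model. The sketch below shows one plausible shape of such a privacy-conscious delegation chain; `local_model` and `remote_api` are hypothetical callables, and the real PAPILLON pipeline may differ in its prompting and routing.

```python
# A minimal sketch of privacy-conscious delegation: a trusted local
# model rewrites the user's request to strip private details before it
# is sent to an untrusted API model, then composes the final answer
# locally. Both callables are hypothetical stand-ins.

def delegate(user_request: str, local_model, remote_api) -> str:
    # Step 1 (local, trusted): produce a redacted version of the request.
    redacted = local_model(
        "Rewrite this request with all personal details removed:\n"
        + user_request
    )
    # Step 2 (remote, untrusted): get a high-quality draft from the API
    # model, which never sees the private details.
    draft = remote_api(redacted)
    # Step 3 (local, trusted): adapt the draft back to the original,
    # private request before returning it to the user.
    return local_model(
        f"Using this draft:\n{draft}\n"
        f"answer the original request:\n{user_request}"
    )
```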
- Entailment-Driven Privacy Policy Classification with LLMs [3.564208334473993]
We propose a framework to classify paragraphs of privacy policies into meaningful labels that are easily understood by users.
Our framework improves the F1 score on average by 11.2%.
arXiv Detail & Related papers (2024-09-25T05:07:05Z)
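Entailment-driven paragraph labeling of the kind described above can be prototyped with an off-the-shelf NLI model. A minimal sketch, using Hugging Face's zero-shot classification pipeline with an illustrative label set rather than the paper's exact framework or taxonomy:

```python
# Sketch of entailment-driven paragraph labeling via zero-shot
# classification with a public NLI checkpoint. The label set is
# illustrative, not the paper's taxonomy.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

LABELS = ["data collection", "third-party sharing",
          "data retention", "user rights"]

paragraph = ("We may share your information with our advertising "
             "partners to personalize the content you see.")

result = classifier(paragraph, candidate_labels=LABELS)
print(result["labels"][0], result["scores"][0])  # top label and its score
```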
- WildChat: 1M ChatGPT Interaction Logs in the Wild [88.05964311416717]
WildChat is a corpus of 1 million user-ChatGPT conversations, which consists of over 2.5 million interaction turns.
In addition to timestamped chat transcripts, we enrich the dataset with demographic data, including state, country, and hashed IP addresses.
arXiv Detail & Related papers (2024-05-02T17:00:02Z)
- Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference [48.99117537559644]
We introduce Arena, an open platform for evaluating Large Language Models (LLMs) based on human preferences.
Our methodology employs a pairwise comparison approach and leverages input from a diverse user base through crowdsourcing.
This paper describes the platform, analyzes the data we have collected so far, and explains the tried-and-true statistical methods we are using.
arXiv Detail & Related papers (2024-03-07T01:22:38Z)
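Arena-style leaderboards are typically computed from pairwise battle outcomes with Elo- or Bradley-Terry-style ratings. A minimal online Elo update, shown here with made-up model names and outcomes, illustrates the idea; the paper itself discusses more statistically grounded estimators:

```python
# Minimal online Elo update over pairwise battle outcomes, one common
# rating scheme for arena-style LLM leaderboards.
from collections import defaultdict

K = 4  # small K-factor smooths ratings over many battles

def update(ratings, model_a, model_b, winner):
    """winner is 'a', 'b', or 'tie'."""
    ra, rb = ratings[model_a], ratings[model_b]
    expected_a = 1 / (1 + 10 ** ((rb - ra) / 400))
    score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
    ratings[model_a] = ra + K * (score_a - expected_a)
    ratings[model_b] = rb + K * ((1 - score_a) - (1 - expected_a))

# Toy usage with made-up battle records.
ratings = defaultdict(lambda: 1000.0)
battles = [("model-x", "model-y", "a"), ("model-x", "model-z", "tie")]
for a, b, w in battles:
    update(ratings, a, b, w)
print(dict(ratings))
```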
- Differentially Private Synthetic Data via Foundation Model APIs 2: Text [56.13240830670327]
Much of the high-quality text data generated in the real world is private and cannot be shared or used freely due to privacy concerns.
We propose an augmented PE algorithm, named Aug-PE, that applies to the complex setting of text.
Our results demonstrate that Aug-PE produces DP synthetic text with utility competitive with state-of-the-art DP fine-tuning baselines.
arXiv Detail & Related papers (2024-03-04T05:57:50Z)
- PolicyGPT: Automated Analysis of Privacy Policies with Large Language Models [41.969546784168905]
In practice, users tend to click the Agree button directly rather than reading privacy policies carefully.
This practice exposes users to risks of privacy leakage and legal issues.
Recently, the advent of Large Language Models (LLM) such as ChatGPT and GPT-4 has opened new possibilities for text analysis.
arXiv Detail & Related papers (2023-09-19T01:22:42Z)
- Protecting User Privacy in Online Settings via Supervised Learning [69.38374877559423]
We design an intelligent approach to online privacy protection that leverages supervised learning.
By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user.
arXiv Detail & Related papers (2023-04-06T05:20:16Z)
- FedBot: Enhancing Privacy in Chatbots with Federated Learning [0.0]
Federated Learning (FL) aims to protect data privacy through distributed learning methods that keep the data in its location.
The proof of concept (POC) combines Deep Bidirectional Transformer models and federated learning algorithms to protect customer data privacy during collaborative model training.
The system is specifically designed to improve its performance and accuracy over time by leveraging its ability to learn from previous interactions.
arXiv Detail & Related papers (2023-04-04T23:13:52Z)
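Federated averaging (FedAvg) is the canonical aggregation step behind federated learning setups like the one summarized above. A minimal sketch of the server-side step, under the simplifying assumption that clients return plain NumPy weight arrays:

```python
# Sketch of federated averaging (FedAvg): each client trains locally on
# its own data, and only model weights, never raw conversations, are
# sent to the server for size-weighted averaging.
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client parameter lists.

    client_weights: list of [np.ndarray, ...], one list per client.
    client_sizes:   number of local training examples per client.
    """
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    return [
        sum(w[layer] * (n / total)
            for w, n in zip(client_weights, client_sizes))
        for layer in range(num_layers)
    ]

# Toy usage: two clients, one weight matrix each.
clients = [[np.ones((2, 2))], [np.zeros((2, 2))]]
print(fedavg(clients, client_sizes=[3, 1])[0])  # closer to client 0's weights
```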
- Compliance Checking with NLI: Privacy Policies vs. Regulations [0.0]
We use Natural Language Inference techniques to compare privacy regulations against sections of privacy policies from a selection of large companies.
Our model uses pre-trained embeddings along with a BiLSTM in its attention mechanism.
arXiv Detail & Related papers (2022-03-01T17:27:16Z)
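A compliance check of this shape can be prototyped with any pre-trained NLI model: treat the policy section as the premise and a regulation requirement as the hypothesis, and read off the entailment probability. A minimal sketch using a public MNLI checkpoint rather than the paper's BiLSTM architecture:

```python
# Sketch of an NLI-style compliance check: the policy section is the
# premise, a regulation requirement is the hypothesis, and a high
# entailment probability suggests the policy covers the requirement.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

premise = ("Users may request deletion of their personal data at any "
           "time by contacting our support team.")
hypothesis = "The policy grants users the right to erasure of their data."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]
# roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment
print(f"entailment probability: {probs[2].item():.2f}")
```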
- Mining Implicit Relevance Feedback from User Behavior for Web Question Answering [92.45607094299181]
We make the first study to explore the correlation between user behavior and passage relevance.
Our approach significantly improves the accuracy of passage ranking without extra human labeled data.
In practice, this work has proved effective in substantially reducing the human labeling cost for the QA service in a global commercial search engine.
arXiv Detail & Related papers (2020-06-13T07:02:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.