(De)-Indexing and the Right to be Forgotten
- URL: http://arxiv.org/abs/2501.03989v1
- Date: Tue, 07 Jan 2025 18:46:34 GMT
- Title: (De)-Indexing and the Right to be Forgotten
- Authors: Salvatore Vilella, Giancarlo Ruffo
- Abstract summary: The right to be forgotten (RTBF) allows individuals to request the removal of outdated or harmful information from public access.
This paper aims to introduce the concepts of information retrieval (IR) and de-indexing, which are critical for understanding how search engines can effectively "forget" certain content.
- Score: 0.11049608786515838
- Abstract: In the digital age, the challenge of forgetfulness has emerged as a significant concern, particularly regarding the management of personal data and its accessibility online. The right to be forgotten (RTBF) allows individuals to request the removal of outdated or harmful information from public access, yet implementing this right poses substantial technical difficulties for search engines. This paper aims to introduce non-experts to the foundational concepts of information retrieval (IR) and de-indexing, which are critical for understanding how search engines can effectively "forget" certain content. We will explore various IR models, including boolean, probabilistic, vector space, and embedding-based approaches, as well as the role of Large Language Models (LLMs) in enhancing data processing capabilities. By providing this overview, we seek to highlight the complexities involved in balancing individual privacy rights with the operational challenges faced by search engines in managing information visibility.
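As a rough illustration of the Boolean retrieval model and of de-indexing mentioned in the abstract, the following is a minimal sketch and not the paper's method. The class `InvertedIndex`, its methods, and the sample documents are hypothetical names introduced here. It assumes whitespace tokenization and treats de-indexing as removing a document id from every posting list, which leaves the underlying document untouched.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy Boolean inverted index (hypothetical): term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Assumption: lowercase whitespace tokenization, no stemming or stop words.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        """Boolean AND: return ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def deindex(self, doc_id):
        """Drop doc_id from every posting list; the source document is not deleted."""
        for term in list(self.postings):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]


index = InvertedIndex()
index.add("doc1", "jane doe bankruptcy notice")
index.add("doc2", "jane doe wins chess tournament")

before = index.search("jane doe")  # both documents match
index.deindex("doc1")              # the index "forgets" doc1
after = index.search("jane doe")   # only doc2 remains retrievable
```

In this sketch the de-indexed document can no longer be retrieved by any query, even though nothing has been erased at the source, which is the distinction between visibility and deletion that the abstract points to.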
Related papers
- Secure Visual Data Processing via Federated Learning [2.4374097382908477]
This paper addresses the need for privacy-preserving solutions in large-scale visual data processing.
We propose a new approach that combines object detection, federated learning and anonymization.
Our solution is evaluated against traditional centralized models, showing that while there is a slight trade-off in accuracy, the privacy benefits are substantial.
arXiv Detail & Related papers (2025-02-09T09:44:18Z)
- A Review of the Challenges with Massive Web-mined Corpora Used in Large Language Models Pre-Training [0.0]
This review identifies key challenges in this domain, such as noise (irrelevant or misleading information), duplication of content, the presence of low-quality or incorrect information, biases, and the inclusion of sensitive or personal information in web-mined corpora.
Through an examination of current methodologies for data cleaning, pre-processing, bias detection and mitigation, we highlight the gaps in existing approaches and suggest directions for future research.
arXiv Detail & Related papers (2024-07-10T13:09:23Z) - "Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs)
In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Machine Unlearning: Taxonomy, Metrics, Applications, Challenges, and
Prospects [17.502158848870426]
Data users have been granted the right to have their data forgotten.
In machine learning (ML), the right to be forgotten requires a model provider to delete user data.
Machine unlearning has emerged to address this and has garnered ever-increasing attention from both industry and academia.
arXiv Detail & Related papers (2024-03-13T05:11:24Z) - Learning Cross-modality Information Bottleneck Representation for
Heterogeneous Person Re-Identification [61.49219876388174]
Visible-Infrared person re-identification (VI-ReID) is an important and challenging task in intelligent video surveillance.
Existing methods mainly focus on learning a shared feature space to reduce the modality discrepancy between visible and infrared modalities.
We present a novel mutual information and modality consensus network, namely CMInfoNet, to extract modality-invariant identity features.
arXiv Detail & Related papers (2023-08-29T06:55:42Z) - A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often 'remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z) - Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
The REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z) - Privacy in Open Search: A Review of Challenges and Solutions [0.6445605125467572]
Information retrieval (IR) is prone to privacy threats, such as attacks and unintended disclosures of documents and search history.
This work aims at highlighting and discussing open challenges for privacy in the recent literature of IR, focusing on tasks featuring user-generated text data.
arXiv Detail & Related papers (2021-10-20T18:38:48Z) - Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z)