Related papers: (De)-Indexing and the Right to be Forgotten

(De)-Indexing and the Right to be Forgotten

URL: http://arxiv.org/abs/2501.03989v1
Date: Tue, 07 Jan 2025 18:46:34 GMT
Title: (De)-Indexing and the Right to be Forgotten
Authors: Salvatore Vilella, Giancarlo Ruffo,
Abstract summary: The right to be forgotten (RTBF) allows individuals to request the removal of outdated or harmful information from public access.<n>This paper aims to introduce the concepts of information retrieval (IR) and de-indexing, which are critical for understanding how search engines can effectively "forget" certain content.
Score: 0.11049608786515838
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In the digital age, the challenge of forgetfulness has emerged as a significant concern, particularly regarding the management of personal data and its accessibility online. The right to be forgotten (RTBF) allows individuals to request the removal of outdated or harmful information from public access, yet implementing this right poses substantial technical difficulties for search engines. This paper aims to introduce non-experts to the foundational concepts of information retrieval (IR) and de-indexing, which are critical for understanding how search engines can effectively "forget" certain content. We will explore various IR models, including boolean, probabilistic, vector space, and embedding-based approaches, as well as the role of Large Language Models (LLMs) in enhancing data processing capabilities. By providing this overview, we seek to highlight the complexities involved in balancing individual privacy rights with the operational challenges faced by search engines in managing information visibility.

Related papers

WebSailor: Navigating Super-human Reasoning for Web Agent [72.5231321118689]
WebSailor is a complete post-training methodology designed to instill this crucial capability.<n>Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation.<n>WebSailor significantly outperforms all opensource agents in complex information-seeking tasks.
arXiv Detail & Related papers (2025-07-03T12:59:07Z)
Bias-Aware Agent: Enhancing Fairness in AI-Driven Knowledge Retrieval [0.0]
This study introduces a novel approach to bias-aware knowledge retrieval by leveraging agentic framework and the innovative use of bias detectors. By empowering users with transparency and awareness, this approach aims to foster more equitable information systems.
arXiv Detail & Related papers (2025-03-27T07:54:39Z)
A Review of the Challenges with Massive Web-mined Corpora Used in Large Language Models Pre-Training [0.0]
This review identifies key challenges in this domain, including challenges such as noise (irrelevant or misleading information), duplication of content, the presence of low-quality or incorrect information, biases, and the inclusion of sensitive or personal information in web-mined corpora. Through an examination of current methodologies for data cleaning, pre-processing, bias detection and mitigation, we highlight the gaps in existing approaches and suggest directions for future research.
arXiv Detail & Related papers (2024-07-10T13:09:23Z)
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models [74.05368440735468]
Retrieval-Augmented Generative (RAG) models enhance Large Language Models (LLMs) In this paper, we demonstrate a security threat where adversaries can exploit the openness of these knowledge bases.
arXiv Detail & Related papers (2024-06-26T05:36:23Z)
The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements. LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
Machine Unlearning: Taxonomy, Metrics, Applications, Challenges, and Prospects [17.502158848870426]
Data users have been endowed with the right to be forgotten of their data. In the course of machine learning (ML), the forgotten right requires a model provider to delete user data. Machine unlearning emerges to address this, which has garnered ever-increasing attention from both industry and academia.
arXiv Detail & Related papers (2024-03-13T05:11:24Z)
Learning Cross-modality Information Bottleneck Representation for Heterogeneous Person Re-Identification [61.49219876388174]
Visible-Infrared person re-identification (VI-ReID) is an important and challenging task in intelligent video surveillance. Existing methods mainly focus on learning a shared feature space to reduce the modality discrepancy between visible and infrared modalities. We present a novel mutual information and modality consensus network, namely CMInfoNet, to extract modality-invariant identity features.
arXiv Detail & Related papers (2023-08-29T06:55:42Z)
A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems. ML models often remember' the old data. Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases. REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization. REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z)
Privacy in Open Search: A Review of Challenges and Solutions [0.6445605125467572]
Information retrieval (IR) is prone to privacy threats, such as attacks and unintended disclosures of documents and search history. This work aims at highlighting and discussing open challenges for privacy in the recent literature of IR, focusing on tasks featuring user-generated text data.
arXiv Detail & Related papers (2021-10-20T18:38:48Z)
Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems. We derive an evaluation metric to measure the quality of a ranking of exposing queries, as well as conducting an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z)
Beyond Privacy Trade-offs with Structured Transparency [3.5087540566347513]
We argue that many of these concerns reduce to 'the copy problem' We find that while the copy problem is not solvable, aspects of these amplifying problems have been addressed in a variety of disconnected fields. We propose a five-part framework which groups these efforts into specific capabilities and offers a foundation for their integration into an overarching vision we call "structured transparency"
arXiv Detail & Related papers (2020-12-15T15:03:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.