What Data is Really Necessary? A Feasibility Study of Inference Data Minimization for Recommender Systems
- URL: http://arxiv.org/abs/2508.21547v1
- Date: Fri, 29 Aug 2025 12:01:17 GMT
- Title: What Data is Really Necessary? A Feasibility Study of Inference Data Minimization for Recommender Systems
- Authors: Jens Leysen, Marco Favier, Bart Goethals
- Abstract summary: This paper conducts a feasibility study on minimizing implicit feedback inference data for recommender systems. We demonstrate that substantial inference data reduction is technically feasible without significant performance loss. While we establish its technical feasibility, we conclude that data minimization remains practically challenging.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data minimization is a legal principle requiring personal data processing to be limited to what is necessary for a specified purpose. Operationalizing this principle for recommender systems, which rely on extensive personal data, remains a significant challenge. This paper conducts a feasibility study on minimizing implicit feedback inference data for such systems. We propose a novel problem formulation, analyze various minimization techniques, and investigate key factors influencing their effectiveness. We demonstrate that substantial inference data reduction is technically feasible without significant performance loss. However, its practicality is critically determined by two factors: the technical setting (e.g., performance targets, choice of model) and user characteristics (e.g., history size, preference complexity). Thus, while we establish its technical feasibility, we conclude that data minimization remains practically challenging, and its dependence on the technical and user context makes a universal standard for data 'necessity' difficult to implement.
Related papers
- SoK: Data Minimization in Machine Learning [49.60064304454055]
Data minimization (DM) describes the principle of collecting only the data strictly necessary for a given task. The relevance of data minimization is particularly pronounced in machine learning (ML) applications. Existing work on other ML privacy and security topics often addresses concerns relevant to DMML without explicitly acknowledging the connection. This work introduces a comprehensive framework for DMML, including a unified data pipeline, adversaries, and points of minimization.
arXiv Detail & Related papers (2025-08-14T17:00:13Z)
- Algorithmic Data Minimization for Machine Learning over Internet-of-Things Data Streams [10.61303879393919]
Machine learning can analyze vast amounts of data generated by IoT devices to identify patterns, make predictions, and enable real-time decision-making. IoT systems are often deployed in sensitive environments such as households and offices, where they may inadvertently expose identifiable information. This paper provides a technical interpretation of data minimization in the context of sensor streams, explores practical methods for implementation, and addresses the challenges involved.
arXiv Detail & Related papers (2025-03-07T18:35:11Z)
- The Data Minimization Principle in Machine Learning [61.17813282782266]
Data minimization aims to reduce the amount of data collected, processed or retained.
It has been endorsed by various global data protection regulations.
However, its practical implementation remains a challenge due to the lack of a rigorous formulation.
arXiv Detail & Related papers (2024-05-29T19:40:27Z)
- How Much More Data Do I Need? Estimating Requirements for Downstream Tasks [99.44608160188905]
Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance?
Overestimating or underestimating data requirements incurs substantial costs that could be avoided with an adequate budget.
Using our guidelines, practitioners can accurately estimate data requirements of machine learning systems to gain savings in both development time and data acquisition costs.
arXiv Detail & Related papers (2022-07-04T21:16:05Z)
- Learning to Limit Data Collection via Scaling Laws: Data Minimization Compliance in Practice [62.44110411199835]
We build on literature in machine learning and law to propose a framework for limiting collection based on data interpretation that ties data to system performance.
We formalize a data minimization criterion based on performance curve derivatives and provide an effective and interpretable piecewise power law technique.
arXiv Detail & Related papers (2021-07-16T19:59:01Z)
- Counterfactual Explanations as Interventions in Latent Space [62.997667081978825]
Counterfactual explanations aim to provide end users with a set of features that need to be changed in order to achieve a desired outcome.
Current approaches rarely take into account the feasibility of actions needed to achieve the proposed explanations.
We present Counterfactual Explanations as Interventions in Latent Space (CEILS), a methodology to generate counterfactual explanations.
arXiv Detail & Related papers (2021-06-14T20:48:48Z)
- Reviving Purpose Limitation and Data Minimisation in Personalisation, Profiling and Decision-Making Systems [0.0]
This paper determines, through an interdisciplinary law and computer science lens, whether data minimisation and purpose limitation can be meaningfully implemented in data-driven systems.
Our analysis reveals that the two legal principles continue to play an important role in mitigating the risks of personal data processing.
We highlight that even though these principles are important safeguards in the systems under consideration, there are important limits to their practical implementation.
arXiv Detail & Related papers (2021-01-15T16:36:29Z)
- Operationalizing the Legal Principle of Data Minimization for Personalization [64.0027026050706]
We identify a lack of a homogeneous interpretation of the data minimization principle and explore two operational definitions applicable in the context of personalization.
We find that the performance decrease incurred by data minimization might not be substantial, but it might disparately impact different users.
arXiv Detail & Related papers (2020-05-28T00:43:06Z)