Missing Data in Signal Processing and Machine Learning: Models, Methods and Modern Approaches
- URL: http://arxiv.org/abs/2506.01696v2
- Date: Tue, 03 Jun 2025 16:12:45 GMT
- Title: Missing Data in Signal Processing and Machine Learning: Models, Methods and Modern Approaches
- Authors: Alexandre Hippert-Ferrer, Aude Sportisse, Amirhossein Javaheri, Mohammed Nabil El Korso, Daniel P. Palomar
- Abstract summary: This tutorial aims to provide signal processing (SP) and machine learning (ML) practitioners with vital tools to answer the question: How to deal with missing data?
- Score: 49.431846265898486
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This tutorial aims to provide signal processing (SP) and machine learning (ML) practitioners with vital tools, in an accessible way, to answer the question: How to deal with missing data? There are many strategies to handle incomplete signals. In this paper, we propose to group these strategies based on three common tasks: i) missing-data imputation, ii) estimation with missing values and iii) prediction with missing values. We focus on methodological and experimental results through specific case studies on real-world applications. Promising and future research directions, including a better integration of informative missingness, are also discussed. We hope that the proposed conceptual framework and the presentation of recent related missing-data problems will encourage researchers of the SP and ML communities to develop original methods and to efficiently deal with new applications involving missing data.
Related papers
- Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs [58.24692529185971]
We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. We evaluate the effectiveness and robustness of different unlearning strategies.
arXiv Detail & Related papers (2025-05-29T09:19:07Z) - Impact of Missing Values in Machine Learning: A Comprehensive Analysis [0.0]
This paper aims to examine the nuanced impact of missing values on machine learning (ML) models.
Our analysis focuses on the challenges posed by missing values, including biased inferences, reduced predictive power, and increased computational burdens.
The study employs case studies and real-world examples to illustrate the practical implications of addressing missing values.
arXiv Detail & Related papers (2024-10-10T18:31:44Z) - Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective [7.577040836988683]
Missing data can pose a challenge for machine learning (ML) modeling.
Current approaches are categorized into feature imputation and label prediction.
This study proposes a Contrastive Learning framework to model observed data with missing values.
arXiv Detail & Related papers (2023-09-18T13:16:24Z) - Learn to Unlearn: A Survey on Machine Unlearning [29.077334665555316]
This article presents a review of recent machine unlearning techniques, verification mechanisms, and potential attacks.
We highlight emerging challenges and prospective research directions.
We aim for this paper to provide valuable resources for integrating privacy, equity, and resilience into ML systems.
arXiv Detail & Related papers (2023-05-12T14:28:02Z) - A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z) - Deeply-Learned Generalized Linear Models with Missing Data [6.302686933168439]
We provide a formal treatment of missing data in the context of deeply learned generalized linear models.
We propose a new architecture, dlglm, that is able to flexibly account for both ignorable and non-ignorable patterns of missingness.
We conclude with a case study of a Bank Marketing dataset from the UCI Machine Learning Repository.
arXiv Detail & Related papers (2022-07-18T20:00:13Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z) - S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization [104.87483578308526]
We propose the model S^3-Rec, which stands for Self-Supervised learning for Sequential Recommendation.
For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence.
Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
arXiv Detail & Related papers (2020-08-18T11:44:10Z) - A Survey on Large-scale Machine Learning [67.6997613600942]
Machine learning can provide deep insights into data, allowing machines to make high-quality predictions.
Most sophisticated machine learning approaches suffer from huge time costs when operating on large-scale data.
Large-scale Machine Learning aims to learn patterns from big data with comparable performance efficiently.
arXiv Detail & Related papers (2020-08-10T06:07:52Z)