Robust Data Clustering with Outliers via Transformed Tensor Low-Rank Representation
- URL: http://arxiv.org/abs/2307.09055v3
- Date: Fri, 26 Apr 2024 01:29:46 GMT
- Title: Robust Data Clustering with Outliers via Transformed Tensor Low-Rank Representation
- Authors: Tong Wu
- Abstract summary: Tensor low-rank representation (TLRR) has become a popular tool for tensor data recovery and clustering.
This paper develops an outlier-robust tensor low-rank representation (OR-TLRR) method.
OR-TLRR provides outlier detection and tensor data clustering simultaneously based on the t-SVD framework.
- Score: 4.123899820318987
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, tensor low-rank representation (TLRR) has become a popular tool for tensor data recovery and clustering, due to its empirical success and theoretical guarantees. However, existing TLRR methods consider Gaussian or gross sparse noise, inevitably leading to performance degradation when the tensor data are contaminated by outliers or sample-specific corruptions. This paper develops an outlier-robust tensor low-rank representation (OR-TLRR) method that provides outlier detection and tensor data clustering simultaneously based on the t-SVD framework. For tensor observations with arbitrary outlier corruptions, OR-TLRR has provable performance guarantee for exactly recovering the row space of clean data and detecting outliers under mild conditions. Moreover, an extension of OR-TLRR is proposed to handle the case when parts of the data are missing. Finally, extensive experimental results on synthetic and real data demonstrate the effectiveness of the proposed algorithms. We release our code at https://github.com/twugithub/2024-AISTATS-ORTLRR.
Related papers
- On the Size and Approximation Error of Distilled Sets [57.61696480305911]
We take a theoretical view on kernel ridge regression based methods of dataset distillation such as Kernel Inducing Points.
We prove that a small set of instances exists in the original input space such that its solution in the RFF space coincides with the solution of the original data.
A KRR solution can be generated using this distilled set of instances which gives an approximation towards the KRR solution optimized on the full input data.
arXiv Detail & Related papers (2023-05-23T14:37:43Z)
- Scalable and Robust Tensor Ring Decomposition for Large-scale Data [12.02023514105999]
We propose a scalable and robust TR decomposition algorithm capable of handling large-scale tensor data with missing entries and gross corruptions.
We first develop a novel auto-weighted steepest descent method that can adaptively fill the missing entries and identify the outliers during the decomposition process.
arXiv Detail & Related papers (2023-05-15T22:08:47Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels [56.81761908354718]
We propose a novel theoretically guaranteed clean sample selection framework for learning with noisy labels.
Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline.
We further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data.
arXiv Detail & Related papers (2023-01-02T07:13:28Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- Fast and Provable Tensor Robust Principal Component Analysis via Scaled Gradient Descent [30.299284742925852]
This paper tackles tensor robust principal component analysis (RPCA), which aims to recover a low-rank tensor from observations contaminated by sparse corruptions.
We show that the proposed algorithm achieves better and more scalable performance than state-of-the-art matrix and tensor RPCA algorithms.
arXiv Detail & Related papers (2022-06-18T04:01:32Z)
- Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns [77.34726150561087]
We introduce four complicated missing patterns, including random missing and three fiber-like missing cases according to the mode-n fibers.
Despite the nonconvexity of the objective function in our model, we derive the optimal solutions by integrating the alternating direction method of multipliers.
arXiv Detail & Related papers (2022-05-19T08:37:56Z)
- Noisy Tensor Completion via Low-rank Tensor Ring [41.86521269183527]
Tensor completion is a fundamental tool for incomplete data analysis, where the goal is to predict missing entries from partial observations.
Existing methods often make the explicit or implicit assumption that the observed entries are noise-free to provide a theoretical guarantee of exact recovery of missing entries.
This paper proposes a novel noisy tensor completion model, which complements the incompetence of existing works in handling the degeneration of high-order and noisy observations.
arXiv Detail & Related papers (2022-03-14T14:09:43Z)
- Bayesian Robust Tensor Ring Model for Incomplete Multiway Data [7.765112574724006]
Low-rank tensor completion aims to recover missing entries from the observed data.
In this paper, we propose a Bayesian robust tensor ring (BRTR) decomposition method for the RTC problem.
Experiments indicate that BRTR has better recovery performance and ability to remove noise than other state-of-the-art methods.
arXiv Detail & Related papers (2022-02-27T09:25:24Z)
- Distributionally Robust Multi-Output Regression Ranking [3.9318191265352196]
We introduce a new listwise learning-to-rank model called Distributionally Robust Multi-output Regression Ranking (DRMRR).
DRMRR uses a Distributionally Robust Optimization framework to minimize a multi-output loss function under the most adverse distributions in the neighborhood of the empirical data distribution.
Our experiments were conducted on two real-world applications: medical document retrieval and drug response prediction.
arXiv Detail & Related papers (2021-09-27T05:19:27Z)
- OpenMatch: Open-set Consistency Regularization for Semi-supervised Learning with Outliers [71.08167292329028]
We propose a novel Open-set Semi-Supervised Learning (OSSL) approach called OpenMatch.
OpenMatch unifies FixMatch with novelty detection based on one-vs-all (OVA) classifiers.
It achieves state-of-the-art performance on three datasets, and even outperforms a fully supervised model in detecting outliers unseen in unlabeled data on CIFAR10.
arXiv Detail & Related papers (2021-05-28T23:57:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.