Tighter Bound Estimation of Sensitivity Analysis for Incremental and
Decremental Data Modification
- URL: http://arxiv.org/abs/2003.03351v4
- Date: Wed, 13 Jan 2021 23:26:18 GMT
- Title: Tighter Bound Estimation of Sensitivity Analysis for Incremental and
Decremental Data Modification
- Authors: Kaichen Zhou, Shiji Song, Gao Huang, Wu Cheng, Quan Zhou
- Abstract summary: In large-scale classification problems, the data set frequently faces updates in which part of the data is added to or removed from the original data set.
We propose an algorithm to make rational inferences about the updated linear classifier without exactly updating the classifier.
Both theoretical analysis and experiment results show that the proposed approach is superior to existing methods in terms of tightness of coefficients' bounds and computational complexity.
- Score: 39.62854914952284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In large-scale classification problems, the data set frequently
faces updates in which part of the data is added to or removed from the
original data set. In this case, conventional incremental learning, which
updates an existing classifier by explicitly modeling the data modification,
is more efficient than retraining a new classifier from scratch. Sometimes,
however, we are more interested in determining whether we should update the
classifier at all, or in performing other sensitivity analysis tasks. To deal
with such tasks, we propose an algorithm to make rational inferences about the
updated linear classifier without exactly updating the classifier.
Specifically, the proposed algorithm can be used to estimate the upper and
lower bounds of the updated classifier's coefficient matrix with a low
computational complexity related to the size of the updated dataset. Both
theoretical analysis and experiment results show that the proposed approach is
superior to existing methods in terms of tightness of coefficients' bounds and
computational complexity.
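As a rough illustration of the kind of inference the abstract describes, the sketch below bounds how far the updated classifier's coefficients can move without retraining. It is not the paper's algorithm; it only uses the standard fact that, for a lambda-strongly convex regularized objective, the distance from any point to the minimizer is at most the gradient norm at that point divided by lambda. The logistic-loss setup, function names, and step size are illustrative assumptions.

```python
import numpy as np

def grad_l2_logistic(w, X, y, lam):
    """Gradient of (1/n) * sum_i log(1 + exp(-y_i * x_i^T w)) + (lam/2) * ||w||^2."""
    n = X.shape[0]
    margins = y * (X @ w)
    coeff = -y / (1.0 + np.exp(margins))      # derivative of the log-loss w.r.t. the margin
    return (X.T @ coeff) / n + lam * w

def coefficient_change_bound(w_old, X_new, y_new, lam):
    """Upper bound on ||w_new - w_old||_2 after the data set is modified to (X_new, y_new).

    The updated objective is lam-strongly convex, so
        ||w_new - w_old|| <= ||grad_new(w_old)|| / lam,
    which can be evaluated without retraining the classifier."""
    g = grad_l2_logistic(w_old, X_new, y_new, lam)
    return np.linalg.norm(g) / lam

# Toy usage: fit on the original data, then bound the effect of removing 20 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(rng.normal(size=200))
lam = 0.1
w_old = np.zeros(5)
for _ in range(300):                          # crude gradient descent stands in for training
    w_old -= 0.5 * grad_l2_logistic(w_old, X, y, lam)
keep = np.arange(180)                         # decremental modification: drop the last 20 rows
print(coefficient_change_bound(w_old, X[keep], y[keep], lam))
```

Evaluating such a bound costs one gradient pass over the updated data, which is far cheaper than re-solving the optimization.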
Related papers
- Online Nonparametric Supervised Learning for Massive Data [0.0]
We develop a fast algorithm adapted to the real-time calculation of the nonparametric classifier in massive as well as streaming data frameworks.
The proposed methods are evaluated and compared to some commonly used machine learning algorithms for real-time fetal well-being monitoring.
arXiv Detail & Related papers (2024-05-29T20:04:23Z) - An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the model's covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z) - Nonlinear Feature Aggregation: Two Algorithms driven by Theory [45.3190496371625]
Real-world machine learning applications are characterized by a huge number of features, leading to computational and memory issues.
We propose a dimensionality reduction algorithm (NonLinCFA) which aggregates non-linear transformations of features with a generic aggregation function.
We also test the algorithms on synthetic and real-world datasets, performing regression and classification tasks, and show competitive performance.
arXiv Detail & Related papers (2023-06-19T19:57:33Z) - Dataset Complexity Assessment Based on Cumulative Maximum Scaled Area
Under Laplacian Spectrum [38.65823547986758]
It is useful to predict classification performance by effectively assessing the complexity of a dataset before training DCNN models.
This paper proposes a novel method called cumulative maximum scaled Area Under Laplacian Spectrum (cmsAULS).
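As a hedged sketch of the spectral ingredient such complexity measures build on (this is not the cmsAULS formula itself), one can form a similarity graph over the data, compute the normalized Laplacian spectrum, and summarize it by a scaled area. The k-NN/Gaussian-kernel choices and names below are assumptions.

```python
import numpy as np

def laplacian_spectrum_area(X, k=10):
    """Simplified data-set complexity ingredient: scaled area under the
    normalized-Laplacian spectrum of a k-NN similarity graph (a stand-in
    for cmsAULS, not its exact definition)."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # pairwise squared distances
    W = np.exp(-d2 / (np.median(d2) + 1e-12))                    # Gaussian similarities
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]                     # k nearest neighbours per point
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), idx.ravel()] = True
    W = np.where(mask | mask.T, W, 0.0)                          # symmetric k-NN graph
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    L = np.eye(n) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)                              # eigenvalues lie in [0, 2]
    return float(np.mean(eigvals) / 2.0)                         # scaled area under the spectrum

# toy usage: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
print(laplacian_spectrum_area(X))
```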
arXiv Detail & Related papers (2022-09-29T13:02:04Z) - Autoencoder Based Iterative Modeling and Multivariate Time-Series
Subsequence Clustering Algorithm [0.0]
This paper introduces an algorithm for the detection of change points and the identification of the corresponding subsequences in transient multivariate time-series data (MTSD).
We use a recurrent neural network (RNN) based autoencoder (AE), which is iteratively trained on incoming data.
A model of the identified subsequence is saved and used for recognition of repeating subsequences as well as fast offline clustering.
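A minimal sketch of the detection step, assuming the autoencoder part is already available: whatever model tracks the current regime produces a per-window reconstruction error, and a change point is declared when that error jumps well above its typical level. The thresholding rule and parameter names below are assumptions, not the paper's procedure.

```python
import numpy as np

def detect_change_points(recon_errors, threshold=None, min_gap=10):
    """Flag time steps where the per-window reconstruction error of the current
    model jumps, indicating that a new subsequence (regime) has started.

    If no threshold is given, a robust median + 5*MAD cut-off is used; `min_gap`
    is a refractory period so one transition is not reported many times."""
    e = np.asarray(recon_errors, dtype=float)
    if threshold is None:
        med = np.median(e)
        mad = np.median(np.abs(e - med)) + 1e-12
        threshold = med + 5.0 * mad
    change_points, last = [], -min_gap
    for t, err in enumerate(e):
        if err > threshold and t - last >= min_gap:
            change_points.append(t)
            last = t
    return change_points

# toy usage: low errors in regime A, a spike when regime B begins at t = 100
rng = np.random.default_rng(0)
errs = np.r_[rng.random(100) * 0.1, 2.0, rng.random(100) * 0.1 + 0.05]
print(detect_change_points(errs))   # -> [100]
```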
arXiv Detail & Related papers (2022-09-09T09:59:56Z) - COSTI: a New Classifier for Sequences of Temporal Intervals [0.0]
We develop a novel method for classification operating directly on sequences of temporal intervals.
The proposed method maintains a high level of accuracy and achieves better performance while avoiding the shortcomings that come with operating on transformed data.
arXiv Detail & Related papers (2022-04-28T12:55:06Z) - Estimating leverage scores via rank revealing methods and randomization [50.591267188664666]
We study algorithms for estimating the statistical leverage scores of rectangular dense or sparse matrices of arbitrary rank.
Our approach is based on combining rank revealing methods with compositions of dense and sparse randomized dimensionality reduction transforms.
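For reference, the statistical leverage score of row i of a tall matrix A is the squared norm of the i-th row of any orthonormal basis U of A's column space. The sketch below computes the scores exactly via an SVD and, as a hedged example of the randomized flavor mentioned above, approximates them with a Gaussian sketch; the approximate recipe is one standard construction, not necessarily the estimator analyzed in the paper.

```python
import numpy as np

def leverage_scores(A):
    """Exact leverage scores of A (n x d, n >= d): l_i = ||U[i, :]||^2,
    where U is an orthonormal basis of the column space of A."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def approx_leverage_scores(A, sketch_size=None, seed=0):
    """Randomized approximation: sketch A down to (m x d) with a Gaussian map,
    take the R factor of the sketch, and read scores off the rows of A @ R^{-1}."""
    n, d = A.shape
    m = sketch_size or min(n, 4 * d)
    S = np.random.default_rng(seed).normal(size=(m, n)) / np.sqrt(m)
    _, R = np.linalg.qr(S @ A)                 # small m x d QR instead of a full SVD
    B = np.linalg.solve(R.T, A.T).T            # B = A @ R^{-1}, rows approximate rows of U
    return np.sum(B ** 2, axis=1)

A = np.random.default_rng(1).normal(size=(500, 20))
print(leverage_scores(A).sum())                # sums to rank(A) = 20
print(np.max(np.abs(leverage_scores(A) - approx_leverage_scores(A))))
```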
arXiv Detail & Related papers (2021-05-23T19:21:55Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
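The combination described above typically takes a reconstruction-plus-row-sparsity form along the lines of the following (a generic formulation offered for orientation, not necessarily the paper's exact model), where $X \in \mathbb{R}^{n \times d}$ is the data matrix and the $l_{2,p}$ penalty drives whole rows of the projection $W \in \mathbb{R}^{d \times k}$ to zero so that the corresponding features are discarded:

$$\min_{W} \; \lVert X - X W W^{\top} \rVert_F^2 + \lambda \lVert W \rVert_{2,p}^{p}, \qquad \lVert W \rVert_{2,p} = \Big( \sum_{i=1}^{d} \lVert w^{i} \rVert_2^{\,p} \Big)^{1/p}, \; 0 < p \le 1,$$

where $w^{i}$ denotes the $i$-th row of $W$.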
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
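A compact sketch of the division idea, under the assumption that regions are represented by centroids and each region holds a small exact GP (class and parameter names are illustrative, and the real method's splitting rule is more elaborate): every new sample is routed to its nearest region and only that local model is refit, so the update cost is governed by the local size rather than the total number of training points.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d2 / ls**2)

class DividingLocalGPs:
    """Toy 'divide the input space' regressor: a list of small exact GPs, each
    responsible for the neighbourhood of its centroid."""

    def __init__(self, max_local=40, noise=1e-2):
        self.regions = []                      # each region: {"center", "X", "y", "alpha"}
        self.max_local, self.noise = max_local, noise

    def _refit(self, r):
        K = rbf(r["X"], r["X"]) + self.noise * np.eye(len(r["y"]))
        r["alpha"] = np.linalg.solve(K, r["y"])   # local solve, independent of total data size

    def _nearest(self, x):
        return int(np.argmin([np.linalg.norm(x - r["center"]) for r in self.regions]))

    def update(self, x, y):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        if not self.regions or len(self.regions[self._nearest(x[0])]["y"]) >= self.max_local:
            r = {"center": x[0].copy(), "X": x, "y": np.array([y], dtype=float)}
            self.regions.append(r)             # open a new region when the nearest one is full
        else:
            r = self.regions[self._nearest(x[0])]
            r["X"] = np.vstack([r["X"], x])
            r["y"] = np.append(r["y"], y)
            r["center"] = r["X"].mean(axis=0)
        self._refit(r)                         # only the touched region is updated

    def predict(self, x):
        x = np.atleast_2d(np.asarray(x, dtype=float))
        r = self.regions[self._nearest(x[0])]
        return float(rbf(x, r["X"]) @ r["alpha"])

# toy 1-D stream: updates stay cheap because each one refits a single small region
model = DividingLocalGPs(max_local=25)
for t in range(200):
    xt = np.array([t / 200.0])
    model.update(xt, np.sin(6.0 * xt[0]))
print(len(model.regions), model.predict(np.array([0.5])))
```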
arXiv Detail & Related papers (2020-06-16T18:43:31Z) - Progressive Identification of True Labels for Partial-Label Learning [112.94467491335611]
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
Most existing methods are elaborately designed as constrained optimizations that must be solved in specific manners, which makes their computational complexity a bottleneck when scaling up to big data.
This paper proposes a novel classifier framework that is flexible in the choice of model and optimization algorithm.
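A small sketch of one common realization of this idea (a weighted-cross-entropy view with a plain linear model; the paper's framework is more general and the names here are assumptions): each instance's candidate labels are re-weighted by the current model's masked, renormalized confidence, so the weights progressively concentrate on the likely true label while the model is trained on the weighted loss.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def progressive_label_weights(logits, candidate_mask):
    """Re-weight candidate labels by the model's current confidence, restricted
    to the candidate set and renormalized to sum to one per instance."""
    p = softmax(logits) * candidate_mask
    return p / (p.sum(axis=1, keepdims=True) + 1e-12)

def weighted_ce_grad(logits, label_weights):
    """Gradient w.r.t. the logits of sum_j w_ij * (-log softmax(logits)_ij)."""
    return softmax(logits) - label_weights

# toy usage: linear model trained on partial labels (true label always among candidates)
rng = np.random.default_rng(0)
n, d, c = 300, 10, 5
W_true = rng.normal(size=(d, c))
X = rng.normal(size=(n, d))
y_true = np.argmax(X @ W_true, axis=1)
candidate_mask = (rng.random((n, c)) < 0.3).astype(float)
candidate_mask[np.arange(n), y_true] = 1.0
W = np.zeros((d, c))
for _ in range(200):
    logits = X @ W
    weights = progressive_label_weights(logits, candidate_mask)
    W -= 0.1 * X.T @ weighted_ce_grad(logits, weights) / n
print((np.argmax(X @ W, axis=1) == y_true).mean())   # accuracy well above chance (1/c)
```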
arXiv Detail & Related papers (2020-02-19T08:35:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.