ORSA: Outlier Robust Stacked Aggregation for Best- and Worst-Case
Approximations of Ensemble Systems
- URL: http://arxiv.org/abs/2111.09043v1
- Date: Wed, 17 Nov 2021 11:33:46 GMT
- Title: ORSA: Outlier Robust Stacked Aggregation for Best- and Worst-Case
Approximations of Ensemble Systems
- Authors: Peter Domanski, Dirk Pflüger, Jochen Rivoir, Raphaël Latty
- Abstract summary: In Post-Silicon Validation for semiconductor devices (PSV), the task is to approximate the underlying function of the data with multiple learning algorithms.
In PSV, the expectation is that an unknown number of subsets describe functions showing very different characteristics.
Our method aims to find a suitable approximation that is robust to outliers and represents the best or worst case in a way that applies to as many device types as possible.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the use of ensemble learning in applications has grown
significantly, as increasing computational power allows large ensembles to be
trained in reasonable time frames. Many applications, e.g., malware detection,
face recognition, or financial decision-making, use a finite set of learning
algorithms and aggregate them so that better predictive performance is obtained
than with any individual learning algorithm.
In the field of Post-Silicon Validation for semiconductor devices (PSV), data
sets typically consist of measurements from various devices, e.g., chips from
different manufacturing lines. In PSV, the task is to approximate the
underlying function of the data with multiple learning algorithms, each trained
on a device-specific subset, rather than to improve the performance of
arbitrary classifiers on the entire data set. Furthermore, an unknown number of
subsets is expected to describe functions with very different characteristics.
The corresponding ensemble members, called outliers, can heavily influence the
approximation. Our method aims to find a suitable approximation that is robust
to outliers and represents the best or worst case in a way that applies to as
many device types as possible. A 'soft-max' or 'soft-min' function is used in
place of a hard maximum or minimum operator, and a Neural Network (NN) is
trained to learn this 'soft' function in a two-stage process. First, we select
a subset of ensemble members that is representative of the best or worst case.
Second, we combine these members and define a weighting that uses the
properties of the Local Outlier Factor (LOF) to increase the influence of
non-outliers and decrease the influence of outliers. The weighting ensures
robustness to outliers and makes the approximations suitable for most device
types.
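The abstract ships no code, so the following is only a minimal sketch of a plausible reading of the two stages: LOF-based selection and weighting of members, then a weighted 'soft-max' in place of a hard maximum (a negative temperature gives the 'soft-min'). The function names, the inlier selection via scikit-learn's LocalOutlierFactor, the inverse-LOF weights, and the Boltzmann-style soft-max are all our assumptions, not the authors' exact formulation; the NN that is trained to learn the soft function is omitted, and the sketch only computes the aggregate such a network would approximate.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def select_and_weight(member_preds, n_neighbors=3):
    """Assumed reading of the two stages: (1) keep members that LOF labels
    as inliers, (2) give the kept members inverse-LOF weights so that
    borderline members still count less.
    member_preds: array of shape (n_members, n_samples)."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    inlier = lof.fit_predict(member_preds) == 1   # stage 1: drop outlier members
    lof_score = -lof.negative_outlier_factor_     # ~1 for inliers, >1 for outliers
    w = np.where(inlier, 1.0 / lof_score, 0.0)    # stage 2: inverse-LOF weights (assumption)
    return w / w.sum()

def soft_max_aggregate(member_preds, weights, beta=10.0):
    """Weighted Boltzmann soft-max over members, per sample: a smooth,
    differentiable stand-in for the maximum (pass beta < 0 for a soft-min)."""
    z = beta * member_preds
    z = z - z.max(axis=0, keepdims=True)          # stabilise the exponentials
    e = weights[:, None] * np.exp(z)
    return (e * member_preds).sum(axis=0) / e.sum(axis=0)

# Toy usage: four well-behaved members plus one obvious outlier.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
members = np.stack(
    [np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=x.size) for _ in range(4)]
    + [np.full(x.size, 5.0)]                      # outlier member
)
w = select_and_weight(members)
best_case = soft_max_aggregate(members, w, beta=10.0)  # ~ pointwise max of the inliers
```

With the outlier member selected out and the remaining members weighted by their LOF scores, the soft-max tracks the largest of the well-behaved members instead of snapping to the outlier, which is the robustness property the abstract describes.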
Related papers
- Scaling LLM Inference with Optimized Sample Compute Allocation [56.524278187351925]
We propose OSCA, an algorithm to find an optimal mix of different inference configurations.
Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration.
OSCA is also shown to be effective in agentic workflows beyond single-turn tasks, achieving better accuracy on SWE-Bench with 3x less compute than the default configuration.
arXiv Detail & Related papers (2024-10-29T19:17:55Z)
- Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method [53.170053108447455]
Ensemble learning is a method that leverages weak learners to produce a strong learner.
We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative.
We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
arXiv Detail & Related papers (2024-08-06T03:42:38Z)
- Optimally Improving Cooperative Learning in a Social Setting [4.200480236342444]
We consider a cooperative learning scenario where a collection of networked agents with individually owned classifiers dynamically update their predictions.
We give a polynomial-time algorithm for optimizing the aggregate objective function, and show that optimizing the egalitarian objective function is NP-hard.
The performance of all of our algorithms is guaranteed by mathematical analysis and backed by experiments on synthetic and real data.
arXiv Detail & Related papers (2024-05-31T14:07:33Z)
- Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC).
DNCC yields a deep classification ensemble where each individual estimator is both accurate and negatively correlated with the others; a classical negative-correlation penalty is sketched after this list.
arXiv Detail & Related papers (2022-12-14T07:35:20Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Learning Aggregation Functions [78.47770735205134]
We introduce LAF (Learning Aggregation Functions), a learnable aggregator for sets of arbitrary cardinality.
We report experiments on semi-synthetic and real data showing that LAF outperforms state-of-the-art sum- (max-) decomposition architectures.
arXiv Detail & Related papers (2020-12-15T18:28:53Z)
- Feature Importance Ranking for Deep Learning [7.287652818214449]
We propose a novel dual-net architecture consisting of operator and selector for discovery of an optimal feature subset of a fixed size.
During learning, the operator is trained for a supervised learning task via optimal feature subset candidates generated by the selector.
In deployment, the selector generates an optimal feature subset and ranks feature importance, while the operator makes predictions based on the optimal subset for test data.
arXiv Detail & Related papers (2020-10-18T12:20:27Z)
- Learning by Minimizing the Sum of Ranked Range [58.24935359348289]
We introduce the sum of ranked range (SoRR) as a general approach to form learning objectives.
A ranked range is a consecutive sequence of the sorted values of a set of real numbers; a small worked example appears after this list.
We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification.
arXiv Detail & Related papers (2020-10-05T01:58:32Z)
- Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing [34.69732430310801]
We propose Berrut Approximated Coded Computing (BACC) as an alternative approach to deal with the straggler effect.
BACC is proven to be numerically stable with low computational complexity.
In particular, BACC is used to train a deep neural network on a cluster of servers.
arXiv Detail & Related papers (2020-09-17T14:23:38Z)
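Two entries above point to the short sketches below. First, for "Deep Negative Correlation Classification": the summary does not state DNCC's actual objective, so this shows the classical negative correlation learning penalty (Liu and Yao, 1999) that the idea builds on, written for a regression-style ensemble; it is an illustration, not the paper's loss.

```python
import numpy as np

def ncl_loss(member_preds, target, lam=0.5):
    """Classical negative correlation learning: each member pays its own
    squared error plus a penalty p_i = (f_i - fbar) * sum_{j != i}(f_j - fbar),
    which simplifies to -(f_i - fbar)**2 and so rewards disagreement with the
    ensemble mean fbar.  member_preds: array of shape (n_members, n_samples)."""
    fbar = member_preds.mean(axis=0)
    sq_error = ((member_preds - target) ** 2).sum(axis=0)
    penalty = -((member_preds - fbar) ** 2).sum(axis=0)
    return (sq_error + lam * penalty).mean()
```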
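Second, for "Learning by Minimizing the Sum of Ranked Range": a minimal worked example of the definition, assuming the usual SoRR parameterization in which the range covers the (k+1)-th through m-th largest values.

```python
def sum_of_ranked_range(values, k, m):
    """SoRR: the sum of the (k+1)-th through m-th largest values,
    i.e. the top-m sum minus the top-k sum (0 <= k < m <= len(values))."""
    s = sorted(values, reverse=True)
    return sum(s[k:m])

# Dropping the single largest value (k=1) and keeping ranks 2..3 (m=3):
assert sum_of_ranked_range([4, 1, 3, 2], k=1, m=3) == 3 + 2
```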