ORSA: Outlier Robust Stacked Aggregation for Best- and Worst-Case
Approximations of Ensemble Systems
- URL: http://arxiv.org/abs/2111.09043v1
- Date: Wed, 17 Nov 2021 11:33:46 GMT
- Title: ORSA: Outlier Robust Stacked Aggregation for Best- and Worst-Case
Approximations of Ensemble Systems
- Authors: Peter Domanski, Dirk Pflüger, Jochen Rivoir, Raphaël Latty
- Abstract summary: In Post-Silicon Validation for semiconductor devices (PSV), the task is to approximate the underlying function of the data with multiple learning algorithms.
In PSV, the expectation is that an unknown number of subsets describe functions showing very different characteristics.
Our method aims to find a suitable approximation that is robust to outliers and represents the best or worst case in a way that applies to as many device types as possible.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, the use of ensemble learning in applications has grown
significantly, as increasing computational power allows large ensembles to be
trained in reasonable time frames. Many applications, e.g., malware detection,
face recognition, or financial decision-making, use a finite set of learning
algorithms and aggregate them so that better predictive performance is obtained
than with any individual learning algorithm.
In the field of Post-Silicon Validation for semiconductor devices (PSV), data
sets typically consist of measurements from various devices, e.g., chips from
different manufacturing lines. In PSV, the task is to approximate the
underlying function of the data with multiple learning algorithms, each trained
on a device-specific subset, rather than to improve the performance of
arbitrary classifiers on the entire data set. Furthermore, an unknown number of
subsets is expected to describe functions with very different characteristics.
The corresponding ensemble members, called outliers, can heavily influence the
approximation. Our method aims to find a suitable approximation that is robust
to outliers and represents the best or worst case in a way that applies to as
many device types as possible. A 'soft-max' or 'soft-min' function is used in
place of a hard maximum or minimum operator, and a Neural Network (NN) is
trained to learn this 'soft' function in a two-stage process. First, we select
a subset of ensemble members that is representative of the best or worst case.
Second, we combine these members and define a weighting that uses the
properties of the Local Outlier Factor (LOF) to increase the influence of
non-outliers and decrease the influence of outliers. The weighting ensures
robustness to outliers and makes the approximations suitable for most device
types.
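The abstract ships no code, so the following is only a minimal sketch of a plausible reading of the two stages: LOF-based selection and weighting of members, then a weighted 'soft-max' in place of a hard maximum (a negative temperature gives the 'soft-min'). The function names, the inlier selection via scikit-learn's LocalOutlierFactor, the inverse-LOF weights, and the Boltzmann-style soft-max are all our assumptions, not the authors' exact formulation; the NN that is trained to learn the soft function is omitted, and the sketch only computes the aggregate such a network would approximate.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def select_and_weight(member_preds, n_neighbors=3):
    """Assumed reading of the two stages: (1) keep members that LOF labels
    as inliers, (2) give the kept members inverse-LOF weights so that
    borderline members still count less.
    member_preds: array of shape (n_members, n_samples)."""
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    inlier = lof.fit_predict(member_preds) == 1   # stage 1: drop outlier members
    lof_score = -lof.negative_outlier_factor_     # ~1 for inliers, >1 for outliers
    w = np.where(inlier, 1.0 / lof_score, 0.0)    # stage 2: inverse-LOF weights (assumption)
    return w / w.sum()

def soft_max_aggregate(member_preds, weights, beta=10.0):
    """Weighted Boltzmann soft-max over members, per sample: a smooth,
    differentiable stand-in for the maximum (pass beta < 0 for a soft-min)."""
    z = beta * member_preds
    z = z - z.max(axis=0, keepdims=True)          # stabilise the exponentials
    e = weights[:, None] * np.exp(z)
    return (e * member_preds).sum(axis=0) / e.sum(axis=0)

# Toy usage: four well-behaved members plus one obvious outlier.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
members = np.stack(
    [np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=x.size) for _ in range(4)]
    + [np.full(x.size, 5.0)]                      # outlier member
)
w = select_and_weight(members)
best_case = soft_max_aggregate(members, w, beta=10.0)  # ~ pointwise max of the inliers
```

With the outlier member selected out and the remaining members weighted by their LOF scores, the soft-max tracks the largest of the well-behaved members instead of snapping to the outlier, which is the robustness property the abstract describes.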
Related papers
- Scaling LLM Inference with Optimized Sample Compute Allocation [56.524278187351925]
We propose OSCA, an algorithm to find an optimal mix of different inference configurations.
Our experiments show that with our learned mixed allocation, we can achieve accuracy better than the best single configuration.
OSCA is also shown to be effective in agentic workflows beyond single-turn tasks, achieving better accuracy on SWE-Bench with 3x less compute than the default configuration.
arXiv Detail & Related papers (2024-10-29T19:17:55Z)
- Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method [53.170053108447455]
Ensemble learning is a method that leverages weak learners to produce a strong learner.
We design a smooth and convex objective function that leverages the concept of margin, making the strong learner more discriminative.
We then compare our algorithm with random forests of ten times the size and other classical methods across numerous datasets.
arXiv Detail & Related papers (2024-08-06T03:42:38Z)
- Optimally Improving Cooperative Learning in a Social Setting [4.200480236342444]
We consider a cooperative learning scenario where a collection of networked agents with individually owned classifiers dynamically update their predictions.
We give a polynomial-time algorithm for optimizing the aggregate objective function, and show that optimizing the egalitarian objective function is NP-hard.
The performance of all of our algorithms is guaranteed by mathematical analysis and backed by experiments on synthetic and real data.
arXiv Detail & Related papers (2024-05-31T14:07:33Z)
- Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC).
DNCC yields a deep classification ensemble where each individual estimator is both accurate and negatively correlated with the others; a classical negative-correlation penalty is sketched after this list.
arXiv Detail & Related papers (2022-12-14T07:35:20Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Learning Aggregation Functions [78.47770735205134]
We introduce LAF (Learning Aggregation Functions), a learnable aggregator for sets of arbitrary cardinality.
We report experiments on semi-synthetic and real data showing that LAF outperforms state-of-the-art sum- (max-) decomposition architectures.
arXiv Detail & Related papers (2020-12-15T18:28:53Z)
- Feature Importance Ranking for Deep Learning [7.287652818214449]
We propose a novel dual-net architecture consisting of operator and selector for discovery of an optimal feature subset of a fixed size.
During learning, the operator is trained for a supervised learning task via optimal feature subset candidates generated by the selector.
In deployment, the selector generates an optimal feature subset and ranks feature importance, while the operator makes predictions based on the optimal subset for test data.
arXiv Detail & Related papers (2020-10-18T12:20:27Z)
- Learning by Minimizing the Sum of Ranked Range [58.24935359348289]
We introduce the sum of ranked range (SoRR) as a general approach to form learning objectives.
A ranked range is a consecutive sequence of the sorted values of a set of real numbers; a small worked example appears after this list.
We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification.
arXiv Detail & Related papers (2020-10-05T01:58:32Z)
- Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing [34.69732430310801]
We propose Berrut Approximated Coded Computing (BACC) as an alternative approach to deal with the straggler effect.
BACC is proven to be numerically stable with low computational complexity.
In particular, BACC is used to train a deep neural network on a cluster of servers.
arXiv Detail & Related papers (2020-09-17T14:23:38Z)
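Two entries above point to the short sketches below. First, for "Deep Negative Correlation Classification": the summary does not state DNCC's actual objective, so this shows the classical negative correlation learning penalty (Liu and Yao, 1999) that the idea builds on, written for a regression-style ensemble; it is an illustration, not the paper's loss.

```python
import numpy as np

def ncl_loss(member_preds, target, lam=0.5):
    """Classical negative correlation learning: each member pays its own
    squared error plus a penalty p_i = (f_i - fbar) * sum_{j != i}(f_j - fbar),
    which simplifies to -(f_i - fbar)**2 and so rewards disagreement with the
    ensemble mean fbar.  member_preds: array of shape (n_members, n_samples)."""
    fbar = member_preds.mean(axis=0)
    sq_error = ((member_preds - target) ** 2).sum(axis=0)
    penalty = -((member_preds - fbar) ** 2).sum(axis=0)
    return (sq_error + lam * penalty).mean()
```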
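Second, for "Learning by Minimizing the Sum of Ranked Range": a minimal worked example of the definition, assuming the usual SoRR parameterization in which the range covers the (k+1)-th through m-th largest values.

```python
def sum_of_ranked_range(values, k, m):
    """SoRR: the sum of the (k+1)-th through m-th largest values,
    i.e. the top-m sum minus the top-k sum (0 <= k < m <= len(values))."""
    s = sorted(values, reverse=True)
    return sum(s[k:m])

# Dropping the single largest value (k=1) and keeping ranks 2..3 (m=3):
assert sum_of_ranked_range([4, 1, 3, 2], k=1, m=3) == 3 + 2
```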