Evaluation of Tree Based Regression over Multiple Linear Regression for
Non-normally Distributed Data in Battery Performance
- URL: http://arxiv.org/abs/2111.02513v1
- Date: Wed, 3 Nov 2021 20:28:24 GMT
- Authors: Shovan Chowdhury, Yuxiao Lin, Boryann Liaw, Leslie Kerby
- Abstract summary: This study explores the impact of data normality in building machine learning models.
Tree-based regression models and multiple linear regressions models are each built from a highly skewed non-normal dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Battery performance datasets are typically non-normal and multicollinear.
Extrapolating from such datasets for model prediction requires attention to
these characteristics. This study explores the impact of data normality in building
machine learning models. In this work, tree-based regression models and
multiple linear regression models are each built from a highly skewed
non-normal dataset with multicollinearity and compared. Several techniques are
necessary, such as data transformation, to achieve a good multiple linear
regression model with this dataset; the most useful techniques are discussed.
With these techniques, the best multiple linear regression model achieved an
R^2 = 81.23% and exhibited no multicollinearity effect for the dataset used in
this study. Tree-based models perform better on this dataset, as they are
non-parametric, capable of handling complex relationships among variables and
not affected by multicollinearity. We show that bagging, as used in Random
Forests, reduces overfitting. Our best tree-based model achieved an accuracy of
R^2 = 97.73%. This study explains why tree-based regression holds promise as a
machine learning approach for non-normally distributed, multicollinear data.
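The comparison the abstract describes can be sketched with scikit-learn on synthetic data. Everything below is an illustrative assumption, not the paper's battery dataset, features, or pipeline:

```python
# Illustrative sketch only: synthetic skewed, multicollinear data standing in
# for the paper's battery dataset. A log transform makes the target tractable
# for multiple linear regression; a Random Forest needs no such transform.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
X[:, 2] = 0.9 * X[:, 0] + 0.1 * rng.normal(size=n)      # multicollinearity
# Right-skewed target with an interaction the linear model does not capture
y = np.exp(1.0 + 0.5 * X[:, 0] + 0.3 * X[:, 1]
           + 0.4 * X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=n))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Multiple linear regression after a log transform of the skewed target
mlr = LinearRegression().fit(X_tr, np.log(y_tr))
r2_mlr = r2_score(y_te, np.exp(mlr.predict(X_te)))

# Bagged trees (Random Forest) fit directly on the raw target
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
r2_rf = r2_score(y_te, rf.predict(X_te))

print(f"MLR (log-transformed) R^2: {r2_mlr:.3f}")
print(f"Random Forest         R^2: {r2_rf:.3f}")
```

On data like this the forest scores higher because it captures the interaction and is unaffected by the collinear third feature; the resulting numbers will not match the paper's 81.23% and 97.73%.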
Related papers
- Adaptive Optimization for Prediction with Missing Data [6.800113478497425]
We show that some adaptive linear regression models are equivalent to learning an imputation rule and a downstream linear regression model simultaneously.
In settings where data is strongly not missing at random, our methods achieve a 2-10% improvement in out-of-sample accuracy.
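The equivalence noted above can be sketched on synthetic data: regressing on zero-imputed features concatenated with missingness indicators lets one linear model absorb a constant imputation rule into its coefficients. The dataset and missingness rate are illustrative assumptions, not the paper's method:

```python
# Sketch: a linear model over zero-imputed features plus missingness masks
# implicitly learns "impute with a constant, then regress" end to end.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=n)

mask = rng.uniform(size=(n, 2)) < 0.3              # 30% of entries go missing
X_obs = np.where(mask, 0.0, X)                     # zero-imputation
features = np.hstack([X_obs, mask.astype(float)])  # append indicators

model = LinearRegression().fit(features, y)
print(f"in-sample R^2 with 30% missing entries: {model.score(features, y):.3f}")
```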
arXiv Detail & Related papers (2024-02-02T16:35:51Z) - ZeroShape: Regression-based Zero-shot Shape Reconstruction [56.652766763775226]
We study the problem of single-image zero-shot 3D shape reconstruction.
Recent works learn zero-shot shape reconstruction through generative modeling of 3D assets.
We show that ZeroShape achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2023-12-21T01:56:34Z) - An Efficient Data Analysis Method for Big Data using Multiple-Model
Linear Regression [4.085654010023149]
This paper introduces a new data analysis method for big data using a newly defined regression model named multiple-model linear regression (MMLR).
The proposed data analysis method is shown to be more efficient and flexible than other regression based methods.
arXiv Detail & Related papers (2023-08-24T10:20:15Z) - Analysis of Interpolating Regression Models and the Double Descent
Phenomenon [3.883460584034765]
It is commonly assumed that models which interpolate noisy training data generalize poorly.
The best models obtained are overparametrized and the testing error exhibits the double descent behavior as the model order increases.
We derive a result based on the behavior of the smallest singular value of the regression matrix that explains the peak location and the double descent shape of the testing error as a function of model order.
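The singular-value mechanism can be illustrated numerically; random Gaussian matrices stand in for the regression matrix here, which is an assumption for illustration, not the paper's construction:

```python
# Near the interpolation threshold (model order p equal to sample size n),
# the smallest singular value of the n x p regression matrix collapses toward
# zero, inflating the least-squares solution and producing the test-error
# peak; away from the threshold it stays bounded away from zero.
import numpy as np

rng = np.random.default_rng(0)
n = 50                               # number of training samples
smin = {}
for p in (10, 50, 200):              # under-, critically, overparametrized
    A = rng.normal(size=(n, p)) / np.sqrt(max(n, p))
    smin[p] = np.linalg.svd(A, compute_uv=False).min()

print(smin)
```

Only at p == n does the smallest singular value approach zero, matching the peak location the result above explains.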
arXiv Detail & Related papers (2023-04-17T09:44:33Z) - Constructing Effective Machine Learning Models for the Sciences: A
Multidisciplinary Perspective [77.53142165205281]
We show how flexible non-linear solutions will not always improve upon manually adding transforms and interactions between variables to linear regression models.
We discuss how to recognize this before constructing a data-driven model and how such analysis can help us move to intrinsically interpretable regression models.
arXiv Detail & Related papers (2022-11-21T17:48:44Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - Easy Differentially Private Linear Regression [16.325734286930764]
We study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models.
We find that this algorithm obtains strong empirical performance in the data-rich setting.
arXiv Detail & Related papers (2022-08-15T17:42:27Z) - CurFi: An automated tool to find the best regression analysis model
using curve fitting [0.0]
A curve fitting system named "CurFi" was developed that uses linear regression models to fit a curve to a dataset.
The system lets users upload a dataset, split it into training and test sets, and select the relevant features and label from the dataset.
arXiv Detail & Related papers (2022-05-16T16:52:10Z) - Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying the integrity metric, i.e., the empirical model error equals the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z) - Flexible Model Aggregation for Quantile Regression [92.63075261170302]
Quantile regression is a fundamental problem in statistical learning motivated by a need to quantify uncertainty in predictions.
We investigate methods for aggregating any number of conditional quantile models.
All of the models we consider in this paper can be fit using modern deep learning toolkits.
arXiv Detail & Related papers (2021-02-26T23:21:16Z) - A Hypergradient Approach to Robust Regression without Correspondence [85.49775273716503]
We consider a variant of regression problem, where the correspondence between input and output data is not available.
Most existing methods are only applicable when the sample size is small.
We propose a new computational framework -- ROBOT -- for the shuffled regression problem.
arXiv Detail & Related papers (2020-11-30T21:47:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.