Mobile Phone Usage Data for Credit Scoring
- URL: http://arxiv.org/abs/2002.12616v1
- Date: Fri, 28 Feb 2020 09:32:11 GMT
- Title: Mobile Phone Usage Data for Credit Scoring
- Authors: Henri Ots, Innar Liiv, and Diana Tur
- Abstract summary: We use different classification algorithms to split customers into paying and non-paying ones using mobile data.
We found that with a dataset that consists of mobile data based only on 2,503 customers, we can predict credit risk.
- Score: 1.7205106391379026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The aim of this study is to demonstrate that mobile phone usage data can be
used to make predictions and find the best classification method for credit
scoring even if the dataset is small (2,503 customers). We use different
classification algorithms to split customers into paying and non-paying ones
using mobile data, and then compare the predicted results with actual results.
There are several related works publicly accessible in which mobile data has
been used for credit scoring, but they are all based on large datasets. Small
companies are unable to use datasets as large as those used in these related
papers, so these studies are of little use to them. In this paper we
argue that there is value in mobile phone usage data for credit scoring
even if the dataset is small. We found that with a dataset that consists of
mobile data based only on 2,503 customers, we can predict credit risk. The best
classification method achieved an AUC (area under the curve) of 0.62.
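To make the reported metric concrete, here is a minimal sketch of how an AUC value can be computed from classifier scores. It is not the authors' code: the function name `roc_auc`, the label convention (1 = non-paying, 0 = paying), and the toy data are all assumptions made for illustration. AUC is computed here as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Return the AUC as the share of (positive, negative) pairs
    ranked correctly by the scores; tied pairs count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = non-paying customer, 0 = paying customer.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a value of 0.62 indicates a modest but real signal. The pairwise loop is quadratic in the number of customers, which is acceptable for a dataset of a few thousand rows.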
Related papers
- Data Distribution Valuation [56.71023681599737]
Existing data valuation methods define a value for a discrete dataset.
In many use cases, users are interested in not only the value of the dataset, but that of the distribution from which the dataset was sampled.
We propose a maximum mean discrepancy (MMD)-based valuation method which enables theoretically principled and actionable policies.
arXiv Detail & Related papers (2024-10-06T07:56:53Z)
- Scaling Laws for the Value of Individual Data Points in Machine Learning [55.596413470429475]
We introduce a new perspective by investigating scaling behavior for the value of individual data points.
We provide learning theory to support our scaling law, and we observe empirically that it holds across diverse model classes.
Our work represents a first step towards understanding and utilizing scaling properties for the value of individual data points.
arXiv Detail & Related papers (2024-05-30T20:10:24Z)
- Combining Public Human Activity Recognition Datasets to Mitigate Labeled Data Scarcity [1.274578243851308]
We propose a novel strategy to combine publicly available datasets with the goal of learning a generalized HAR model.
Our experimental evaluation, which includes experimenting with different state-of-the-art neural network architectures, shows that combining public datasets can significantly reduce the number of labeled samples.
arXiv Detail & Related papers (2023-06-23T18:51:22Z)
- Going beyond research datasets: Novel intent discovery in the industry setting [60.90117614762879]
This paper proposes methods to improve the intent discovery pipeline deployed in a large e-commerce platform.
We show the benefit of pre-training language models on in-domain data: both self-supervised and with weak supervision.
We also devise the best method to utilize the conversational structure (i.e., question and answer) of real-life datasets during fine-tuning for clustering tasks, which we call Conv.
arXiv Detail & Related papers (2023-05-09T14:21:29Z)
- Data-OOB: Out-of-bag Estimate as a Simple and Efficient Data Value [17.340091573913316]
We propose Data-OOB, a new data valuation method for a bagging model that utilizes the out-of-bag estimate.
Data-OOB takes less than 2.25 hours on a single CPU processor when there are $10^6$ samples to evaluate and the input dimension is 100.
We demonstrate that the proposed method significantly outperforms existing state-of-the-art data valuation methods in identifying mislabeled data and finding a set of helpful (or harmful) data points.
arXiv Detail & Related papers (2023-04-16T08:03:58Z)
- Feature-Level Fusion of Super-App and Telecommunication Alternative Data Sources for Credit Card Fraud Detection [106.33204064461802]
We review the effectiveness of a feature-level fusion of super-app customer information, mobile phone line data, and traditional credit risk variables for the early detection of identity theft credit card fraud.
We evaluate our approach over approximately 90,000 users from a credit lender's digital platform database.
arXiv Detail & Related papers (2021-11-05T19:10:35Z)
- CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems showing that CvS is able to achieve much higher classification results compared to previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z)
- dMelodies: A Music Dataset for Disentanglement Learning [70.90415511736089]
We present a new symbolic music dataset that will help researchers demonstrate the efficacy of their algorithms on diverse domains.
This will also provide a means for evaluating algorithms specifically designed for music.
The dataset is large enough (approx. 1.3 million data points) to train and test deep networks for disentanglement learning.
arXiv Detail & Related papers (2020-07-29T19:20:07Z)
- The Value of Big Data for Credit Scoring: Enhancing Financial Inclusion using Mobile Phone Data and Social Network Analytics [6.919243767837341]
This paper leverages alternative data sources to enhance both statistical and economic model performance.
A unique combination of datasets, including call-detail records, credit and debit account information of customers is used.
The results have an impact in terms of ethical use of call-detail records, regulatory implications, financial inclusion, as well as data sharing and privacy.
arXiv Detail & Related papers (2020-02-23T16:13:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.