Predicting insurance claims through a variety of data mining techniques: facing lots of missing values and moderate class-imbalanced levels

Logic Journal of the IGPL (forthcoming)

Abstract

This paper addresses a real-world classification problem related to the management of claims received by an insurance company. Obtaining the classifier is difficult because of the large number of missing values and the inherent imbalance among the class labels. Once the data have been partitioned, the training set is subjected to an intensive double grid search. The first grid search identifies the most promising missing-value imputation approach. The best method is then applied, and the next round of data mining strategies begins, this time under the data rebalancing umbrella. In this second grid search, methods from the undersampling and oversampling families are evaluated with different settings, again using only seen data. The training data obtained after the first grid search are submitted to this second step, which yields the training set ready for subsequent classifier training. The main objective of the work is to find the combination of data mining techniques that best suits the data set, using a pipeline that contains two types of data preparation methods from different families. Accordingly, the presence of missing values is addressed first, and data rebalancing techniques are applied afterwards. The study focuses on classifiers based on Bayesian and lazy approaches as well as decision trees, evaluated on metrics such as the area under the ROC curve (AUC), Cohen's kappa, accuracy and the F-measure, among others. In the scenario faced in this paper, imputation by the mean and the mode is preferable to Expectation Maximization Imputation, given that more than forty percent of the values are missing for many features.
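The impute-then-rebalance order described in the abstract can be illustrated with a small plain-Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`fit_mean_mode`, `impute`, `rebalance`, `predict_1nn`, and the metric helpers) are hypothetical. Records are assumed to be lists with `None` marking a missing value. Random undersampling and oversampling stand in for the rebalancing families, and a 1-nearest-neighbour rule stands in for a lazy classifier; the paper's actual methods and settings are not given here. Expectation Maximization imputation is not shown.

```python
import random
from collections import Counter
from statistics import mean, mode


def fit_mean_mode(rows, numeric_cols):
    """Learn one fill value per column from TRAINING rows only (hypothetical
    helper): the mean for numeric columns, the mode for the others.
    Assumes every column has at least one observed (non-None) value."""
    fills = {}
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        fills[j] = mean(observed) if j in numeric_cols else mode(observed)
    return fills


def impute(rows, fills):
    """Replace each None with the fill value learned for its column."""
    return [[fills[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def rebalance(rows, labels, strategy, seed=0):
    """Random undersampling ("under": shrink every class to the minority size)
    or random oversampling ("over": duplicate rows up to the majority size)."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    sizes = [len(m) for m in by_class.values()]
    target = min(sizes) if strategy == "under" else max(sizes)
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        if strategy == "under":
            chosen = rng.sample(members, target)
        else:
            chosen = members + [rng.choice(members) for _ in range(target - len(members))]
        out_rows += chosen
        out_labels += [y] * len(chosen)
    return out_rows, out_labels


def predict_1nn(train_rows, train_labels, row):
    """Lazy learner: label of the nearest training row (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(r, row)) for r in train_rows]
    return train_labels[dists.index(min(dists))]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Observed agreement corrected for chance agreement (assumes p_e < 1)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    t, p = Counter(y_true), Counter(y_pred)
    p_e = sum(t[c] * p[c] for c in t) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def f_measure(y_true, y_pred, positive=1):
    """F1 for the positive class: 2TP / (2TP + FP + FN); 0.0 when TP is 0."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy training data: column 0 numeric, column 1 a categorical code;
    # None marks a missing value; classes are imbalanced 5:2.
    train_X = [[1.0, 0], [None, 0], [2.0, None], [1.5, 0],
               [None, 0], [8.0, 1], [9.0, None]]
    train_y = [0, 0, 0, 0, 0, 1, 1]
    # Hypothetical validation split carved out of the seen (training) data.
    val_X = [[1.2, 0], [None, 1], [8.5, 1]]
    val_y = [0, 1, 1]

    fills = fit_mean_mode(train_X, numeric_cols={0})  # fitted on training rows only
    tr, va = impute(train_X, fills), impute(val_X, fills)
    for strategy in ("under", "over"):  # a tiny "grid" over rebalancing options
        X_bal, y_bal = rebalance(tr, train_y, strategy)
        preds = [predict_1nn(X_bal, y_bal, r) for r in va]
        print(strategy, "kappa:", round(cohen_kappa(val_y, preds), 3),
              "F1:", round(f_measure(val_y, preds), 3))
```

In this sketch the fill values are learned before any rebalancing and only from training rows. Duplicates created by oversampling therefore never influence the imputation statistics, and the held-out rows are imputed with values they did not help compute.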


Similar books and articles

Aksjologiczne podstawy ubezpieczeń społecznych [Axiological foundations of social insurance]. Robert Rogowski - 2009 - Annales. Ethics in Economic Life 12 (1):141-151.
Informational privacy, data mining, and the internet. Herman T. Tavani - 1999 - Ethics and Information Technology 1 (2):137-145.
多次元構造データからの分類知識の獲得 [Acquisition of classification knowledge from multidimensional structured data]. 渡沼 智己, 尾崎 知伸 - 2007 - Transactions of the Japanese Society for Artificial Intelligence 22 (2):173-182.
The Improbable Future of Employment‐Based Insurance. John D. Banja - 2000 - Hastings Center Report 30 (3):17-25.

Analytics

Added to PP
2024-05-16
