Credit card fraud detection based on machine learning

Authors

  • Tran Thanh Cong, Hong Bang International University

Keywords:

SMOTE, machine learning, classification, imbalanced data, fraud detection

Abstract

Online transactions have increased dramatically over recent decades, and credit card transactions account for a large proportion of them. This has led to a rise in fraudulent credit card transactions, causing losses to the financial industry. It is therefore important to build fraud detection systems that classify each transaction into one of two classes: fraud or non-fraud. However, the dataset is imbalanced between these two classes. In this paper, we use a resampling method, SMOTE, to turn this imbalanced dataset into a balanced one. Four machine learning (ML) algorithms, namely random forest, k-nearest neighbors, decision tree, and logistic regression, are then applied to the balanced dataset to build ML models. The performance of these models is evaluated in terms of accuracy, recall, precision, and F1 score. We observed that the SMOTE-based random forest model identifies fraud better than the other algorithms.
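SMOTE (Synthetic Minority Over-sampling Technique) balances the classes by creating synthetic fraud samples instead of duplicating existing ones. Each new point is placed at a random position on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The following is a minimal pure-Python sketch of that idea. It also computes the four evaluation metrics named above, and it assumes numeric feature vectors with fraud encoded as label 1. It is not the author's implementation. The function names `smote` and `classification_metrics` are hypothetical, as is the toy data. Choosing base samples at random is a simplification of the original algorithm, which generates a fixed number of synthetic points per minority sample.

```python
import random
from typing import Dict, List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_synthetic: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Generate n_synthetic minority samples in the spirit of SMOTE.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample x and one of its k nearest minority neighbours z:
        x_new = x + gap * (z - x),  gap drawn uniformly from [0, 1)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the indices of the k nearest minority neighbours of each sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _sq_dist(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))          # random base sample
        x = minority[i]
        z = minority[rng.choice(neighbours[i])]   # one of its k neighbours
        gap = rng.random()
        synthetic.append([xi + gap * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic


def classification_metrics(y_true: Sequence[int],
                           y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, with fraud (label 1) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical toy data: 8 legitimate vs 2 fraudulent 2-D transactions.
    legit = [[float(i), float(i % 3)] for i in range(8)]
    fraud = [[10.0, 10.0], [11.0, 12.0]]
    fraud_balanced = fraud + smote(fraud, n_synthetic=len(legit) - len(fraud), k=1)
    print(len(legit), len(fraud_balanced))  # 8 8
    # A model that never predicts fraud on a 99:1 split: high accuracy, zero recall.
    print(classification_metrics([0] * 99 + [1], [0] * 100))
```

The last line shows why recall, precision, and F1 are reported alongside accuracy on imbalanced data. A classifier that never flags fraud reaches 0.99 accuracy but has a recall of 0. In practice, a library implementation would normally be used. One example is the `SMOTE` class in the imbalanced-learn package (`imblearn.over_sampling.SMOTE`, applied with `fit_resample(X, y)`). The abstract does not state which implementation the study used.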

Published

24.12.2020

How to cite

T. T. Cong, “Credit card fraud detection based on machine learning,” HIUJS, vol. 1, pp. 45–52, Dec. 2020.

Section

ECONOMICS AND MANAGEMENT